Logistic regression is a statistical method for modeling and predicting the probability of a binary outcome (e.g., success/failure, yes/no) from one or more independent variables. Unlike linear regression, which predicts continuous values, logistic regression predicts probabilities, which can then be thresholded to classify outcomes.
- **Binary Dependent Variable:** The outcome variable ($Y$) has two possible values, typically coded as 0 and 1. Example: 1 = Approved, 0 = Not Approved.
- **Independent Variables:** These can be continuous, categorical, or a mix of both. Example: income, age, education level.
- **Logistic Function (Sigmoid Function):** Converts the linear combination of inputs into a probability between 0 and 1:
  $$P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_kX_k)}}$$
- **Log-Odds Transformation:** The logistic model estimates the log-odds of the probability:
  $$\text{Log-Odds} = \ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_kX_k$$
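The two formulas above can be sketched in a few lines of Python (the function names here are illustrative, not from any particular library):

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(coefficients, features):
    """P(Y=1 | X) for one observation.

    coefficients: [beta_0, beta_1, ..., beta_k] (intercept first)
    features:     [x_1, ..., x_k]
    """
    log_odds = coefficients[0] + sum(
        b * x for b, x in zip(coefficients[1:], features)
    )
    return sigmoid(log_odds)

print(sigmoid(0.0))  # log-odds of 0 correspond to a probability of 0.5
```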
Logistic regression relies on several key assumptions:

- **Binary Outcome:** The dependent variable is binary.
- **Independence of Observations:** Observations are independent of one another.
- **Linearity in the Log-Odds:** The relationship between the independent variables and the log-odds is linear.
- **No Multicollinearity:** The independent variables are not highly correlated with one another.
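One quick way to screen for multicollinearity is to inspect pairwise correlations between predictors. A sketch using NumPy; the predictor values below are made up for illustration:

```python
import numpy as np

# Hypothetical predictor columns: income, age, years of education.
X = np.array([
    [40_000, 25, 12],
    [50_000, 32, 16],
    [60_000, 41, 14],
    [70_000, 38, 18],
    [80_000, 55, 16],
], dtype=float)

# Pairwise Pearson correlations between predictors; |r| close to 1
# for a pair of predictors is a warning sign of multicollinearity,
# which can destabilize the estimated coefficients.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```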
There are three common types of logistic regression:

- **Binary Logistic Regression:** Predicts a binary outcome (e.g., pass/fail).
- **Multinomial Logistic Regression:** Predicts outcomes with more than two unordered categories (e.g., types of transportation).
- **Ordinal Logistic Regression:** Predicts outcomes with ordered categories (e.g., customer satisfaction levels: low, medium, high).
Fitting and using a logistic regression model involves four steps:

1. Specify the dependent and independent variables.
2. Estimate the coefficients ($\beta_0, \beta_1, \dots, \beta_k$), typically by maximum likelihood estimation.
   - A **positive coefficient** increases the log-odds of $Y = 1$.
   - A **negative coefficient** decreases the log-odds of $Y = 1$.
3. Convert the log-odds to probabilities using the logistic function.
4. Assess model performance using accuracy, precision, recall, and AUC-ROC.
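As a rough illustration of step 2, the toy function below fits a one-predictor model by gradient ascent on the log-likelihood. This is a simplified stand-in for the maximum likelihood routines in statistical packages, and the data is made up:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Estimate beta_0, beta_1 for one predictor by gradient ascent
    on the log-likelihood (toy maximum likelihood estimation)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Toy data: the outcome tends to be 1 for larger x.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   1,   1]
b0, b1 = fit_logistic(xs, ys)
print(round(sigmoid(b0 + b1 * 0.5), 3), round(sigmoid(b0 + b1 * 4.0), 3))
```

With this separable toy data the fitted curve assigns a low probability to small x and a high probability to large x, matching the sign interpretation of the coefficients above.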
A bank wants to predict whether a loan applicant will default ($Y = 1$) or not ($Y = 0$) based on income and credit score:
| Income ($X_1$) | Credit Score ($X_2$) | Default ($Y$) |
|---|---|---|
| 40,000 | 600 | 1 |
| 50,000 | 650 | 0 |
| 60,000 | 700 | 0 |
| 70,000 | 750 | 0 |
| 80,000 | 800 | 1 |
- **Logistic Model:**
  $$P(Y=1 \mid X_1, X_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2)}}$$
- **Estimated Coefficients:** $\beta_0 = -10$, $\beta_1 = 0.0001$, $\beta_2 = 0.02$
- **Prediction for a New Applicant:** For an applicant with income $X_1 = 65,000$ and credit score $X_2 = 720$:
  $$\text{Log-Odds} = -10 + 0.0001(65{,}000) + 0.02(720) = -10 + 6.5 + 14.4 = 10.9$$
  $$P(Y=1) = \frac{1}{1 + e^{-10.9}} \approx 0.99998$$
  With a predicted probability near 1, the model indicates the applicant is very likely to default.
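The arithmetic above is easy to verify directly (the coefficients are the given example values, not estimates from the table):

```python
import math

# Coefficients from the worked example.
beta0, beta1, beta2 = -10.0, 0.0001, 0.02

income, credit_score = 65_000, 720
log_odds = beta0 + beta1 * income + beta2 * credit_score
probability = 1.0 / (1.0 + math.exp(-log_odds))

print(log_odds)     # 10.9 (up to floating-point rounding)
print(probability)  # just under 1 (about 0.99998)
```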
- **Confusion Matrix:**
  - True Positive (TP): defaults correctly predicted as defaults.
  - False Positive (FP): non-defaults incorrectly predicted as defaults.
  - True Negative (TN): non-defaults correctly predicted as non-defaults.
  - False Negative (FN): defaults incorrectly predicted as non-defaults.
- **Accuracy:**
  $$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{Total Observations}}$$
- **Precision:**
  $$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
- **Recall:**
  $$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
- **AUC-ROC:** Measures the model's ability to distinguish between classes. A higher value indicates better performance.
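These formulas can be packaged into a small helper; the confusion-matrix counts below are hypothetical:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Hypothetical counts for a default-prediction model.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)  # accuracy 0.85, precision 0.8, recall ~0.889
```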
Logistic regression is widely applied across domains:

- **Healthcare:** Predicting the likelihood of a disease based on patient attributes.
- **Finance:** Determining loan defaults or creditworthiness.
- **Marketing:** Classifying customers as likely or unlikely to purchase.
Logistic regression is a powerful and widely used statistical tool for binary classification problems. By understanding its principles, assumptions, and evaluation metrics, you can use logistic regression to make accurate predictions and informed decisions.
Next Steps: ANOVA and MANOVA