Binomial Logistic Regression

Binomial logistic regression is a statistical technique for predicting the probability that an observation belongs to one of the two categories of a binary dependent variable. This prediction is based on one or more independent variables, which can be continuous or categorical.

This form of regression shares similarities with linear regression, except for the nature of the dependent variable; linear regression deals with a continuous dependent variable, while binomial logistic regression works with a binary (dichotomous) one. Instead of predicting a specific value for the dependent variable, binomial logistic regression aims to predict the probability of an observation falling into a particular category based on the independent variables. The observation is then classified into the category that is deemed most probable. Binomial logistic regression can incorporate interactions between independent variables to enhance prediction accuracy.

For example, binomial logistic regression could be used to determine whether employees are likely to stay with or leave a company. The dependent variable here is “employment status,” with the two categories being “stay” or “leave.” The independent variables might include:

• “Number of years with the company” (a continuous variable).
• “Satisfaction with management” (a categorical variable with levels like “satisfied,” “neutral,” and “dissatisfied”).
• “Type of employment” (a dichotomous variable, such as “full-time” vs. “part-time”).

Based on these factors, this model would allow predictions about an employee’s likelihood to stay or leave.
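The prediction step for an example like this can be sketched in a few lines of Python. The model maps a linear combination of the independent variables to a probability through the logistic (sigmoid) function; the intercept and coefficients below are made up purely for illustration, not estimates from real data, and only two of the three predictors are used to keep the sketch short.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefficients, values):
    """Probability of the modeled category (here, 'leave'), given hypothetical coefficients."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return sigmoid(z)

# Made-up coefficients: years with the company, full-time indicator (1 = full-time)
p_leave = predict_probability(intercept=0.5, coefficients=[-0.3, 0.8], values=[4.0, 1.0])
prediction = "leave" if p_leave >= 0.5 else "stay"
```

The observation is then assigned to whichever category is more probable, which is exactly the classification rule described above.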

In general, binomial logistic regression offers a robust way to analyze and predict outcomes with binary dependent variables, utilizing continuous and categorical independent variables to inform its predictions.

Assumptions of Binomial Logistic Regression

In order to run a binomial logistic regression, there are seven assumptions that need to be considered. The first four assumptions relate to your choice of study design and the measurements you chose to make, while the other three assumptions relate to how your data fits the binomial logistic regression model. These assumptions are:

• Assumption #1 for conducting binomial logistic regression is the presence of a single dichotomous dependent variable, meaning it consists of two possible outcomes or categories. This variable is nominal, so the outcomes are distinctly categorized without any inherent order. Examples of such dichotomous variables include smoking status (two outcomes: “smoker” or “non-smoker”), passing a driving test (“pass” or “fail”), housing type (“apartment” or “house”), and medication adherence (“adherent” or “non-adherent”).

Note 1: The dependent variable in this context may also be referred to as the “outcome,” “target,” or “criterion” variable. Regardless of the terminology used, the concept remains the same, and for clarity, we will continue to use “dependent variable” throughout.

Note 2: While we describe the dependent variable as having two “outcomes,” it’s also common to refer to these as “categories.” For example, a variable like “smoking status” would have the categories “smoker” and “non-smoker.” The choice of terminology is flexible and does not affect the underlying principle of the variable’s dichotomous nature.

• Assumption #2: You have one or more independent variables measured on a continuous or nominal scale. Continuous variables can take on infinitely many values within a given range. For instance, the temperature in a room can be any value within the limits of the thermometer, such as 22.5°C or 22.51°C; time can be measured to any level of precision, down to milliseconds or smaller units; and height and weight can include fractions (such as 1.75 meters). Other typical continuous variables include distance, age, blood pressure (measured in millimeters of mercury, mmHg), speed (kilometers or miles per hour), electricity consumption (kilowatt-hours), and sound level (decibels). Nominal variables are categorical variables whose categories have no natural order or ranking. Examples include blood type (A, B, AB, O), marital status (single, married, divorced, widowed), nationality (American, British, Canadian, etc.), gender identity (male, female, non-binary, etc.), type of employment (full-time, part-time, self-employed, unemployed), religion (Christianity, Islam, Judaism, Hinduism, Buddhism, atheism, etc.), hair color (blonde, brunette, redhead, black, etc.), vehicle type (car, truck, motorcycle, bicycle, etc.), favorite food (pizza, sushi, pasta, salad, etc.), and eye color (blue, green, brown, hazel, etc.).
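Before a nominal variable enters the model, it is represented as dummy (indicator) variables: a variable with k categories becomes k − 1 indicators, with one category serving as the reference. A minimal sketch, with the helper name and reference-category convention chosen here purely for illustration:

```python
def dummy_code(value, categories):
    """Dummy-code a nominal value: one indicator per non-reference category.

    The first entry of `categories` is treated as the reference category,
    so k categories yield k - 1 indicator variables.
    """
    reference, *others = categories
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in others]

# Blood type with "A" as the reference category: three indicators (B, AB, O)
codes = dummy_code("AB", ["A", "B", "AB", "O"])
```

Statistical packages such as SPSS Statistics perform this coding automatically for variables declared as categorical, but it is worth knowing what the coding implies when reading the output.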
• Assumption #3 emphasizes the need for independence of observations and requires that the categories of the binary dependent variable and all nominal independent variables are mutually exclusive and exhaustive. Each observation must fall into only one category of the dependent variable and into only one category of each independent variable, without overlap or ambiguity. For instance, in a study using binomial logistic regression to determine whether patients adhere to a medication regimen, the dependent variable might be “medication adherence” with two categories: “adherent” and “non-adherent.” For the independent variables, consider “prescription type” as a dichotomous variable (categories: “long-term” and “short-term”) and “patient education level” as a categorical variable (categories: “high school,” “college,” “graduate”). In this context, a patient can be classified as either adherent or non-adherent, but not both, and can belong to only one category of “prescription type” and one level of “patient education.” Independence of observations means that the observations themselves are unrelated: no patient’s outcome depends on, or is influenced by, another patient’s outcome, as would happen with repeated measurements on the same patient. Ensuring independence of observations and mutual exclusivity of categories is more a matter of careful study design than something that can be tested in software like SPSS Statistics, but it is critical for the validity of binomial logistic regression. If this assumption is violated, for example when observations are related or categories overlap, alternative statistical methods such as linear mixed models or generalized estimating equations (GEE) may be more appropriate.
• Assumption #4 is that the sample size should be adequate to generate reliable estimates. A common recommendation is a minimum of 15 cases per independent variable, although some experts suggest aiming for as many as 50 cases per independent variable. Similar minimum sample-size guidelines exist for other multivariate techniques such as multiple regression. The reliability of estimates tends to decline when some combinations of categories contain few cases, because of the maximum likelihood estimation (MLE) method used to fit a binomial logistic regression model.
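The rule of thumb above is simple enough to state directly in code; a minimal sketch, with the 15 and 50 cases-per-variable figures taken from the guideline just described:

```python
def minimum_sample_size(n_independent_vars, cases_per_variable=15):
    """Rule-of-thumb minimum sample size: cases per independent variable times the count of IVs."""
    return n_independent_vars * cases_per_variable

# Three independent variables, under the lenient and the strict guideline
low = minimum_sample_size(3)        # 15 cases per variable
high = minimum_sample_size(3, 50)   # 50 cases per variable
```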
• Assumption #5 involves the necessity of a linear relationship between the continuous independent variables and the logit transformation of the dependent variable. This linearity assumption implies that for continuous independent variables like income level, hours of exercise per week, and blood sugar levels, there should be a linear relationship with the logit of the dependent variable, such as the probability of developing diabetes. Various methods can be employed to assess this linearity, with one common approach being the Box-Tidwell procedure. This technique involves creating interaction terms between each continuous independent variable and its natural logarithm and adding these to the logistic regression model. This technique can be implemented using software like SPSS Statistics, which offers the Binary Logistic procedure to test for this assumption. The results of this test are then interpreted to decide the next steps in the analysis, depending on whether the linearity assumption holds or is violated. If the assumption is met, the analysis can proceed as planned. However, if the assumption is not met, adjustments to the model or alternative methods may be necessary to address the non-linearity appropriately.
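The Box-Tidwell procedure described above rests on one transformation: for each continuous predictor x, an interaction term x · ln(x) is added to the model, and a statistically significant coefficient on that term signals non-linearity in the logit. Constructing the term is straightforward; the sketch below only builds the augmented predictor, not the full test, and the income values are hypothetical:

```python
import math

def box_tidwell_term(x):
    """Box-Tidwell interaction term x * ln(x) for a continuous predictor.

    Requires x > 0; in practice values are often shifted before the
    transform if zeros or negatives occur.
    """
    if x <= 0:
        raise ValueError("Box-Tidwell requires strictly positive values")
    return x * math.log(x)

# Pair each observation's predictor value with its interaction term
incomes = [25.0, 40.0, 60.0]
augmented = [(x, box_tidwell_term(x)) for x in incomes]
```

The augmented pairs would then be entered together into the logistic regression, which is what the SPSS Binary Logistic procedure does when this check is set up.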
• Assumption #6 requires the absence of multicollinearity in your data, which is essential for accurate results. Multicollinearity occurs when two or more independent variables are highly correlated with each other. It makes it difficult to determine which independent variable contributes to the variance in the dependent variable, and it causes technical problems when calculating a binomial logistic regression model. You can detect multicollinearity by examining correlation coefficients and Tolerance/VIF values.
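A first screen for multicollinearity is the pairwise Pearson correlation between independent variables; Tolerance and VIF require fitting auxiliary regressions, so this sketch covers only the correlation check. The data and the 0.8 screening threshold are illustrative conventions, not fixed rules:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Two predictors that move in lockstep: a red flag for multicollinearity
years = [1.0, 2.0, 3.0, 4.0]
salary = [30.0, 40.0, 50.0, 60.0]   # perfectly linear in `years`
r = pearson_r(years, salary)
flagged = abs(r) > 0.8              # common screening threshold
```

When a pair is flagged, the usual remedies are dropping one of the variables or combining them into a single measure before refitting the model.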
• Assumption #7 is that no significant outliers, high leverage points, or highly influential points should be present in your data set. These terms describe unusual observations that can distort the fitted model, and each type affects the regression equation used to predict the dependent variable to a different degree. When running binomial logistic regression in SPSS Statistics, detecting and dealing with possible outliers, high leverage points, and highly influential points is essential to ensure valid results.

Interpret Your Binomial Logistic Regression Results

After completing the binomial logistic regression analysis and confirming that your dataset meets the necessary assumptions in SPSS Statistics, the software will generate tables containing essential information for reporting your regression results. The interpretation of these results will focus on five key objectives:

1. **Data Preparation and Preliminary Analysis**: Initially, you should review your data and variables. This objective includes checking for missing cases and verifying the expected number of cases using the “Case Processing Summary” table. Confirm the correct coding of your dependent variable through the “Dependent Variable Encoding” table, and identify any issues with low counts in categories of your categorical independent variables using the “Categorical Variables Codings” table. These steps ensure that the data set is properly prepared for analysis.

2. **Baseline Model Analysis**: Before adding independent variables, analyze the baseline model, which includes only the constant. The “Classification Table,” “Variables in the Equation,” and “Variables not in the Equation” tables are crucial here. This baseline information provides a comparison point for the whole model and helps explain the improvement in prediction when independent variables are included.

3. **Main Binomial Logistic Regression Analysis**: This involves assessing the overall statistical significance of the model to determine its predictive power compared to a model without independent variables. The Hosmer and Lemeshow test is used to evaluate the model’s goodness of fit, while the Cox & Snell and Nagelkerke R Square values help quantify the variation in the dependent variable explained by the model. The Nagelkerke R Square value is typically reported because it is easier to interpret.

4. **Prediction of Categories**: The logistic regression model estimates the probability of an event (e.g., developing a certain health condition) based on independent variables. SPSS Statistics classifies an event as occurring if the estimated probability is 0.5 or higher and not occurring if it’s less than 0.5.
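The 0.5 cut-off described above amounts to a one-line rule. A minimal sketch applying the default cut value to a few hypothetical estimated probabilities (the category labels are placeholders):

```python
def classify(probability, cutoff=0.5):
    """Assign the predicted category: the event is predicted to occur if probability >= cutoff."""
    return "event" if probability >= cutoff else "no event"

# Hypothetical estimated probabilities for three observations
predicted = [classify(p) for p in [0.12, 0.5, 0.87]]
```

Aggregating these predictions against the observed outcomes is what produces the "Classification Table" in the SPSS output.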

5. **Analyzing Individual Variables**: The “Variables in the Equation” table helps understand the contribution and significance of each independent variable. The odds ratios and confidence intervals are used to interpret each variable’s impact. For instance, you might find that “the odds of developing a health condition are higher for individuals with a specific lifestyle factor than those without it.” This interpretation applies to both categorical and continuous independent variables.
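The "Variables in the Equation" table reports each coefficient B on the log-odds scale together with Exp(B), the odds ratio; an approximate 95% confidence interval follows from B ± 1.96 × SE before exponentiating. A sketch with a hypothetical coefficient and standard error:

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Odds ratio Exp(B) and its 95% confidence interval from a coefficient and its standard error."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical coefficient for a lifestyle factor: B = 0.693 doubles the odds
odds_ratio, lower, upper = odds_ratio_with_ci(b=0.693, se=0.20)
```

An odds ratio of about 2 with an interval excluding 1 would be read as "the odds of developing the health condition are roughly twice as high for individuals with the lifestyle factor," which is the style of interpretation described above.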
