Note: For a fuller treatment, download our online seminar Maximum Likelihood Estimation for Categorical Dependent Variables.

Logit and probit models are appropriate when attempting to model a dichotomous dependent variable, e.g. yes/no, agree/disagree, like/dislike, etc. The problems with using the familiar linear regression line are most easily understood visually. As an example, say we want to model whether somebody does or does not have Bieber fever by how much beer they've consumed. We collect data from a college frat house and attempt to model the relationship with linear (OLS) regression.


There are several problems with this approach. First, the regression line may lead to predictions outside the range of zero and one. Second, the functional form assumes the first beer has the same marginal effect on Bieber fever as the tenth, which is probably not appropriate. Third, a residuals plot would quickly reveal heteroskedasticity.
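The first problem is easy to demonstrate directly. The sketch below uses hypothetical simulated data (not the frat-house data from the example): fitting OLS to a 0/1 outcome produces predicted values below zero and above one.

```
# Hypothetical data: fever switches on once beer consumption passes 6
beers <- 0:12
fever <- as.numeric(beers >= 6)   # dichotomous outcome

ols <- lm(fever ~ beers)          # linear (OLS) fit
range(predict(ols))               # fitted values stray outside [0, 1]
```

Because the fitted line is straight, it must eventually cross below 0 on the left and above 1 on the right, which makes no sense for a probability.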

Logit and probit models solve each of these problems by fitting a nonlinear function to the data that looks like the following:

```
library(ggplot2)  # for plotting
library(dplyr)    # for the %>% pipe
a <- seq(3, 9, 0.1)
b <- pnorm(a, mean = 6, sd = 1)  # standard normal CDF gives the S shape
data2 <- data.frame(a, b)
data2 %>%
  ggplot(aes(x = a, y = b)) + geom_line() +
  scale_x_continuous(name = "Number of Beers Consumed", limits = c(3, 9), breaks = seq(3, 9, 1)) +
  scale_y_continuous(name = "Bieber Fever? 0 = No, 1 = Yes", limits = c(0, 1), breaks = seq(0, 1, 0.2)) +
  ggtitle("Like Justin Bieber by Number of Beers Consumed")
```

The straight line has been replaced by an S-shaped curve that 1) respects the boundaries of the dependent variable; 2) allows for different rates of change at the low and high ends of the beer scale; and 3) (assuming proper specification of independent variables) does away with heteroskedasticity.

What logit and probit do, in essence, is take the linear model and feed it through a function to yield a nonlinear relationship. Whereas the linear regression predictor looks like:

\[ \hat{Y} = \alpha + \beta x \]

The logit and probit predictors can be written as:

\[ \hat{Y} = f(\alpha + \beta x) \]

Logit and probit differ in how they define \(f(\cdot)\). The logit model defines \(f(\cdot)\) as the cumulative distribution function (CDF) of the logistic distribution; the probit model defines it as the CDF of the standard normal distribution. Both functions take any real number and rescale it to fall between 0 and 1. Hence, whatever \(\alpha + \beta x\) equals, it can be transformed by the function to yield a predicted probability. Any function that returns a value between zero and one would do the trick, but there is a deeper theoretical model underpinning logit and probit that requires the function to be based on a probability distribution. The logistic and standard normal CDFs turn out to be convenient mathematically and are programmed into just about any general-purpose statistical package.
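You can see the rescaling at work with R's built-in CDF functions. Here `eta` stands in for some hypothetical values of \(\alpha + \beta x\):

```
# Both link functions map any real number into (0, 1)
eta <- c(-10, -1, 0, 1, 10)  # hypothetical values of alpha + beta*x

plogis(eta)  # logistic CDF (logit model)
pnorm(eta)   # standard normal CDF (probit model)
```

However extreme the linear predictor gets, both functions return a valid probability, and both map 0 to exactly 0.5.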

Is logit better than probit, or vice versa? Both methods will yield similar (though not identical) inferences. Logit – also known as logistic regression – is more popular in health sciences like epidemiology partly because coefficients can be interpreted in terms of odds ratios. Probit models can be generalized to account for non-constant error variances in more advanced econometric settings (known as heteroskedastic probit models) and hence are used in some contexts by economists and political scientists. If these more advanced applications are not of relevance, then it does not matter which method you choose.
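To illustrate how similar the two are in practice, the sketch below fits both models to hypothetical simulated data using R's `glm()` with the `binomial` family, swapping only the link function:

```
# Hypothetical simulated data: probability of fever rises with beers
set.seed(1)
beers <- runif(200, 0, 12)
fever <- rbinom(200, 1, pnorm(beers, mean = 6, sd = 1))

# Same model, two different link functions
logit_fit  <- glm(fever ~ beers, family = binomial(link = "logit"))
probit_fit <- glm(fever ~ beers, family = binomial(link = "probit"))

coef(logit_fit)
coef(probit_fit)
exp(coef(logit_fit)["beers"])  # logit slope expressed as an odds ratio
```

The coefficients differ in scale (logit coefficients are roughly 1.6–1.8 times their probit counterparts), but the signs, significance, and predicted probabilities agree closely.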

Still have questions? Contact us!