# How to Estimate a Multiple Linear Regression Model

## Multiple Regression

A prior tutorial described simple regression as a mapping from a single predictor to an outcome variable. This tutorial builds on those concepts to cover the case of more than one independent variable, known as multiple regression. Simple regression is a useful tool for extracting information about bivariate relationships beyond what a correlation or t-test provides, but the real power of regression comes from its ability to incorporate multiple independent variables.

The data used in this tutorial are again from the study *More Tweets, More Votes: Social Media as a Quantitative Indicator of Political Behavior* by DiGrazia J, McKelvey K, Bollen J, and Rojas F (2013), which investigated the relationship between social media mentions of candidates in the 2010 and 2012 US House elections and actual vote results. The authors have helpfully provided replication materials. The results presented here are for pedagogical purposes only.

The variables used in this tutorial are:

• `vote_share` (dependent variable): The percentage of votes for the Republican candidate
• `mshare` (independent variable): The percentage of social media posts for the Republican candidate
• `pct_white` (independent variable): The percentage of white voters in a given Congressional district

Take a look at the first six observations in the data:

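| Tweet Share | Vote Share | Percent White |
|------------:|-----------:|--------------:|
| 51.09 | 26.26 | 64.2 |
| 59.48 | 0.00 | 64.3 |
| 57.94 | 65.08 | 75.7 |
| 27.52 | 33.33 | 34.6 |
| 69.32 | 79.41 | 66.8 |
| 53.20 | 40.32 | 70.8 |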

The simple regression post established an association between Republican tweets and Republican votes, but is this relationship spurious? In other words, is there some other variable, for example race, that is confounding this relationship?
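One informal way to probe this question is to compare the coefficient on Tweet Share with and without Percent White in the model. The sketch below does this in Python with NumPy, using only the six observations shown above. The helper name `ols` is hypothetical, and six rows are far too few to support any conclusion, so treat it as an illustration of the mechanics rather than a result.

```python
import numpy as np

# The six observations shown above (illustration only, not the full data set).
mshare = np.array([51.09, 59.48, 57.94, 27.52, 69.32, 53.20])      # Tweet Share
vote_share = np.array([26.26, 0.00, 65.08, 33.33, 79.41, 40.32])   # Vote Share
pct_white = np.array([64.2, 64.3, 75.7, 34.6, 66.8, 70.8])         # Percent White


def ols(y, *predictors):
    """Hypothetical helper: OLS coefficients [intercept, slope_1, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simple regression: vote_share on mshare only.
simple = ols(vote_share, mshare)

# Multiple regression: vote_share on mshare and pct_white.
multiple = ols(vote_share, mshare, pct_white)

print("mshare slope, simple regression:  ", simple[1])
print("mshare slope, multiple regression:", multiple[1])
```

If the Tweet Share slope changes substantially once Percent White enters the model, that suggests the second variable accounts for part of the original association.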

In multiple regression, we partial out the effect of each independent variable on the dependent variable, that is, we estimate each variable's effect net of the other independent variables. The following graphic illustrates the sources of variation that are used to estimate $$y$$ in a model with two independent variables. Think of $$x_1$$ as the variable “Tweet Share” and $$x_2$$ as the variable “Percent White.”
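The partialling-out idea can also be checked numerically. The sketch below, again in Python with NumPy and using only the six observations shown earlier, removes the part of Vote Share and of Tweet Share that Percent White explains, then regresses one set of residuals on the other. By the Frisch-Waugh-Lovell theorem, the resulting slope should match the Tweet Share coefficient from the full two-variable regression. The helper name `residualize` is hypothetical, and the six rows serve only to illustrate the mechanics.

```python
import numpy as np

# The six observations shown earlier (illustration only).
mshare = np.array([51.09, 59.48, 57.94, 27.52, 69.32, 53.20])      # x1: Tweet Share
pct_white = np.array([64.2, 64.3, 75.7, 34.6, 66.8, 70.8])         # x2: Percent White
vote_share = np.array([26.26, 0.00, 65.08, 33.33, 79.41, 40.32])   # y: Vote Share


def residualize(target, control):
    """Hypothetical helper: residuals of target after regressing it on
    an intercept and the control variable."""
    X = np.column_stack([np.ones(len(target)), control])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef


# Step 1: strip out the variation in y and x1 that x2 can account for.
y_resid = residualize(vote_share, pct_white)
x1_resid = residualize(mshare, pct_white)

# Step 2: slope of residualized y on residualized x1.
# Both residual vectors have mean zero, so no intercept is needed.
partial_slope = (x1_resid @ y_resid) / (x1_resid @ x1_resid)

# Step 3: the same coefficient from the full multiple regression.
X_full = np.column_stack([np.ones(len(vote_share)), mshare, pct_white])
full_coef, *_ = np.linalg.lstsq(X_full, vote_share, rcond=None)

print("partialled-out slope for mshare:", partial_slope)
print("multiple-regression slope:      ", full_coef[1])
```

The two printed values should agree up to floating-point error, which is what it means to say the multiple regression coefficient uses only the variation in $$x_1$$ that is not shared with $$x_2$$.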