This tutorial shows how to fit a multiple regression model (that is, a linear regression with more than one independent variable) using SPSS. The details of the underlying calculations can be found in our multiple regression tutorial. The data used in this post come from the *More Tweets, More Votes: Social Media as a Quantitative Indicator of Political Behavior* study by DiGrazia, McKelvey, Bollen, and Rojas (2013), which investigated the relationship between social media mentions of candidates in the 2010 and 2012 US House elections and actual vote results. The replication data in SPSS format can be downloaded from our GitHub repo.

The variables of interest are:

- `vote_share` (*dependent variable*): the percent of votes for the Republican candidate
- `mshare` (*independent variable*): the percent of social media posts for the Republican candidate
- `pct_white` (*independent variable*): the percent of white voters in a given Congressional district

All three variables are measured as percentages ranging from zero to 100.

We can run the following syntax to delete all other variables.

```
DELETE VARIABLES
eshare to median_age pct_college to mccain_tert.
```
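Before plotting, it can be useful to confirm that the three remaining variables look sensible. A minimal check with `DESCRIPTIVES` (using the variable names from the replication data):

```
DESCRIPTIVES VARIABLES=vote_share mshare pct_white
  /STATISTICS=MEAN STDDEV MIN MAX.
```

The minimum and maximum should stay within the zero-to-100 range described above.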

## Data Visualization

It is always a good idea to begin any statistical modeling with a graphical assessment of the data. This allows you to quickly examine the distributions of the variables and check for possible outliers. Go to **Graphs \(\rightarrow\) Chart Builder…**

Then select **Simple Histogram** as the chart type, and click and drag `vote_share` to the x-axis.

We can clean up the x-axis label in **Element Properties** on the right hand side.

Then click **OK**.

This creates the following figure:

The variable’s values (x-axis) fall within the range we expect. There is some negative skew in the distribution.
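If you prefer syntax to the Chart Builder, the same histogram can be produced with the `GRAPH` command:

```
GRAPH
  /HISTOGRAM=vote_share.
```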

We can do the same thing for our tweet share and percent white variables and get the following figures:

We again see that the values fall into the range we expect. Note that there are also spikes at zero and 100. These indicate races where a single candidate received either all or none of the share of Tweets.

The following figure shows the distribution of the percentage white variable.

Again, the values fall in the range we’d expect. There is a negative skew in the distribution.
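The other two histograms can be produced the same way in syntax:

```
GRAPH
  /HISTOGRAM=mshare.
GRAPH
  /HISTOGRAM=pct_white.
```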

It is also helpful to look at the bivariate associations between the variables. This allows us to see whether there is visual evidence of a relationship, which will help us assess whether the regression results we ultimately get make sense given what we see in the data. We will do this in the Chart Builder by selecting **Simple Scatter with Fit Line**, which returns a scatterplot of the variables along with the best linear fit (i.e., the regression line).

Then click **OK**.

We do the same thing for the percent white variable and get the following plot:

There is a clear, positive association between these variables.
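Each scatterplot can also be requested in syntax; a fit line can then be added in the Chart Editor by double-clicking the chart (**Elements \(\rightarrow\) Fit Line at Total**):

```
GRAPH
  /SCATTERPLOT(BIVAR)=mshare WITH vote_share.
GRAPH
  /SCATTERPLOT(BIVAR)=pct_white WITH vote_share.
```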

## Running the Regression

To run the regression, go to **Analyze \(\rightarrow\) Regression \(\rightarrow\) Linear…**

Select `vote_share` as the dependent variable and `mshare` and `pct_white` as the independent variables. Then click **OK**.
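The equivalent syntax for this dialog is:

```
REGRESSION
  /DEPENDENT vote_share
  /METHOD=ENTER mshare pct_white.
```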

We get the following output:

The first table lists the variables in the model.

The second table provides the model summary. The \(R\) value is given, though the \(R^2\) value is more commonly used in interpretation. The `R Square` value tells us that the independent variables explain 55.4% of the variation in the outcome. The adjusted \(R^2\) provides a slightly more conservative estimate of the percentage of variance explained, 55.2%. The `Std. Error of the Estimate` summarizes how much the observed values vary around the predicted values, with better models having lower standard errors.
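For reference, the adjusted \(R^2\) applies the usual sample-size correction to the plain \(R^2\), where \(n\) is the number of observations and \(k\) the number of independent variables (here \(k = 2\)):

\[ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \]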

The third table provides us with an ANOVA table that gives 1) the sum of squares for the regression model, 2) the residual sum of squares, and 3) the total sum of squares. Dividing the `Sum of Squares` column by the `df` (degrees of freedom) column returns the values in the `Mean Square` column. These quantities go into calculating the \(R^2\), adjusted \(R^2\), and standard error of the estimate shown in the previous table. The \(F\)-statistic tests the null hypothesis that the independent variables together do not help explain any variance in the outcome. We clearly reject the null hypothesis with \(p < 0.001\), as seen by `Sig. = 0.000`.
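The quantities in the ANOVA table are linked by a few simple identities:

\[ \text{MS} = \frac{\text{SS}}{df}, \qquad F = \frac{\text{MS}_{\text{regression}}}{\text{MS}_{\text{residual}}}, \qquad R^2 = \frac{\text{SS}_{\text{regression}}}{\text{SS}_{\text{total}}} \]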

The final table gives us the results of the regression model. The `Unstandardized B` column gives the coefficients used in the regression equation. The `(Constant)` line is the estimate for the intercept in the multiple regression equation; this is the vote share we expect when Tweet share and percent white both equal zero. Here the estimate is 0.865. This value is of less interest to us than the coefficients for `mshare` and `pct_white`. For each one-unit increase in `mshare`, vote share increases by 0.178, holding percent white constant. For each one-unit increase in percent white, vote share increases by 0.55, holding Tweet share constant.
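Putting the unstandardized coefficients together gives the fitted regression equation:

\[ \widehat{\text{vote\_share}} = 0.865 + 0.178 \times \text{mshare} + 0.55 \times \text{pct\_white} \]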

The coefficients' `Std. Error` column tells us how much sample-to-sample variability we should expect. Dividing a coefficient by its standard error gives the \(t\)-statistic used to calculate the \(p\)-value. Here both the `mshare` and `pct_white` coefficient estimates are easily significant, \(p < 0.001\), while the `(Constant)` is not, \(p = 0.865\). In this application we don't especially care about the constant. The standardized coefficients give the association between the independent variables and the dependent variable in standard deviation units: a one standard deviation increase in `mshare` is associated with a change of 0.338 standard deviations in `vote_share`, holding percent white constant.
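Both the \(t\)-statistic and the standardized coefficient are simple transformations of the unstandardized estimate \(b\); with \(s_x\) and \(s_y\) the sample standard deviations of a predictor and the outcome:

\[ t = \frac{b}{SE(b)}, \qquad \beta = b \times \frac{s_x}{s_y} \]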