This tutorial shows how to fit a simple regression model (that is, a linear regression with a single independent variable) using SPSS. The details of the underlying calculations can be found in our simple regression tutorial. The data used in this post come from the *More Tweets, More Votes: Social Media as a Quantitative Indicator of Political Behavior* study by DiGrazia J, McKelvey K, Bollen J, Rojas F (2013), which investigated the relationship between social media mentions of candidates in the 2010 and 2012 US House elections and actual vote results. The replication data in SPSS format can be downloaded from our GitHub repo.

In this example, we will assess the relationship between the percentage of social media posts that mention a Congressional candidate and how well that candidate did in the next election. The variables of interest are:

- `vote_share` (*dependent variable*): The percent of votes for a Republican candidate
- `mshare` (*independent variable*): The percent of social media posts for a Republican candidate

We can run the following line of syntax to delete all other variables.

```
* Drop every variable from eshare through mccain_tert (in file order).
DELETE VARIABLES eshare TO mccain_tert.
```

Both variables are measured as percentages ranging from zero to 100.

## Data Visualization

It is always a good idea to begin any statistical modeling with a graphical assessment of the data. This allows you to quickly examine the distributions of the variables and check for possible outliers. To do this, create a histogram for the `vote_share` variable, our outcome of interest. Go to **Graphs \(\rightarrow\) Chart Builder…**

Then select **Simple Histogram** as chart type, and click and drag `vote_share` to the x-axis.

We can clean up the x-axis label in **Element Properties** on the right-hand side.

Then click **OK**.

This creates the following figure:

The variable’s values (x-axis) fall within the range we expect. There is some negative skew in the distribution.

We can do the same thing for our independent variable and get the following plot:

We again see that the values fall into the range we expect. Note that there are also spikes at zero and 100. These indicate races where a single candidate received either all of the share of Tweets or none of the share of Tweets.
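The two histograms can also be produced from syntax instead of the Chart Builder dialog. The lines below are a minimal sketch that uses the legacy `GRAPH` command rather than the `GGRAPH` code that Chart Builder pastes, so the default axis labels and styling will differ from the figures above. The `DESCRIPTIVES` command is an addition that is not part of the menu steps; it prints the minimum, maximum, and skewness so that the range and skew can also be checked numerically.

```
* Histograms of the outcome and the predictor (legacy GRAPH command).
GRAPH /HISTOGRAM=vote_share.
GRAPH /HISTOGRAM=mshare.

* Numeric check of the range and skew of both variables.
DESCRIPTIVES VARIABLES=vote_share mshare
  /STATISTICS=MEAN STDDEV MIN MAX SKEWNESS.
```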

It is also helpful to look at the bivariate association between the two variables. This allows us to see whether there is visual evidence of a relationship, which will help us assess whether the regression results we ultimately get make sense given what we see in the data. Once again, go to **Graphs \(\rightarrow\) Chart Builder…**

This time select **Simple Scatter with Fit Line**, then drag `mshare` to the x-axis and `vote_share` to the y-axis.

We can edit the x and y-axis labels, then click **OK**. We get the following plot.

Here we are looking at a scatterplot of our observations, with the best linear fit (i.e., the regression line) overlaid. There is a clear, positive association between these variables.
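A basic version of this scatterplot can be requested from syntax as well. This sketch uses the legacy `GRAPH` command, which plots the points only; it does not draw the fit line. One way to add the line afterwards, assuming the standard Chart Editor, is to double-click the chart in the output viewer and choose the option to add a fit line at total.

```
* Scatterplot with the predictor on the x-axis and the outcome on the y-axis.
GRAPH /SCATTERPLOT(BIVAR)=mshare WITH vote_share.
```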

## Running the Regression

To run the regression, go to **Analyze \(\rightarrow\) Regression \(\rightarrow\) Linear…**

Select `vote_share` as the dependent variable and `mshare` as the independent variable. Then click **OK**.
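Clicking **Paste** instead of **OK** in the dialog writes the equivalent command to a syntax window. A trimmed-down sketch of that command is shown here; the pasted version usually carries extra subcommands (such as missing-value handling and entry criteria), which are left at their defaults in this sketch.

```
* Simple regression of vote share on Tweet share.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT vote_share
  /METHOD=ENTER mshare.
```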

We get the following output:

The first table lists the variables in the model.

The second table provides the model summary. The \(R\) value is given, though the \(R^2\) value is more commonly used in interpretation. The `R square` value tells us that the independent variable explains 25.89% of the variation in the outcome. The adjusted \(R^2\) provides a slightly more conservative estimate of the percentage of variance explained, 25.71%. The `Std. Error of the Estimate` gives a summary of how much the observed values vary around the predicted values, with better models having lower standard errors.

The third table provides us with an ANOVA table that gives 1) the sum of squares for the regression model, 2) the residual sum of squares, and 3) the total sum of squares. Dividing the `Sum of Squares` column by the `df` (degrees of freedom) column returns the mean squares in the `Mean Square` column. These values go into calculating the \(R^2\), adjusted \(R^2\), and Standard Error of the Estimate shown in the previous table. The \(F\)-statistic tests the null hypothesis that the independent variable does not help explain any variance in the outcome. We clearly reject the null hypothesis with \(p < 0.001\), as seen by `Sig. = 0.000`.
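The relationships between the ANOVA table and the model summary can be written out explicitly. The notation here is introduced for illustration and does not appear in the SPSS output: \(SS\) and \(df\) stand for the entries of the `Sum of Squares` and `df` columns, with subscripts for the regression, residual, and total rows.

\[ MS_{\text{reg}} = \frac{SS_{\text{reg}}}{df_{\text{reg}}}, \qquad MS_{\text{res}} = \frac{SS_{\text{res}}}{df_{\text{res}}}, \qquad F = \frac{MS_{\text{reg}}}{MS_{\text{res}}} \]

\[ R^2 = \frac{SS_{\text{reg}}}{SS_{\text{total}}}, \qquad R^2_{\text{adj}} = 1 - \frac{SS_{\text{res}} / df_{\text{res}}}{SS_{\text{total}} / df_{\text{total}}}, \qquad \text{Std. Error of the Estimate} = \sqrt{MS_{\text{res}}} \]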

The final table gives us the results of the regression model. The `Unstandardized B` column gives the coefficients used in the regression equation. The `(Constant)` line is the estimate for the intercept in the simple regression equation. This is the vote share we expect when Tweet share equals zero. Here we see that the predicted value is 37.04, which coincides with what we saw above in the scatterplot. This value is of less interest to us than the slope of the regression line, the coefficient for `mshare`. We can see that for each one-point increase in `mshare`, the predicted vote share increases by 0.269 percentage points.
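Written as an equation, with \(\hat{y}\) standing for the predicted `vote_share` and \(x\) for `mshare` (symbols introduced here for illustration), the fitted model from the coefficients above is

\[ \hat{y} = 37.04 + 0.269\,x. \]

As an illustrative calculation, with the value 50 chosen only as an example, a race in which the Republican candidate received 50% of the Tweets would have a predicted vote share of

\[ 37.04 + 0.269 \times 50 = 50.49. \]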

The `Coefficients Std. Error` column tells us how much sample-to-sample variability we should expect in each coefficient estimate. Dividing the coefficient by its standard error gives us the \(t\)-statistic used to calculate the \(p\)-value. Here we see that both the `mshare` and `(Constant)` coefficient estimates are easily significant, \(p < 0.001\), though in this application we don’t especially care about the constant. The standardized coefficient gives us the association between the independent variable and dependent variable in standard deviation units. A one standard deviation increase in `mshare` is associated with a change of 0.509 standard deviations in `vote_share`.
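One way to see where the standardized coefficient comes from is to convert both variables to z-scores and rerun the regression: the slope on the z-scored predictor should then match the standardized coefficient. The sketch below assumes that neither variable has missing values (otherwise the z-scores and the regression would be based on slightly different sets of cases). It relies on `DESCRIPTIVES ... /SAVE`, which creates new variables named with a `Z` prefix.

```
* Save z-scores; SPSS names the new variables Zvote_share and Zmshare.
DESCRIPTIVES VARIABLES=vote_share mshare /SAVE.

* The slope of this regression is the standardized coefficient.
REGRESSION
  /DEPENDENT Zvote_share
  /METHOD=ENTER Zmshare.
```

Equivalently, the standardized coefficient is the unstandardized slope multiplied by the ratio of the standard deviation of `mshare` to the standard deviation of `vote_share`.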

## Fun Facts about Simple Regression

In simple regression only (that is, when there is just a single independent variable), the \(R^2\) is exactly equal to the squared Pearson correlation between the two variables, and the standardized coefficient is exactly equal to the Pearson correlation itself.

To see this, go to **Analyze \(\rightarrow\) Correlate \(\rightarrow\) Bivariate…**

Select `vote_share` and `mshare` as the **Variables**.

Then click **OK**. We get the following output.

The correlation between Tweet share and vote share is 0.5089. If we square this, we get

\[ 0.5089^2 \approx 0.259, \]

which matches the \(R^2\) value from the regression (0.2589) up to rounding of the correlation.
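The correlation table can also be requested from syntax. This is a sketch of the command that the Bivariate Correlations dialog corresponds to, with its usual default options written out.

```
* Pearson correlation with a two-tailed significance test.
CORRELATIONS
  /VARIABLES=vote_share mshare
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```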

Also, in simple regression only, the model \(F\)-test is equivalent to the \(t\)-test for the single independent variable. The square of a \(t\)-statistic with \(k\) degrees of freedom is an \(F\)-statistic with 1 and \(k\) degrees of freedom. When there are no other predictors in the model, the square root of \(F\) will equal the absolute value of the \(t\) for our coefficient,

\[ \sqrt{141.17} = 11.88. \]
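The same check can be run in the other direction. With \(b\) denoting the slope and \(SE_b\) its standard error (notation introduced here for illustration), the \(t\)-statistic is \(t = b / SE_b\), and squaring the rounded value reported in the coefficients table gives

\[ t^2 = 11.88^2 \approx 141.13, \]

which agrees with the reported \(F\) of 141.17 up to rounding of \(t\).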

For more detailed information on where these numbers come from, consult our simple regression tutorial.