Introduction to SPSS

SPSS Guide Introduction

This tutorial will go over some basics to get you started using IBM SPSS Statistics, or SPSS. We will cover reading in data, understanding variable view vs. data view, creating and recoding variables, creating graphs, and performing basic analyses. For a more involved approach to analysis with SPSS see our other tutorials. Everything in this tutorial is done using SPSS version 26.

The data used is pulled from the General Social Survey (GSS) dataset for the year 2016.

SPSS may take a minute to load when you first start it up, so just be patient. Two windows will open: the “Welcome to IBM SPSS Statistics” window and the “IBM SPSS Statistics Data Editor” window.

Opening SPSS

The welcome window includes some quick links that are often useful:

  • You can create a new file
  • Open a recent file
  • See what’s new in recent updates
  • Get quick access to IBM help and support
  • Go to IBM SPSS tutorials

However, we’d like to learn how to do all of this without the use of the welcome window, so we will close it for now.

Now we can see the data editor.

SPSS Data Editor

It is blank since we don’t currently have any data loaded. Notice that there are two tabs in the bottom left corner: data view and variable view. We will come back to those later. First, let’s load some data.

Reading in Data

The native data format for SPSS is .sav or .zsav, but SPSS can import data from Excel, CSV, SAS, Stata, and more. We will cover loading data from a .sav file, as well as from Excel and CSV files.

Opening a .sav file is very simple. Go to File \(\rightarrow\) Open \(\rightarrow\) Data…

open SPSS Data

Select the file that you want to open (in our case spss-basics-data.sav) and click Open.

Two things will happen. The data will open in the data editor:

open SPSS Data click open

and an output window will open:

open SPSS Data click open

The output window is a running log of everything you have done in your current session. If you do anything in SPSS, it will update the output window to reflect what you did. This is also where the output for any figures or analyses will appear. You can just minimize this tab for now.

Next, let’s go over how to open .xlsx and .csv data.

For Excel data, go to File \(\rightarrow\) Import Data \(\rightarrow\) Excel…

Open excel data

Find your file and click Open.

The following window will open.

Open excel data - 2

Under Worksheet you can select which tab you want to import if the file has multiple tabs. We only have one tab (sheet1). Then, you can select a custom range; the default is to import the entire spreadsheet. SPSS will default to reading in variable names from the first row of data, but you can uncheck the box if that is not the case. Leave everything else as is and click OK.

Opening CSV data is similar. Go to File \(\rightarrow\) Import Data \(\rightarrow\) CSV Data… Find your file and click Open.

The read CSV file window will open.

Open CSV data

Again, you can specify whether the first row contains variable names. You can also remove leading/trailing spaces from string values (not relevant right now). The delimiter for this dataset is a comma, but you could also specify semicolons or tabs. We will leave everything as the default and click OK. The data will open in a new window.
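To make the header and delimiter options more concrete, here is a minimal Python sketch of the same idea outside SPSS. It is only an illustration: the CSV text is made up (it is not the GSS data), and Python's standard csv module stands in for the SPSS import dialog.

```python
import csv
import io

# Made-up CSV text standing in for a file on disk; not the GSS data.
raw = "AGE,SEX,RACE\n47,1,1\n61,2,2\n"

# DictReader uses the first row as the variable names; the delimiter
# could be changed to ";" or "\t" for other files.
reader = csv.DictReader(io.StringIO(raw), delimiter=",")
rows = list(reader)

print(reader.fieldnames)  # variable names taken from the header row
print(rows[0]["AGE"])     # values arrive as strings until converted
```

If the file had no header row, the first line of data would be consumed as names, which is why that option matters when importing.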

Understanding Variable View

Now that we have the data open, let’s go over what the different views are. Looking at the data editor, we see that we are in variable view (remember the tab at the bottom left corner?). Variable view is exactly as it sounds; it is a view of all the variables in the dataset. There are always 11 columns, and the number of rows is equal to the number of variables; i.e. one row per variable.

SPSS Data Editor Variable View

The columns, from left to right, are as follows:

  • Name: Gives the variable name
  • Type: Specifies if the variable is numeric, string, date, etc.
  • Width: Upper limit of how many characters are in each entry
  • Decimals: How many decimals to round numeric entries to
  • Label: A descriptive label for the variable
  • Values: User-defined value labels
  • Missing: Whether any values are set to missing
  • Columns: The width of the column
  • Align: Specifies whether data are left, right or center aligned
  • Measure: Indicates if the variable is scale, ordinal, or nominal
  • Role: An optional setting to indicate how the variable will be used in analysis

Changing a variable name is very easy; simply double click on the cell with the name you want to change, and type in the new name.

Variable labels can be added the same way. Because SPSS does not allow spaces or most special characters in variable names, variable labels are helpful for making the output easy to read.

In addition to variable labels, value labels can also be very useful when dealing with categorical data. For example, the SEX variable is coded as 1s and 2s, where 1 represents male and 2 represents female. We can add this as a value label, which will show up on any tables or figures that we create.

For missing values, there is an automatic “System Missing” value of “.”, but some files use numeric values, e.g. -999, to represent missing responses. Setting these values will allow SPSS to correctly treat these values as missing in the analysis. Let’s look at the RACE variable. This has possible values of 0 (inapplicable), 1 (white), 2 (black), 3 (other). We want SPSS to treat zeros as missing.

Let’s add variable labels to the following:

  • AGE: Age
  • SEX: Sex
  • RACE: Race
  • RELIGID: Religious Identity

Then, add value labels for race:

  • 1: White
  • 2: Black
  • 3: Other

and value labels for RELIGID:

  • 1: Fundamentalist
  • 2: Evangelical
  • 3: Mainline
  • 4: Liberal
  • 5: None
  • 6: Other

Finally, specify missing values for RELIGID as 0, 8, and 9 and EDUC as 99 and 98.

SPSS educ missing values

Your data should look like this.

SPSS Data Editor Data View

These labels and missing-value settings will come in handy when we make our figures and tables.
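As a rough picture of what value labels and user-defined missing values do, the sketch below mimics them in plain Python. The codes come from the RACE description above, but the five data values are invented for illustration, and the names race_labels and race_missing are hypothetical.

```python
# Value labels and missing codes for RACE, as described in the text.
race_labels = {1: "White", 2: "Black", 3: "Other"}
race_missing = {0}  # 0 = inapplicable, to be treated as missing

# Invented responses, not actual GSS records.
race = [1, 2, 0, 3, 1]

# Replace user-missing codes with None so later summaries can skip them.
race_clean = [None if v in race_missing else v for v in race]

# Show labels instead of numeric codes, as value labels do in output.
race_display = [race_labels.get(v, "(missing)") for v in race_clean]
print(race_display)
```

The underlying numeric codes are untouched; only how they are displayed and whether they count as valid responses changes.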

Understanding Data View

Now, let’s take a look at the data view.

SPSS Data Editor Data View

If you have used Excel before, this view should look familiar to you. In data view, each variable is its own column, and each row represents one entry. The 17 variables that we saw in variable view are all here, along with their corresponding values. Currently, we can see the numeric values, rather than the descriptive labels we provided. We can go to View \(\rightarrow\) Value labels, and the value labels we set will show instead.

We can also sort the data in ascending or descending order by one or more variables. Go to Data \(\rightarrow\) Sort Cases…

SPSS Sort Cases

We can select the variable to sort by - let’s go with age - and specify whether it should be descending or ascending. We’ll select ascending.

SPSS Sort Cases part 2

Then click OK.

SPSS Sorted cases

You can see the data is now sorted by age.
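For readers who think in code, sorting cases amounts to reordering whole rows by one variable. The following Python sketch shows that under stated assumptions: the three cases and the ID field are made up for illustration.

```python
# Each case (row) is a dict; values are invented, not GSS records.
cases = [
    {"ID": 1, "AGE": 47},
    {"ID": 2, "AGE": 19},
    {"ID": 3, "AGE": 33},
]

# Ascending sort by AGE; the whole row moves, not just the AGE value.
ascending = sorted(cases, key=lambda case: case["AGE"])

# Descending sort uses the same key with reverse=True.
descending = sorted(cases, key=lambda case: case["AGE"], reverse=True)

print([case["ID"] for case in ascending])
print([case["ID"] for case in descending])
```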

Creating New Variables

There are multiple ways to create new variables in SPSS. The ones we will cover are Compute Variable, Recode into Same Variables, and Recode into Different Variables.

To use the Compute Variable window, go to Transform \(\rightarrow\) Compute Variable.

The following window will open:

SPSS Compute variable

Type the name of the variable you wish to create under Target Variable. Let’s create a new variable called age_std which will be defined as age minus 18 years, so that 18 becomes zero, 19 becomes one, and so on. Select Age and use the arrow to move it into the numeric expression box, then type - 18. Your window should look something like this.

SPSS Compute variable age_std

Click OK. In variable view, confirm that the age_std variable was created.

SPSS Computed variable age_std

We wish to create an interaction variable between age_std and SEX. Again, go to Transform \(\rightarrow\) Compute Variable. Name the target variable age_sex. For the numeric expression, select age_std from the list and click the arrow to move it over. Use an asterisk to denote multiplication, then click SEX and use the arrow to move it over. Your window should look like this.

SPSS Compute variable race*sex

Click OK. Again, confirm that the age_sex variable was created in the variable view. Use the data view to make sure the values are computed correctly.

We have successfully created an interaction variable. However, note that sex is coded 1 = Male and 2 = Female. When creating an interaction with a categorical variable, interpretation is easier when the variable is coded zero and one. The next section will show how to do that for sex.
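The arithmetic behind these two computed variables, and the reason 0/1 coding is easier to interpret, can be sketched in a few lines of Python. The ages and sexes below are invented for illustration, and sex01 is a hypothetical name for a 0/1 version of SEX.

```python
# Invented values, not GSS records. SEX is coded 1 = male, 2 = female.
ages = [18, 19, 30]
sexes = [1, 2, 2]

# age_std: age minus 18, so that 18 becomes zero.
age_std = [age - 18 for age in ages]

# Interaction with the original 1/2 coding.
age_sex = [a * s for a, s in zip(age_std, sexes)]

# With a 0/1 coding, the interaction is zero for the group coded 0
# and equals age_std for the group coded 1.
sex01 = [s - 1 for s in sexes]
age_sex01 = [a * s for a, s in zip(age_std, sex01)]

print(age_std, age_sex, age_sex01)
```

With the 1/2 coding the product doubles age_std for one group and leaves it unchanged for the other, which is harder to read in a regression table.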

Recoding Variables

Sometimes you wish to create a new variable based on the values of another variable, or to recode those values. There are two options here:

  • Recode into Same Variables
  • Recode into Different Variables

Let’s recode the sex variable from 1’s and 2’s to 0’s and 1’s. Specifically, we want \(1 \rightarrow 0\) and \(2 \rightarrow 1\). Go to Transform \(\rightarrow\) Recode into Same Variables…

SPSS Recode into same variable part 1

Select the SEX variable and use the arrow to move it into the Numeric Variables box.

SPSS Recode into same variable part 2

Click Old and New Values. Under old value, type “1”, and under new value, type “0”, then click Add. Then, repeat this process to set old value 2 to be new value 1 and click Add.

SPSS Recode into same variable part 3

Click Continue, then OK.

In data view you can see the recoding. However, the labels need to be updated to match the recoded values.

SPSS see new variables

Go to Variable View, then click on the Values box for SEX. Change 1 = "Male" to 0 = "Male", and 2 = "Female" to 1 = "Female".

SPSS see new variables

Then click OK. Your new labels should now show up in the data view.

SPSS see new variables

It’s generally a good idea to recode into a different variable so that you can always go back to the original coding if you need to.
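The difference between overwriting a variable and recoding into a new one can be illustrated with a small Python sketch. This is not SPSS syntax; the data values are invented, and recode_map and sex01 are hypothetical names.

```python
# Old value -> new value, matching the recode described above.
recode_map = {1: 0, 2: 1}

# Invented responses, not GSS records.
sex = [1, 2, 2, 1]

# Recoding into a different variable leaves the original list intact.
# Any value not in the map becomes None rather than being guessed.
sex01 = [recode_map.get(value) for value in sex]

# The value labels must follow the new codes.
sex01_labels = {0: "Male", 1: "Female"}

print(sex)    # original coding is still available
print(sex01)  # recoded copy
```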

Recode into Different Variables takes a similar approach to Recode into Same Variables. Consider the race variable. It is often necessary to recode categorical variables into dummy variables. We can do this for race=white and race=black using Recode into Different Variables.

Go to Transform \(\rightarrow\) Recode into Different Variables…

SPSS Recode into different variable

Select RACE and use the arrow to shift it over to the Input Variable box. Under Output Variable change the name to race_white, and the label to “Race = White”. Then click Change.

SPSS Recode into different variable part 1

Next click Old and New Values…

SPSS Recode into different variable part 2

We want the value of our new variable to be one if race is white, and zero if it is anything else. Recall that the race variable is coded as:

  • 1: White
  • 2: Black
  • 3: Other

So, under Old value, we set Value to 1, and under New Value, we set Value to 1. Then click Add. Under Old value, select All other values, and under New Value set Value to 0. Then click Add. Your window should look like this.

SPSS Recode into different variable part 3

Click Continue, then click OK. Next, we’ll create the race=black dummy variable. Go to Transform \(\rightarrow\) Recode into Different Variables…

Select RACE and use the arrow to shift it over to the Input Variable box. Under Output Variable change the name to race_black, and the label to “Race = Black”. Then click Change.

SPSS Recode into different variable part 4

Under Old value, we set Value to 2 (since black is coded as 2 in the original race variable), and under New Value, set Value to 1. Then click Add. Under Old value, select All other values, and then under New Value set Value to 0. Then click Add. Your window should look like this.

SPSS Recode into different variable part 5

Click Continue, then click OK. You can see we have created two new variables (race_white and race_black) in the data editor window.
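The two recodes follow the same rule: one for the target category, zero for everything else. A minimal Python sketch of that rule is shown here; the helper name dummy and the five race codes are made up for illustration.

```python
def dummy(values, target):
    """Return 1 where the value equals target and 0 for all other values."""
    return [1 if value == target else 0 for value in values]

# Invented codes, not GSS records: 1 = White, 2 = Black, 3 = Other.
race = [1, 2, 3, 1, 2]

race_white = dummy(race, 1)
race_black = dummy(race, 2)

print(race_white)
print(race_black)
```

Cases coded 3 (Other) are zero on both dummies, so Other acts as the reference category when both variables are used together in a model.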

Descriptive Statistics

Now that we have our variables coded with variable and value labels, we may wish to look at some descriptive statistics. For categorical variables (i.e. variables with distinct groups, or categories, such as race) we will look at frequencies. For interval, or continuous, variables (such as age), we will look at the minimum, maximum, mean, and standard deviation.

To create a frequency table, go to Analyze \(\rightarrow\) Descriptive Statistics \(\rightarrow\) Frequencies…

Let’s create a frequency table for race. Select the RACE variable from the list and use the arrow to move it to the Variable(s) box.

SPSS create frequency table

Click OK. The frequency table will open in the output window.

SPSS frequency table

The first table provides the total number of valid and missing responses, if any exist; there are 2,867 responses to the race variable.

The next table provides the frequencies of each response.

  • Frequency is the number of responses for that category
  • Percent is the frequency divided by the total number of responses (valid + missing), times 100
  • Valid Percent is the percentage based on non-missing responses (in this case, percent and valid percent are the same because there were no missing observations)
  • Cumulative Percent is the valid percent for that response plus the valid percents of all previous categories
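To see how the four columns relate to each other, the following Python sketch builds a small frequency table by hand. It is a sketch under stated assumptions: the ten responses are invented, None marks a missing response, and frequency_table is a hypothetical helper, not an SPSS function.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percent, valid percent, cumulative percent) rows."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        freq = counts[category]
        percent = 100 * freq / len(values)       # valid + missing in the base
        valid_percent = 100 * freq / len(valid)  # missing excluded from the base
        cumulative += valid_percent
        rows.append((category, freq, percent, valid_percent, cumulative))
    return rows

# Invented responses with two missing values; not GSS records.
responses = [1, 1, 1, 1, 2, 2, 3, 3, None, None]
table = frequency_table(responses)
for row in table:
    print(row)
```

With missing responses present, Percent and Valid Percent differ, and Cumulative Percent still reaches 100 on the last category.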

Now let’s take a look at age. Go to Analyze \(\rightarrow\) Descriptive Statistics \(\rightarrow\) Descriptives…

SPSS create descriptives table

Select AGE and use the arrow to move it into the Variable(s) box.

SPSS create descriptives table part 2

You can also add other statistics (e.g. skewness and kurtosis, which help evaluate whether a distribution is approximately normal) by using the Options… button.

SPSS create descriptives table part 3

Leave the defaults checked for now and click Continue. Then click OK.

SPSS descriptives table

  • N provides a count of the responses
  • Minimum is the smallest response
  • Maximum is the largest response
  • Mean gives the average value and is used to measure central tendency
  • Std. Deviation is the standard deviation, which is a measure of dispersion
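These five statistics are simple to reproduce by hand. The sketch below does so with Python's standard statistics module on five invented ages (not GSS values); statistics.stdev computes the sample standard deviation, dividing by N - 1.

```python
import statistics

# Invented ages for illustration; not GSS records.
ages = [18, 25, 40, 57, 60]

summary = {
    "N": len(ages),
    "Minimum": min(ages),
    "Maximum": max(ages),
    "Mean": statistics.mean(ages),
    # Sample standard deviation (divides by N - 1).
    "Std. Deviation": statistics.stdev(ages),
}

for name, value in summary.items():
    print(name, value)
```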

Creating Graphs

There are many different graphs that SPSS can create, enough to fill multiple tutorials. However, we will just focus on two: histograms and bar graphs.

Histograms are used to visualize continuous data by grouping the values into “bins” and plotting the number of data points that fall in each bin. Bar graphs are used to visualize categorical data by generating a bar for each category whose height is proportional to the frequency of that category.
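The binning step behind a histogram can be sketched in plain Python. This is only an illustration of the idea: the ages are invented, the bin width of 10 years is an arbitrary choice, and bin_counts is a hypothetical helper (SPSS chooses its own bins by default).

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the binned range
            counts[index] += 1
    return counts

# Invented ages, not GSS records.
ages = [18, 22, 29, 30, 31, 45, 55, 56, 58, 70]

# Seven bins of width 10 covering ages 10 up to (but not including) 80.
counts = bin_counts(ages, start=10, width=10, n_bins=7)
print(counts)
```

Each count becomes the height of one histogram bar, whereas a bar graph would simply count the cases in each category.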

There are two main ways SPSS can create these visualizations: through the Chart Builder, and through Legacy Dialogs. First let’s cover the legacy dialogs.

Say we wish to create a histogram of Age. Go to Graphs \(\rightarrow\) Legacy Dialogs \(\rightarrow\) Histogram…

SPSS histogram legacy dialogs

The Histogram window will open.

SPSS histogram legacy dialogs 2

Select Age and use the arrow to move it into the Variable box.

SPSS histogram legacy dialogs 3

Then click OK.

SPSS histogram legacy dialogs 4

We can see the data appears to be bimodal with a peak at 30 years and another at approximately 55 years. The mean age is 49.33 years with a standard deviation of 17.905 years.

Now, let’s create a bar graph of race. Go to Graphs \(\rightarrow\) Legacy Dialogs \(\rightarrow\) Bar…

SPSS bar graph legacy dialogs 1

Select Simple, and Summaries for groups of cases. Then click Define.

SPSS bar graph legacy dialogs 2

The following window will open.

SPSS bar graph legacy dialogs 3

We can specify what we wish the bars to represent.

  • N of cases is the number of cases in each category
  • Cumulative N will add the previous categories to each subsequent bar
  • % of cases is the number of cases out of the total times 100%
  • Cumulative percent adds the previous category percents to each subsequent bar
  • Other statistic allows you to specify another value (such as mean, minimum, etc.)

Select N of cases.

Category axis is the variable we wish to graph. Select Race and use the arrow to move it into the category axis box.

The window should look like this:

SPSS bar graph legacy dialogs 4

Then click OK. The following figure will be created:

SPSS bar graph legacy dialogs 5

Most respondents to the survey were white, followed by black, and the fewest respondents were other.

Next, let’s create the same figures using the chart builder. The benefit of the chart builder is that it is a lot more flexible than the legacy dialogs.

Go to Graphs \(\rightarrow\) Chart Builder…

When the chart builder window first opens, it will be blank.

SPSS chart builder

There are four main sections in the chart builder window.

  • Section A provides the variables in your dataset
  • Section B is the chart preview, which will be used to build your chart
  • Section C allows you to edit the chart properties, appearance, and options (these will change depending on the type of chart you build)
  • Section D is the Gallery, where you select the chart template you are starting with, basic elements (where you can edit the axes and other elements), Groups/Point ID (which can be used to add clustering/paneling/etc), and titles/footnotes (which can be used to add title/footnote elements to your chart)

In the gallery, select Histogram, then click and drag the Simple Histogram to the chart preview section.

SPSS chart builder histogram 1

This will automatically insert the simple histogram template into the chart preview window. You can see there are three values that can be edited: Y-axis?, X-Axis?, and Filter?. Click on Age in the variables window and drag it to the X-axis? box.

SPSS chart builder histogram 2

Setting a variable in the y-axis allows you to set histogram values rather than having SPSS calculate them. This is not relevant for us so we will leave it as is. The filter value allows you to filter the data by some other variable; again, we will leave this blank.

We can customize the color under Chart appearance, change axis labels and chart titles under Element Properties and more. However, let’s leave it as the defaults for now and click OK.

SPSS chart builder histogram 3

This creates the same graph as the legacy dialogs did.

Now, let’s create the bar chart using the chart builder. Again, go to Graphs \(\rightarrow\) Chart Builder…

This time, select Bar in the gallery, then click and drag Simple Bar to the chart preview section.

SPSS chart builder bar chart 1

This time, click Race in the variables window and drag it to the X-axis? box. Then click OK.

SPSS chart builder bar chart 2

Again, we get the same chart as we did in the legacy dialogs. It is possible to create most charts using either method; the trade-off is simplicity (legacy dialogs) versus flexibility (chart builder).

Performing Analyses

Finally, we will go over how to do some basic analyses with SPSS: specifically, calculating correlations and running a linear regression model.

Let’s take a look at the correlation between age and income. Go to Analyze \(\rightarrow\) Correlate \(\rightarrow\) Bivariate…

SPSS correlations 1

The Bivariate Correlations window will open. We will select Age and INCOME and use the arrow to move them into the Variables box. Then select which type of correlation coefficient to calculate; we will stick with Pearson’s r. The test of significance defaults to Two-tailed, which is standard.

SPSS correlations 2

Click OK. You will get the following correlations table:

SPSS correlations 3

The values on the diagonal are 1’s, since any variable correlated with itself will be 1. We can see that age and income have a Pearson correlation of \(r = 0.021\), which is not significant at the \(\alpha = 0.05\) level (\(p = 0.251\)).
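For reference, Pearson’s r is the covariance of the two variables divided by the product of their standard deviations. The Python sketch below computes it directly on invented numbers (not the GSS age and income values); pearson_r is a hypothetical helper written for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations; the common
    # divisor (n - 1) cancels, so it is left out.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented values for illustration.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]

print(pearson_r(x, x))  # a variable correlated with itself
print(pearson_r(x, y))  # a perfectly decreasing relationship
```

This also shows why the diagonal of the correlations table is always 1: the numerator and denominator are identical when a variable is paired with itself.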

Let’s run a regression model to predict years of education based on age, sex, and the race dummy variables we created above. Go to Analyze \(\rightarrow\) Regression \(\rightarrow\) Linear…

SPSS regression 1

The dependent variable is EDUC, so use the right arrow to move that into the Dependent box. The independent variables are Age, SEX, race_white, and race_black so use the arrow to move those into the Independent(s) box.

SPSS regression 2

Then click OK. We get the following output:

The first table lists the variables used in the model, and the method used to enter them into the model (default = Enter).

SPSS regression variables entered/removed

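The next table provides the model summary, which gives R, R Square (the proportion of variance in the dependent variable explained by the model), Adjusted R Square (a more conservative estimate of model fit that accounts for the number of predictors), and the standard error of the estimate (a measure of how far observed values typically fall from the values the model predicts).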

SPSS regression model summary
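The relationship between R Square and Adjusted R Square can be written out with the standard adjustment formula, \(1 - (1 - R^2)(n - 1)/(n - k - 1)\), where \(n\) is the number of cases and \(k\) the number of predictors. The Python sketch below applies it to made-up inputs; these are not the values from the model summary above, and adjusted_r_squared is a hypothetical helper.

```python
def adjusted_r_squared(r_squared, n, k):
    """Standard adjusted R-squared for n cases and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Made-up inputs for illustration: R-squared of 0.5 with 4 predictors.
small_sample = adjusted_r_squared(0.5, n=21, k=4)
large_sample = adjusted_r_squared(0.5, n=2001, k=4)

print(small_sample)
print(large_sample)
```

The penalty shrinks as the sample grows, so with a large sample the two statistics are nearly identical.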

The ANOVA table tests whether the model as a whole is significant. Sig. is displayed as 0.000, meaning \(p < 0.001\), so we can conclude that at least one predictor is significant.

SPSS regression anova

Finally, the coefficients table provides the regression model coefficients, along with their significance. Unstandardized coefficients are provided with their standard errors, as well as standardized betas. A t-test is conducted for each coefficient, and the t-value is given. Then the significance is determined (i.e. p-value). We can see that age is a significant factor (\(p=0.009\)), as is race = white (\(p<0.001\)), while sex and race = black are not.

SPSS regression coefficients

The purpose of this tutorial has been to provide a starting point for using SPSS. For a more in-depth look at specific analyses, see our other SPSS tutorials here.