According to the central limit theorem, the distribution of means across repeated sampling (the *sampling distribution*) will be approximately normal for sufficiently large samples, centered on the true population mean, with a *standard error* (the standard deviation of the sampling distribution) equal to

\[ {\sigma_M}=\frac{\sigma}{\sqrt{n}} \]

The numerator \(\sigma\) is the standard deviation of values in the population, calculated as

\[ \sigma = \sqrt{ \frac{ \sum (x_i-\mu)^2}{N}} \]

We can rescale the distribution of means to be *unit normal* (also called standard normal) by converting the mean of each sample to a \(z\)-score.

\[ z = \frac{M - \mu}{\sigma_M} \]

This produces a normal distribution centered on zero with a standard deviation of one. To perform hypothesis testing, we compare our sample mean to the distribution of means expected under the null hypothesis to obtain a \(p\)-value.
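The relationship between the standard error and \(z\)-scores can be checked with a small simulation. The following is a minimal sketch using only Python's standard library; the population mean, standard deviation, sample size, and number of repetitions are hypothetical values chosen for illustration, not values from this page.

```python
import math
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
MU = 100.0      # population mean
SIGMA = 15.0    # population standard deviation
N = 25          # size of each sample
REPS = 5000     # number of repeated samples

rng = random.Random(42)  # fixed seed so the run is repeatable

# Draw REPS samples of size N and record the mean of each one.
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

theoretical_se = SIGMA / math.sqrt(N)          # sigma / sqrt(n)
empirical_se = statistics.stdev(sample_means)  # spread of the simulated means

# Convert every sample mean to a z-score using the theoretical standard error.
z_scores = [(m - MU) / theoretical_se for m in sample_means]

print(f"theoretical SE: {theoretical_se:.3f}")
print(f"empirical SE:   {empirical_se:.3f}")
print(f"mean of z:      {statistics.fmean(z_scores):.3f}")
print(f"sd of z:        {statistics.stdev(z_scores):.3f}")
```

If the formula holds, the empirical standard error should land close to the theoretical one, and the \(z\)-scores should have a mean near zero and a standard deviation near one.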

In real life, we rarely know the standard deviation in the population. Instead, we have to estimate it from our sample. We use the sample estimator of the standard deviation, which is only slightly different from the formula for \(\sigma\) presented above:

\[ s = \sqrt{ \frac{ \sum (x_i-M)^2}{n-1}} \]

The numerator is the sum of squared deviations from the mean (sum of squares, or SS, for short), and the denominator is referred to as the *degrees of freedom*. What separates the sample estimator from the formula one uses to find a population standard deviation is this denominator. Note that as \(n\) becomes large, subtracting one has a smaller and smaller effect on the estimate vis-a-vis the population formula. The difference disappears entirely given an infinitely large sample size.
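To make the difference between the two denominators concrete, here is a short sketch in Python. The scores are made-up values used only for illustration; `statistics.pstdev` and `statistics.stdev` are the standard library's population and sample formulas, respectively.

```python
import math
import statistics

# Hypothetical scores, invented for this illustration.
scores = [98, 104, 111, 87, 120, 95, 102, 109]

n = len(scores)
mean = statistics.fmean(scores)
sum_of_squares = sum((x - mean) ** 2 for x in scores)  # SS

population_formula = math.sqrt(sum_of_squares / n)      # divide by n
sample_formula = math.sqrt(sum_of_squares / (n - 1))    # divide by n - 1

print(f"divide by n:     {population_formula:.4f}")
print(f"divide by n - 1: {sample_formula:.4f}")

# The two formulas differ by a factor of sqrt(n / (n - 1)),
# which approaches 1 as the sample size grows.
for size in (5, 50, 500, 5000):
    print(f"n={size:>4}: ratio = {math.sqrt(size / (size - 1)):.5f}")
```

The hand-computed values should agree with `statistics.pstdev(scores)` and `statistics.stdev(scores)`.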

The problem with using an estimate rather than the true value is that estimates are by definition uncertain. This uncertainty propagates into the sampling distribution, which is no longer normal but rather distributed according to what is called a \(t\) distribution. Whereas the unit normal distribution was a distribution of \(z\)-scores, the \(t\) distribution is a distribution of \(t\)-scores. The formula for \(t\)-scores looks a lot like the formula for \(z\)-scores:

\[ t = \frac{M - \mu}{s_M} \]

That is, we take our sample mean, subtract from it the population mean (or what our null hypothesis says the population mean is), and divide it by the standard error. The difference is that the standard error, \(s_M\), is now calculated as

\[ {s_M}=\frac{s}{\sqrt{n}} \]

using our sample estimate in the numerator instead of the population standard deviation.

There is actually not a single \(t\)-distribution but rather a *family* of distributions whose precise shape varies according to the degrees of freedom. The following plot displays different \(t\)-distributions and compares them to the unit normal distribution.

Note that, as the degrees of freedom increase, the shape of the \(t\) distribution converges to the normal. Intuitively, this reflects the fact that we have more data and hence more certainty in our estimate of the population standard deviation. If we were 100% confident, we could use the standard normal distribution instead of \(t\) for our hypothesis tests.
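The convergence can also be seen numerically. The sketch below evaluates the textbook density of Student's \(t\) distribution at a point in the tail for several degrees of freedom and compares it with the standard normal density. The helper `t_pdf` is a name introduced here for illustration, and the evaluation point of 3.0 is an arbitrary choice.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so large df values do not overflow the gamma function.
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


standard_normal = NormalDist()  # mean 0, standard deviation 1

# The t distribution has heavier tails than the normal,
# and the gap shrinks as the degrees of freedom increase.
for df in (1, 5, 30, 200):
    print(f"df={df:>3}: t density at 3.0 = {t_pdf(3.0, df):.5f}")
print(f"normal density at 3.0  = {standard_normal.pdf(3.0):.5f}")
```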

## Types of t-Tests

Software will usually give you different options for performing a \(t\)-test. In SPSS, for example, you can go to **Analyze** \(\rightarrow\) **Compare Means** and choose from the One-Sample \(t\)-test, the Independent Samples \(t\)-test, or the Paired Samples \(t\)-test. The rest of this page describes the differences between these.

#### One Sample t-Test

A one sample \(t\)-test determines whether a sample mean is statistically different from some population mean. When conducting a \(t\)-test, we select a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_A\)). The null hypothesis says that our sample was generated by a population with a specific mean, and the alternative hypothesis is that our sample was generated by a population with a different mean. The alternative hypothesis could also be directional (that the population mean is greater than, or less than, the value under the null hypothesis), but the usual convention is to remain agnostic about direction and use the two-tailed test.

Let’s look at an example. We have a sample of 200 IQ scores, and we want to know if the mean IQ for the population could be equal to 100 based on our sample. A first step in any statistical analysis should be to visualize the data. The following is a histogram of the 200 sample values.

Next, we will set our null hypothesis and our alternative hypothesis.

- \(H_0\): the mean IQ = 100
- \(H_A\): the mean IQ \(\ne\) 100

Finally, we will run our \(t\)-test.

We will start by calculating the \(t\)-statistic. The sample mean, \(M\), of our sample data turns out to be 105.04, the sample standard deviation, \(s\) is 15.78, and \(n = 200\). We can use these in our formula from before to find:

\[ \begin{aligned} t &= \frac{M - \mu}{\frac{s}{\sqrt{n}}} \\ &= \frac{105.04 - 100}{\frac{15.78}{\sqrt{200}}} \\ &= 4.517 \end{aligned} \]

Next, compare our \(t\)-statistic to the \(t\) distribution with \(n-1\) degrees of freedom. Using a two-tailed \(\alpha\) level of .05 for all of our tests, our result will be significant if \(t > 1.97\) or \(t < -1.97\). We get these values because they correspond to the tails covering 5% of the distribution.

We compare our estimated \(t\) to the rejection region and, if it falls into the tails, we reject the null hypothesis.

We easily reject the null hypothesis that our sample was generated from a population having a mean IQ of 100.
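The arithmetic for this example can be reproduced in a few lines of Python. This is a sketch working from the summary statistics reported above rather than the raw data; the function name `one_sample_t` is introduced here for illustration, and the critical value is the one quoted in the text, not computed.

```python
import math


def one_sample_t(sample_mean: float, null_mean: float,
                 sample_sd: float, n: int) -> float:
    """t-statistic for a one sample test, computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    return (sample_mean - null_mean) / standard_error


t_stat = one_sample_t(sample_mean=105.04, null_mean=100, sample_sd=15.78, n=200)

# Critical value quoted in the text for 199 degrees of freedom,
# two-tailed alpha = .05.
critical_value = 1.97
reject_null = abs(t_stat) > critical_value

print(f"t = {t_stat:.3f}")
print(f"reject H0: {reject_null}")
```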

#### Dependent (Paired) Samples t-Test

The one sample \(t\)-test is rarely used, but there is a variation of it that is quite useful in pre-post types of designs. The dependent samples \(t\)-test, which is also known as the paired samples \(t\)-test, is used to compare two (and only two) means that are related in some manner. The most common version is when one sample is measured at time 1 and then again at time 2. As another example, each subject used to estimate the first mean may be related to one of the subjects used in the second sample, such as if they were siblings. The goal is to determine if the two related means are different from each other.

Although we seemingly have two samples, because they are paired we can simply take the difference between them and determine if the mean *difference* is significantly different from zero. For example, we will look at our IQ data again, except now we’ll assume that we have 100 individuals measured at two different time points. We can visualize our data with a box plot:

To convert the comparison of means into a one-sample \(t\)-test, we subtract the time 1 scores from time 2. The following shows how this works for the first five subjects in our sample:

Subject ID | Time 1 | Time 2 | Difference |
---|---|---|---|
1 | 93.89 | 83.71 | -10.18 |
2 | 131.22 | 116.23 | -14.99 |
3 | 102.80 | 110.93 | 8.13 |
4 | 107.27 | 95.90 | -11.37 |
5 | 89.94 | 101.37 | 11.43 |

If no change occurs between the two measurements, then the time 1 and time 2 measures will be similar, and the difference will be about zero. Consequently, we will set our hypotheses as:

- \(H_0\): \(\mu_{T2} - \mu_{T1} = 0\)
- \(H_A\): \(\mu_{T2} - \mu_{T1} \ne 0\)

Then, we will conduct our \(t\)-test. For a paired \(t\)-test, the formula is:

\[ t = \frac{M_D}{s_D/\sqrt{n}} \]

where \(M_D\) is the mean of the differences (time 2 minus time 1), \(s_D\) is the standard deviation of those differences, and \(n\) is the number of pairs, in this case 100. For our data, \(M_D = 3.52\) and \(s_D = 22.62\), so the \(t\)-score is:

\[ t = \frac{3.52}{22.62/\sqrt{100}} = \frac{3.52}{22.62/10} = 1.56 \]

The degrees of freedom for a paired samples \(t\)-test is \(n-1\), so for our example it is 99. The critical values separating the 5% in both tails from the middle of the distribution are \(\pm\) 1.98. Compare our value to this region:

Our \(t\)-statistic of 1.56 falls between the critical values, outside the rejection region, so we do not reject the null hypothesis.
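The paired procedure can be sketched in Python as follows. The first part recomputes the differences for the five subjects shown in the table; the second part plugs the summary statistics for the full sample of 100 pairs into the formula. The function name `paired_t` is introduced here for illustration. A \(t\) computed from only the five displayed rows is shown to illustrate the mechanics and will not match the full-sample result.

```python
import math
import statistics

# The five subjects displayed in the table above.
time1 = [93.89, 131.22, 102.80, 107.27, 89.94]
time2 = [83.71, 116.23, 110.93, 95.90, 101.37]

# Subtract time 1 from time 2 for each subject.
differences = [round(t2 - t1, 2) for t1, t2 in zip(time1, time2)]
print("differences:", differences)


def paired_t(mean_diff: float, sd_diff: float, n_pairs: int) -> float:
    """t-statistic for a paired test: M_D / (s_D / sqrt(n))."""
    return mean_diff / (sd_diff / math.sqrt(n_pairs))


# Mechanics only: the same formula applied to the five displayed rows.
t_five_rows = paired_t(
    statistics.fmean(differences),
    statistics.stdev(differences),
    len(differences),
)

# Full sample, using the summary statistics reported in the text.
t_full_sample = paired_t(mean_diff=3.52, sd_diff=22.62, n_pairs=100)

critical_value = 1.98  # quoted in the text for 99 degrees of freedom
print(f"t (full sample) = {t_full_sample:.2f}")
print(f"reject H0: {abs(t_full_sample) > critical_value}")
```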

#### Two Sample Independent t-Test

Probably the most common type of \(t\)-test is when we have two different samples that are not directly related. An example is comparing a sample receiving a treatment to a sample not receiving treatment, or comparing boys to girls.

For example, we will look at our IQ data again. Say we want to compare men’s IQ vs. women’s IQ. We would do this using an independent samples \(t\)-test. We can visualize our data with a box plot:

There does not appear to be much of a difference. What do we conclude if we conduct an independent samples \(t\)-test? For this test, the null hypothesis (\(H_0\)) is that the two population means are equal, and the alternative hypothesis is that the two means are not equal.

- \(H_0\): \(\mu_F = \mu_M\)
- \(H_A\): \(\mu_F \ne \mu_M\)

The formula we will use to calculate our test statistic is:

\[ t = \frac{M_1 - M_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]

where the denominator is the standard error of the difference in means. Note that the formula actually uses the *variance* \(s^2\) rather than the standard deviation \(s\). We can convert the standard deviation to a variance by squaring it.

For our data, the sample mean for men (\(M_1\)) is 103.56, and the standard deviation (\(s_1\)) is 13.88. For women, the mean (\(M_2\)) is 106.4, and the standard deviation (\(s_2\)) is 17.31. The sample sizes are equal, \(n_1=n_2 = 100\). We can plug in these values to get our \(t\)-statistic:

\[ t = \frac{103.56 - 106.4}{\sqrt{13.88^2/100 + 17.31^2/100}} = -1.28 \]

Next, we compare this value to the rejection region. The degrees of freedom for an independent samples \(t\)-test are \(n_1 + n_2 - 2\), reflecting the fact that we are estimating *two* means. This choice assumes the two populations have equal variances; with equal group sizes, as here, the standard error used above is identical to the pooled-variance version. In this case, the degrees of freedom are 198. The critical values for a \(t\) distribution with 198 degrees of freedom are \(\pm\) 1.97. Since -1.28 lies between these critical values, we fail to reject \(H_0\).
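The calculation can be reproduced from the reported summary statistics with a short Python sketch. The function names are introduced here for illustration. The second helper computes the pooled-variance standard error that is usually paired with \(n_1 + n_2 - 2\) degrees of freedom, to show that it coincides with the standard error in the formula above when the two groups are the same size.

```python
import math


def unpooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of the difference in means: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def pooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error based on a pooled variance (assumes equal variances)."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Summary statistics reported in the text (group 1 = men, group 2 = women).
m1, s1, n1 = 103.56, 13.88, 100
m2, s2, n2 = 106.4, 17.31, 100

se = unpooled_se(s1, n1, s2, n2)
t_stat = (m1 - m2) / se
df = n1 + n2 - 2

critical_value = 1.97  # quoted in the text for 198 degrees of freedom
print(f"t = {t_stat:.2f}, df = {df}")
print(f"pooled SE = {pooled_se(s1, n1, s2, n2):.4f}, unpooled SE = {se:.4f}")
print(f"reject H0: {abs(t_stat) > critical_value}")
```

When the group sizes differ, the two standard errors no longer agree, and the choice between them matters.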
