Variance Estimation for Complex Surveys

Jeremy Albright


The presence of strata, clusters, and sampling weights complicates the estimation of a statistic’s variance. Treating observations from a complex sampling design as though they were drawn from a simple random sample can therefore lead to incorrect inferences. Specialized software for analyzing survey data offers several options for accurate variance estimation. These are:

  • Taylor Series Expansion/Linearization: When a statistic is estimated using a non-linear function, one option for variance estimation is to rely on a simplification of the function that approximates the original but is more tractable. This is the default in most survey software packages because it is computationally efficient. The downside is that a different formula is required for each kind of statistic to be estimated.
  • Balanced Repeated Replication (BRR): This method is appropriate when two primary sampling units (PSUs, or clusters) are drawn from each stratum. BRR proceeds by repeatedly dropping one PSU in each stratum, estimating the statistic on the remaining half-sample, and combining the different estimates to construct an estimate of the variance. For example, if there are 3 strata, and each has two PSUs, the possible half-sample replicates (one PSU selected per stratum) are:
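library(knitr)  # provides kable()
# Each row is one possible half-sample: one PSU (1 or 2) selected in each of the 3 strata
t1 <- data.frame(1:8, c(1,1,1,1,2,2,2,2), c(1,1,2,2,1,1,2,2), c(1,2,1,2,1,2,1,2))
kable(t1, col.names = c("Sample", "Stratum 1 PSU", "Stratum 2 PSU", "Stratum 3 PSU"), align = "c")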
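| Sample | Stratum 1 PSU | Stratum 2 PSU | Stratum 3 PSU |
|:------:|:-------------:|:-------------:|:-------------:|
|   1    |       1       |       1       |       1       |
|   2    |       1       |       1       |       2       |
|   3    |       1       |       2       |       1       |
|   4    |       1       |       2       |       2       |
|   5    |       2       |       1       |       1       |
|   6    |       2       |       1       |       2       |
|   7    |       2       |       2       |       1       |
|   8    |       2       |       2       |       2       |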

For surveys with more strata, the number of possible half-samples is 2^m, where m is the number of strata, which quickly becomes too large to enumerate. Software capable of handling BRR therefore often requires specifying what is called a Hadamard matrix, a square matrix containing the numbers +1 and -1 whose columns are mutually orthogonal. The matrix determines the combinations of PSUs that appear in a given replicate, so that a small balanced subset of replicates can stand in for the full set of 2^m. BRR effectively creates a new set of sampling weights for each replicate equal to 1) zero for the elements in the PSUs that are dropped; or 2) twice the element’s original sampling weight for the PSUs that are retained. In some instances, a survey organization may choose to release only the replicate weights rather than any information on the sampling design. Doing so protects the privacy of survey participants while still allowing for accurate variance estimation.
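
The BRR weighting rule can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame `svy`, its values, and the helper names `brr_weights` and `wtd_total` are hypothetical. The order-4 Hadamard matrix is built with the Sylvester (Kronecker product) construction, and the variance is taken as the average squared deviation of the replicate estimates from the full-sample estimate, one common form of the BRR estimator. With 3 strata, 4 balanced replicates are enough instead of all 2^3 = 8 half-samples.

```r
# Hypothetical design: 3 strata, 2 PSUs per stratum, 2 elements per PSU.
svy <- data.frame(
  stratum = rep(1:3, each = 4),
  psu     = rep(rep(1:2, each = 2), times = 3),
  w       = c(10, 10, 12, 12, 8, 8, 9, 9, 15, 15, 14, 14),  # sampling weights
  y       = c(3, 5, 4, 6, 7, 6, 9, 8, 2, 4, 3, 1)           # outcome
)

# Order-4 Hadamard matrix via the Sylvester (Kronecker) construction.
H2 <- matrix(c(1, 1, 1, -1), nrow = 2)
H4 <- kronecker(H2, H2)

# Drop the all-ones first column; the remaining 3 columns are assigned to
# strata 1..3. In row r, +1 retains PSU 1 of that stratum, -1 retains PSU 2.
signs <- H4[, 2:4]

# One column of replicate weights per row of `signs`:
# zero for dropped PSUs, twice the original weight for retained PSUs.
brr_weights <- function(data, signs) {
  sapply(seq_len(nrow(signs)), function(r) {
    kept_psu <- ifelse(signs[r, data$stratum] == 1, 1, 2)
    ifelse(data$psu == kept_psu, 2 * data$w, 0)
  })
}

wtd_total <- function(w, y) sum(w * y)

rep_w      <- brr_weights(svy, signs)                  # 12 x 4 matrix
theta_full <- wtd_total(svy$w, svy$y)                  # full-sample estimate
theta_rep  <- apply(rep_w, 2, wtd_total, y = svy$y)    # replicate estimates
brr_var    <- mean((theta_rep - theta_full)^2)

# For a weighted total with two PSUs per stratum, balanced replicates
# reproduce the textbook estimator: the sum over strata of the squared
# difference between the two weighted PSU totals.
psu_tot     <- tapply(svy$w * svy$y, list(svy$stratum, svy$psu), sum)
closed_form <- sum((psu_tot[, 1] - psu_tot[, 2])^2)

c(brr = brr_var, closed_form = closed_form)
```

Swapping `wtd_total` for another function of the weights and the outcome (a weighted mean, for instance) changes nothing else in the sketch, which is the practical appeal of replication. In real analyses, a survey package would normally generate and apply the replicate weights.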

  • Jackknife Estimation: The jackknife is an alternative to BRR that is not restricted to cases where the number of PSUs per stratum equals two. The jackknife replicates are created by dropping a single PSU at a time and estimating the statistic on the remaining observations. After the jackknife has iteratively dropped each PSU, the estimates are combined to derive the variance. As with BRR, it is possible that an agency releases jackknife weights instead of design information in order to protect respondent privacy. The weights are equal to 1) zero if the element is in the PSU that has been dropped; 2) an adjusted value if the element is from a PSU belonging to the stratum from which another PSU was dropped; and 3) the original value otherwise.
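
The three-part jackknife weighting rule can be sketched as follows. The data frame `svy` and the helpers `jk_weights` and `jk_variance` are hypothetical. The sketch also makes two assumptions the text leaves open: the "adjusted value" is taken to be the original weight multiplied by n_h / (n_h - 1), where n_h is the number of PSUs in the affected stratum, and the replicate estimates are combined with the factor (n_h - 1) / n_h, as in the common stratified delete-one-PSU jackknife.

```r
# Hypothetical design: stratum 1 has 3 PSUs, stratum 2 has 2 PSUs.
svy <- data.frame(
  stratum = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2),
  w       = c(10, 10, 12, 12, 11, 11, 8, 8, 9, 9),  # sampling weights
  y       = c(3, 5, 4, 6, 2, 7, 7, 6, 9, 8)         # outcome
)

# One row per PSU: these are the units dropped one at a time.
psus <- unique(svy[, c("stratum", "psu")])

# One column of replicate weights per dropped PSU.
jk_weights <- function(data, psus) {
  sapply(seq_len(nrow(psus)), function(k) {
    h    <- psus$stratum[k]
    nh   <- sum(psus$stratum == h)          # PSUs in the affected stratum
    in_h <- data$stratum == h
    w    <- data$w                          # 3) other strata: unchanged
    w[in_h] <- w[in_h] * nh / (nh - 1)      # 2) same stratum: adjusted upward
    w[in_h & data$psu == psus$psu[k]] <- 0  # 1) dropped PSU: zero
    w
  })
}

# Combine replicate estimates (assumed stratified delete-one form).
jk_variance <- function(data, psus, estimator) {
  theta_full <- estimator(data$w, data$y)
  rep_w      <- jk_weights(data, psus)
  theta_rep  <- apply(rep_w, 2, estimator, y = data$y)
  n_h        <- sapply(psus$stratum, function(h) sum(psus$stratum == h))
  sum((n_h - 1) / n_h * (theta_rep - theta_full)^2)
}

wtd_total <- function(w, y) sum(w * y)
wtd_mean  <- function(w, y) sum(w * y) / sum(w)

rep_w <- jk_weights(svy, psus)   # 10 elements x 5 replicates
c(total = jk_variance(svy, psus, wtd_total),
  mean  = jk_variance(svy, psus, wtd_mean))
```

The same `jk_variance` call handles both the total and the mean; only the `estimator` argument changes.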

In practice, the three methods should yield similar results. The benefit of the two replication approaches is that the procedure does not change depending on the statistic to be estimated. The drawback, at least historically, has been high computational demands. However, today most computers can handle replication with very little difficulty, and the analyst will only have to wait a few seconds longer for BRR or the jackknife results compared to the Taylor method.
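
To make the comparison concrete, the sketch below estimates the variance of a weighted mean twice on the same hypothetical data: once by Taylor linearization and once by a delete-one-PSU jackknife. All names and values are illustrative assumptions. The linearization step replaces the nonlinear ratio sum(w * y) / sum(w) with the total of a linearized variable `z`, then applies the usual between-PSU variance formula within strata (treating PSUs as sampled with replacement, so no finite population correction). The formula for `z` is specific to the weighted mean, which is the statistic-by-statistic cost of the Taylor approach.

```r
# Hypothetical design: stratum 1 has 3 PSUs, stratum 2 has 2 PSUs.
svy <- data.frame(
  stratum = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2),
  w       = c(10, 10, 12, 12, 11, 11, 8, 8, 9, 9),
  y       = c(3, 5, 4, 6, 2, 7, 7, 6, 9, 8)
)

W    <- sum(svy$w)
ybar <- sum(svy$w * svy$y) / W   # weighted mean (a ratio of two totals)

# --- Taylor linearization ---
# Linearized variable for the weighted mean: its total approximates
# the sampling error of ybar to first order.
svy$z <- svy$w * (svy$y - ybar) / W

# Sum z within each PSU, then take the between-PSU variance by stratum.
z_psu   <- aggregate(z ~ stratum + psu, data = svy, FUN = sum)
lin_var <- sum(sapply(split(z_psu$z, z_psu$stratum), function(zh) {
  nh <- length(zh)
  nh / (nh - 1) * sum((zh - mean(zh))^2)
}))

# --- Delete-one-PSU jackknife (assumed stratified form) ---
psus <- unique(svy[, c("stratum", "psu")])
jk_terms <- sapply(seq_len(nrow(psus)), function(k) {
  h    <- psus$stratum[k]
  nh   <- sum(psus$stratum == h)
  in_h <- svy$stratum == h
  w    <- svy$w
  w[in_h] <- w[in_h] * nh / (nh - 1)       # reweight the rest of the stratum
  w[in_h & svy$psu == psus$psu[k]] <- 0    # drop one PSU
  theta_k <- sum(w * svy$y) / sum(w)       # re-estimate the mean
  (nh - 1) / nh * (theta_k - ybar)^2
})
jk_var <- sum(jk_terms)

c(linearization = lin_var, jackknife = jk_var)
```

On this small example the two estimates are close but not identical, which is the expected pattern for a smooth nonlinear statistic such as a mean or ratio.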
