CFA using the lavaan package in R

Caleb Scheidel

Posted on
R CFA lavaan

This tutorial shows how to estimate a confirmatory factor analysis (CFA) model using the R lavaan package. The model, which consists of two latent variables and eight manifest variables, is described here. To review, the model to be fit is the following:

The data can be accessed from the built-in Bollen dataset in the sem package.

library(sem)
library(lavaan)

data("Bollen")

All required packages are loaded first with the library() commands. The data are then loaded into the environment using the data() function. We begin by specifying the model using the following syntax:

model <- '
    xi_1  =~ y1 + y2 + y3 + y4
    xi_2  =~ y5 + y6 + y7 + y8
'

fit <- cfa(model, data = Bollen)
summary(fit, fit.measures = TRUE, standardized = TRUE)

CFA models are specified in lavaan by stating the name of the latent variable, followed by =~, followed by the names of the observed variables. This model has two latent variables. The first is \(\xi_1\) (Greek letter pronounced “xi”) and is measured by the variables y1 to y4. The second is \(\xi_2\) and is measured with the variables y5 to y8.

The model object is then plugged into the lavaan::cfa() function to fit the model. By default lavaan will use standard maximum likelihood ("ML") estimation. The default is also to report the conventional chi-square test and maximum likelihood standard errors. Alternative estimators, such as "MLM" and "MLR" are available in lavaan and are described here.

To view a summary of the model results, the summary() command is used. Adding the standardized = TRUE argument will present standardized estimates along with the unstandardized parameters. Fit statistics of the model will be returned by adding the fit.measures = TRUE argument. See the full lavaan documentation to explore other optional parameters that are available.

The above syntax will be sufficient for many CFA models. However, it is also common to impose constraints on a CFA model, such as forcing factor loadings to be equal or allowing errors to covary. Bollen’s model includes both of these. First, because the latent variables represent the same democracy construct measured at two points in time, it makes sense that the respective factor loadings would be the same for each factor. We can specify this constraint as follows:

model <- '
  # measurement model
    xi_1  =~ y1 + l2*y2 + l3*y3 + l4*y4
    xi_2  =~ y5 + l2*y6 + l3*y7 + l4*y8
'

This syntax adds labels before each loading to specify which should be equal. l2, which is short for lambda 2, has been added before y2 and y6. Having the same label forces these loadings to be equal. Likewise, the l3 label will force the y3 and y7 loadings to be equal, and the l4 label will force the y4 and y8 loadings to be equal. We did not need to impose any constraint for y1 and y5. By default, lavaan identifies the model by constraining the first loading for each factor to equal one.

By default, lavaan will assume that all error variances for the observed variables are independent of each other. We can relax this constraint with some additional syntax:

model <- '
  # measurement model
    xi_1  =~ y1 + l2*y2 + l3*y3 + l4*y4
    xi_2  =~ y5 + l2*y6 + l3*y7 + l4*y8
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

The ~~ syntax specifies which error variances covary. Most of the covariances capture the fact that, having the same measures at two time points, any idiosyncrasies present at the first time point may also be present at the second time point. We also allow the error variances for the second and fourth observed variables at each time point to covary.

We now use the model object as the first argument to the cfa() function, and results are obtained using the summary() function.

fit <- cfa(model, data = Bollen)
summary(fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-3 ended normally after 67 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         23
##   Number of equality constraints                     3
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      15.320
##   Degrees of freedom                                16
##   P-value (Chi-square)                           0.501
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              461.111
##   Degrees of freedom                                28
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.003
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1320.232
##   Loglikelihood unrestricted model (H1)      -1312.572
## 
##   Number of free parameters                         20
##   Akaike (AIC)                                2680.464
##   Bayesian (BIC)                              2726.814
##   Sample-size adjusted Bayesian (BIC)         2663.779
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent Confidence Interval          0.000  0.103
##   P-value RMSEA <= 0.05                          0.683
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.058
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   xi_1 =~                                                               
##     y1                1.000                               2.170    0.845
##     y2        (l2)    1.213    0.143    8.483    0.000    2.631    0.692
##     y3        (l3)    1.210    0.125    9.690    0.000    2.626    0.762
##     y4        (l4)    1.273    0.122   10.453    0.000    2.761    0.839
##   xi_2 =~                                                               
##     y5                1.000                               2.128    0.803
##     y6        (l2)    1.213    0.143    8.483    0.000    2.580    0.762
##     y7        (l3)    1.210    0.125    9.690    0.000    2.575    0.817
##     y8        (l4)    1.273    0.122   10.453    0.000    2.708    0.833
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .y1 ~~                                                                 
##    .y5                0.577    0.364    1.585    0.113    0.577    0.266
##  .y2 ~~                                                                 
##    .y4                1.390    0.685    2.030    0.042    1.390    0.283
##    .y6                2.068    0.733    2.822    0.005    2.068    0.344
##  .y3 ~~                                                                 
##    .y7                0.727    0.611    1.190    0.234    0.727    0.180
##  .y4 ~~                                                                 
##    .y8                0.476    0.453    1.049    0.294    0.476    0.148
##  .y6 ~~                                                                 
##    .y8                1.257    0.583    2.156    0.031    1.257    0.319
##   xi_1 ~~                                                               
##     xi_2              4.461    0.967    4.612    0.000    0.966    0.966
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .y1                1.879    0.431    4.355    0.000    1.879    0.285
##    .y2                7.530    1.363    5.523    0.000    7.530    0.521
##    .y3                4.966    0.966    5.141    0.000    4.966    0.419
##    .y4                3.214    0.722    4.449    0.000    3.214    0.297
##    .y5                2.499    0.518    4.824    0.000    2.499    0.356
##    .y6                4.809    0.924    5.202    0.000    4.809    0.419
##    .y7                3.302    0.699    4.722    0.000    3.302    0.332
##    .y8                3.227    0.720    4.482    0.000    3.227    0.306
##     xi_1              4.708    1.028    4.579    0.000    1.000    1.000
##     xi_2              4.528    1.022    4.429    0.000    1.000    1.000

Note that there were no warning messages returned from the cfa function. If a positive-definite warning message was returned from the cfa function due to a non-identified model, it is a clear signal that the model has a non-unique solution. Output after this warning message may still say convergence was achieved, but should not ever be reported.

We are then presented with model fit information. We look for a non-significant \(\chi^2\) test, a RMSEA less than 0.05, CFI/TLI above 0.90 to 0.95, and SRMR less than 0.08. Consult Hu and Bentler (1999) for fuller details on interpretation.

The next section presents the parameter estimates. The standardized results are presented under the Std.lv and Std.all columns. The Std.lv column shows results that are standardized so that the latent variables have a variance of one. The Std.all column shows results that are standardized so that both the latent variables and the observed variables have a variance of one. The latter is reported more often and is akin to standardized regression coefficients, where x and y are both interpreted in terms of z-scores. Note that the estimates for the loadings are the same for both latent democracy variables, which is what we imposed by labeling the respective parameters in the syntax. Note also that there are estimates corresponding to the error covariances, as we specified with our ~~ syntax.

The lavaan package also provides a set of extractor functions to pull specific portions of the output to further process or analyze. The parameterEstimates(), standardizedSolution(), and fitMeasures() functions can be used to return only the unstandardized estimates, standardized estimates, and model fit statistics, respectively.

parameterEstimates(fit)
##     lhs op  rhs label   est    se      z pvalue ci.lower ci.upper
## 1  xi_1 =~   y1       1.000 0.000     NA     NA    1.000    1.000
## 2  xi_1 =~   y2    l2 1.213 0.143  8.483  0.000    0.933    1.493
## 3  xi_1 =~   y3    l3 1.210 0.125  9.690  0.000    0.965    1.455
## 4  xi_1 =~   y4    l4 1.273 0.122 10.453  0.000    1.034    1.511
## 5  xi_2 =~   y5       1.000 0.000     NA     NA    1.000    1.000
## 6  xi_2 =~   y6    l2 1.213 0.143  8.483  0.000    0.933    1.493
## 7  xi_2 =~   y7    l3 1.210 0.125  9.690  0.000    0.965    1.455
## 8  xi_2 =~   y8    l4 1.273 0.122 10.453  0.000    1.034    1.511
## 9    y1 ~~   y5       0.577 0.364  1.585  0.113   -0.136    1.290
## 10   y2 ~~   y4       1.390 0.685  2.030  0.042    0.048    2.732
## 11   y2 ~~   y6       2.068 0.733  2.822  0.005    0.632    3.504
## 12   y3 ~~   y7       0.727 0.611  1.190  0.234   -0.471    1.926
## 13   y4 ~~   y8       0.476 0.453  1.049  0.294   -0.413    1.364
## 14   y6 ~~   y8       1.257 0.583  2.156  0.031    0.114    2.399
## 15   y1 ~~   y1       1.879 0.431  4.355  0.000    1.034    2.725
## 16   y2 ~~   y2       7.530 1.363  5.523  0.000    4.858   10.203
## 17   y3 ~~   y3       4.966 0.966  5.141  0.000    3.072    6.859
## 18   y4 ~~   y4       3.214 0.722  4.449  0.000    1.798    4.629
## 19   y5 ~~   y5       2.499 0.518  4.824  0.000    1.484    3.514
## 20   y6 ~~   y6       4.809 0.924  5.202  0.000    2.997    6.621
## 21   y7 ~~   y7       3.302 0.699  4.722  0.000    1.932    4.673
## 22   y8 ~~   y8       3.227 0.720  4.482  0.000    1.816    4.638
## 23 xi_1 ~~ xi_1       4.708 1.028  4.579  0.000    2.693    6.723
## 24 xi_2 ~~ xi_2       4.528 1.022  4.429  0.000    2.524    6.531
## 25 xi_1 ~~ xi_2       4.461 0.967  4.612  0.000    2.565    6.357
standardizedSolution(fit)
##     lhs op  rhs est.std    se      z pvalue ci.lower ci.upper
## 1  xi_1 =~   y1   0.845 0.041 20.411  0.000    0.764    0.927
## 2  xi_1 =~   y2   0.692 0.059 11.660  0.000    0.576    0.808
## 3  xi_1 =~   y3   0.762 0.051 14.907  0.000    0.662    0.863
## 4  xi_1 =~   y4   0.839 0.042 20.133  0.000    0.757    0.920
## 5  xi_2 =~   y5   0.803 0.047 17.164  0.000    0.711    0.894
## 6  xi_2 =~   y6   0.762 0.054 14.099  0.000    0.656    0.868
## 7  xi_2 =~   y7   0.817 0.045 18.145  0.000    0.729    0.905
## 8  xi_2 =~   y8   0.833 0.043 19.589  0.000    0.750    0.917
## 9    y1 ~~   y5   0.266 0.142  1.874  0.061   -0.012    0.545
## 10   y2 ~~   y4   0.283 0.115  2.450  0.014    0.056    0.509
## 11   y2 ~~   y6   0.344 0.099  3.486  0.000    0.150    0.537
## 12   y3 ~~   y7   0.180 0.138  1.299  0.194   -0.091    0.451
## 13   y4 ~~   y8   0.148 0.130  1.137  0.255   -0.107    0.402
## 14   y6 ~~   y8   0.319 0.115  2.763  0.006    0.093    0.545
## 15   y1 ~~   y1   0.285 0.070  4.074  0.000    0.148    0.423
## 16   y2 ~~   y2   0.521 0.082  6.341  0.000    0.360    0.682
## 17   y3 ~~   y3   0.419 0.078  5.368  0.000    0.266    0.572
## 18   y4 ~~   y4   0.297 0.070  4.243  0.000    0.160    0.433
## 19   y5 ~~   y5   0.356 0.075  4.737  0.000    0.208    0.503
## 20   y6 ~~   y6   0.419 0.082  5.092  0.000    0.258    0.581
## 21   y7 ~~   y7   0.332 0.074  4.518  0.000    0.188    0.477
## 22   y8 ~~   y8   0.306 0.071  4.310  0.000    0.167    0.445
## 23 xi_1 ~~ xi_1   1.000 0.000     NA     NA    1.000    1.000
## 24 xi_2 ~~ xi_2   1.000 0.000     NA     NA    1.000    1.000
## 25 xi_1 ~~ xi_2   0.966 0.029 33.297  0.000    0.909    1.023
fitMeasures(fit)
##                npar                fmin               chisq 
##              20.000               0.102              15.320 
##                  df              pvalue      baseline.chisq 
##              16.000               0.501             461.111 
##         baseline.df     baseline.pvalue                 cfi 
##              28.000               0.000               1.000 
##                 tli                nnfi                 rfi 
##               1.003               1.003               0.942 
##                 nfi                pnfi                 ifi 
##               0.967               0.552               1.002 
##                 rni                logl   unrestricted.logl 
##               1.002           -1320.232           -1312.572 
##                 aic                 bic              ntotal 
##            2680.464            2726.814              75.000 
##                bic2               rmsea      rmsea.ci.lower 
##            2663.779               0.000               0.000 
##      rmsea.ci.upper        rmsea.pvalue                 rmr 
##               0.103               0.683               0.640 
##          rmr_nomean                srmr        srmr_bentler 
##               0.640               0.058               0.058 
## srmr_bentler_nomean                crmr         crmr_nomean 
##               0.058               0.042               0.042 
##          srmr_mplus   srmr_mplus_nomean               cn_05 
##               0.045               0.045             129.734 
##               cn_01                 gfi                agfi 
##             157.657               0.955               0.898 
##                pgfi                 mfi                ecvi 
##               0.424               1.005               0.738

Visualizing CFA models using semPlot

We can visualize our model with semPlot::semPaths():

library(semPlot)
semPaths(fit, nCharNodes = 0, style = "lisrel", rotation = 2)

The semPaths() function takes the fitted lavaan model object as the main argument, but has a number of different options available to customize the path diagram. Here, we set nCharNodes = 0, so that the variable names are not abbreviated. We also set the styling to look like the “lisrel” software output, and set the rotation so that the path diagram flows horizontally. We can see in the above image that the specification of the model from lavaan is reflected in the diagram.

Other Resources

For further information on fitting CFA models in lavaan, check out the following:

Citations

Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.