Estimating CFA using the lavaan package in R

Caleb Scheidel

Posted on
R CFA lavaan

This tutorial shows how to estimate a confirmatory factor analysis (CFA) model using the R lavaan package. The model, which consists of two latent variables and eight manifest variables, is described in our previous post which sets up a running CFA and SEM example. To review, the model to be fit is the following:

CFA model

The data can be accessed from the built-in Bollen dataset in the sem package.

library(sem)
## Warning: package 'sem' was built under R version 3.6.2
library(lavaan)

data("Bollen")

All required packages are loaded first with the library() commands. The data are then loaded into the environment using the data() function. We begin by specifying the model using the following syntax:

model <- '
    xi_1  =~ y1 + y2 + y3 + y4
    xi_2  =~ y5 + y6 + y7 + y8
'

fit <- cfa(model, data = Bollen)
summary(fit, fit.measures = TRUE, standardized = TRUE)

CFA models are specified in lavaan by stating the name of the latent variable, followed by =~, followed by the names of the observed variables. This model has two latent variables. The first is \(\xi_1\) (Greek letter pronounced “xi”) and is measured by the variables y1 to y4. The second is \(\xi_2\) and is measured with the variables y5 to y8.

The model object is then plugged into the lavaan::cfa() function to fit the model. By default lavaan will use standard maximum likelihood ("ML") estimation. The default is also to report the conventional chi-square test and maximum likelihood standard errors. Alternative estimators, such as "MLM" and "MLR" are available in lavaan and are described in the lavaan documentation of all available estimators.

To view a summary of the model results, the summary() command is used. Adding the standardized = TRUE argument will present standardized estimates along with the unstandardized parameters. Fit statistics of the model will be returned by adding the fit.measures = TRUE argument. See the full lavaan documentation to explore other optional parameters that are available.

The above syntax will be sufficient for many CFA models. However, it is also common to impose constraints on a CFA model, such as forcing factor loadings to be equal or allowing errors to covary. Bollen’s model includes both of these. First, because the latent variables represent the same democracy construct measured at two points in time, it makes sense that the respective factor loadings would be the same for each factor. We can specify this constraint as follows:

model <- '
  # measurement model
    xi_1  =~ y1 + l2*y2 + l3*y3 + l4*y4
    xi_2  =~ y5 + l2*y6 + l3*y7 + l4*y8
'

This syntax adds labels before each loading to specify which should be equal. l2, which is short for lambda 2, has been added before y2 and y6. Having the same label forces these loadings to be equal. Likewise, the l3 label will force the y3 and y7 loadings to be equal, and the l4 label will force the y4 and y8 loadings to be equal. We did not need to impose any constraint for y1 and y5. By default, lavaan identifies the model by constraining the first loading for each factor to equal one.

By default, lavaan will assume that all error variances for the observed variables are independent of each other. We can relax this constraint with some additional syntax:

model <- '
  # measurement model
    xi_1  =~ y1 + l2*y2 + l3*y3 + l4*y4
    xi_2  =~ y5 + l2*y6 + l3*y7 + l4*y8
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

The ~~ syntax specifies which error variances covary. Most of the covariances capture the fact that, having the same measures at two time points, any idiosyncrasies present at the first time point may also be present at the second time point. We also allow the error variances for the second and fourth observed variables at each time point to covary.

We now use the model object as the first argument to the cfa() function, and results are obtained using the summary() function.

fit <- cfa(model, data = Bollen)
summary(fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-7 ended normally after 67 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         23
##   Number of equality constraints                     3
##                                                       
##   Number of observations                            75
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                15.320
##   Degrees of freedom                                16
##   P-value (Chi-square)                           0.501
## 
## Model Test Baseline Model:
## 
##   Test statistic                               461.111
##   Degrees of freedom                                28
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.003
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1320.232
##   Loglikelihood unrestricted model (H1)      -1312.572
##                                                       
##   Akaike (AIC)                                2680.464
##   Bayesian (BIC)                              2726.814
##   Sample-size adjusted Bayesian (BIC)         2663.779
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.103
##   P-value RMSEA <= 0.05                          0.683
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.058
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   xi_1 =~                                                               
##     y1                1.000                               2.170    0.845
##     y2        (l2)    1.213    0.143    8.483    0.000    2.631    0.692
##     y3        (l3)    1.210    0.125    9.690    0.000    2.626    0.762
##     y4        (l4)    1.273    0.122   10.453    0.000    2.761    0.839
##   xi_2 =~                                                               
##     y5                1.000                               2.128    0.803
##     y6        (l2)    1.213    0.143    8.483    0.000    2.580    0.762
##     y7        (l3)    1.210    0.125    9.690    0.000    2.575    0.817
##     y8        (l4)    1.273    0.122   10.453    0.000    2.708    0.833
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .y1 ~~                                                                 
##    .y5                0.577    0.364    1.585    0.113    0.577    0.266
##  .y2 ~~                                                                 
##    .y4                1.390    0.685    2.030    0.042    1.390    0.283
##    .y6                2.068    0.733    2.822    0.005    2.068    0.344
##  .y3 ~~                                                                 
##    .y7                0.727    0.611    1.190    0.234    0.727    0.180
##  .y4 ~~                                                                 
##    .y8                0.476    0.453    1.049    0.294    0.476    0.148
##  .y6 ~~                                                                 
##    .y8                1.257    0.583    2.156    0.031    1.257    0.319
##   xi_1 ~~                                                               
##     xi_2              4.461    0.967    4.612    0.000    0.966    0.966
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .y1                1.879    0.431    4.355    0.000    1.879    0.285
##    .y2                7.530    1.363    5.523    0.000    7.530    0.521
##    .y3                4.966    0.966    5.141    0.000    4.966    0.419
##    .y4                3.214    0.722    4.449    0.000    3.214    0.297
##    .y5                2.499    0.518    4.824    0.000    2.499    0.356
##    .y6                4.809    0.924    5.202    0.000    4.809    0.419
##    .y7                3.302    0.699    4.722    0.000    3.302    0.332
##    .y8                3.227    0.720    4.482    0.000    3.227    0.306
##     xi_1              4.708    1.028    4.579    0.000    1.000    1.000
##     xi_2              4.528    1.022    4.429    0.000    1.000    1.000

Note that there were no warning messages returned from the cfa function. If a positive-definite warning message was returned from the cfa function due to a non-identified model, it is a clear signal that the model has a non-unique solution. Output after this warning message may still say convergence was achieved, but should not ever be reported.

We are then presented with model fit information. We look for a non-significant \(\chi^2\) test, a RMSEA less than 0.05, CFI/TLI above 0.90 to 0.95, and SRMR less than 0.08. Consult Hu and Bentler (1999) for fuller details on interpretation.

The next section presents the parameter estimates. The standardized results are presented under the Std.lv and Std.all columns. The Std.lv column shows results that are standardized so that the latent variables have a variance of one. The Std.all column shows results that are standardized so that both the latent variables and the observed variables have a variance of one. The latter is reported more often and is akin to standardized regression coefficients, where x and y are both interpreted in terms of z-scores. Note that the estimates for the loadings are the same for both latent democracy variables, which is what we imposed by labeling the respective parameters in the syntax. Note also that there are estimates corresponding to the error covariances, as we specified with our ~~ syntax.

The lavaan package also provides a set of extractor functions to pull specific portions of the output to further process or analyze. The parameterEstimates(), standardizedSolution(), and fitMeasures() functions can be used to return only the unstandardized estimates, standardized estimates, and model fit statistics, respectively.

parameterEstimates(fit)
##     lhs op  rhs label   est    se      z pvalue ci.lower ci.upper
## 1  xi_1 =~   y1       1.000 0.000     NA     NA    1.000    1.000
## 2  xi_1 =~   y2    l2 1.213 0.143  8.483  0.000    0.933    1.493
## 3  xi_1 =~   y3    l3 1.210 0.125  9.690  0.000    0.965    1.455
## 4  xi_1 =~   y4    l4 1.273 0.122 10.453  0.000    1.034    1.511
## 5  xi_2 =~   y5       1.000 0.000     NA     NA    1.000    1.000
## 6  xi_2 =~   y6    l2 1.213 0.143  8.483  0.000    0.933    1.493
## 7  xi_2 =~   y7    l3 1.210 0.125  9.690  0.000    0.965    1.455
## 8  xi_2 =~   y8    l4 1.273 0.122 10.453  0.000    1.034    1.511
## 9    y1 ~~   y5       0.577 0.364  1.585  0.113   -0.136    1.290
## 10   y2 ~~   y4       1.390 0.685  2.030  0.042    0.048    2.732
## 11   y2 ~~   y6       2.068 0.733  2.822  0.005    0.632    3.504
## 12   y3 ~~   y7       0.727 0.611  1.190  0.234   -0.471    1.926
## 13   y4 ~~   y8       0.476 0.453  1.049  0.294   -0.413    1.364
## 14   y6 ~~   y8       1.257 0.583  2.156  0.031    0.114    2.399
## 15   y1 ~~   y1       1.879 0.431  4.355  0.000    1.034    2.725
## 16   y2 ~~   y2       7.530 1.363  5.523  0.000    4.858   10.203
## 17   y3 ~~   y3       4.966 0.966  5.141  0.000    3.072    6.859
## 18   y4 ~~   y4       3.214 0.722  4.449  0.000    1.798    4.629
## 19   y5 ~~   y5       2.499 0.518  4.824  0.000    1.484    3.514
## 20   y6 ~~   y6       4.809 0.924  5.202  0.000    2.997    6.621
## 21   y7 ~~   y7       3.302 0.699  4.722  0.000    1.932    4.673
## 22   y8 ~~   y8       3.227 0.720  4.482  0.000    1.816    4.638
## 23 xi_1 ~~ xi_1       4.708 1.028  4.579  0.000    2.693    6.723
## 24 xi_2 ~~ xi_2       4.528 1.022  4.429  0.000    2.524    6.531
## 25 xi_1 ~~ xi_2       4.461 0.967  4.612  0.000    2.565    6.357
standardizedSolution(fit)
##     lhs op  rhs est.std    se      z pvalue ci.lower ci.upper
## 1  xi_1 =~   y1   0.845 0.041 20.411  0.000    0.764    0.927
## 2  xi_1 =~   y2   0.692 0.059 11.660  0.000    0.576    0.808
## 3  xi_1 =~   y3   0.762 0.051 14.907  0.000    0.662    0.863
## 4  xi_1 =~   y4   0.839 0.042 20.133  0.000    0.757    0.920
## 5  xi_2 =~   y5   0.803 0.047 17.164  0.000    0.711    0.894
## 6  xi_2 =~   y6   0.762 0.054 14.099  0.000    0.656    0.868
## 7  xi_2 =~   y7   0.817 0.045 18.145  0.000    0.729    0.905
## 8  xi_2 =~   y8   0.833 0.043 19.588  0.000    0.750    0.917
## 9    y1 ~~   y5   0.266 0.142  1.874  0.061   -0.012    0.545
## 10   y2 ~~   y4   0.283 0.115  2.450  0.014    0.056    0.509
## 11   y2 ~~   y6   0.344 0.099  3.486  0.000    0.150    0.537
## 12   y3 ~~   y7   0.180 0.138  1.299  0.194   -0.091    0.451
## 13   y4 ~~   y8   0.148 0.130  1.137  0.255   -0.107    0.402
## 14   y6 ~~   y8   0.319 0.115  2.763  0.006    0.093    0.545
## 15   y1 ~~   y1   0.285 0.070  4.074  0.000    0.148    0.423
## 16   y2 ~~   y2   0.521 0.082  6.341  0.000    0.360    0.682
## 17   y3 ~~   y3   0.419 0.078  5.368  0.000    0.266    0.572
## 18   y4 ~~   y4   0.297 0.070  4.243  0.000    0.160    0.433
## 19   y5 ~~   y5   0.356 0.075  4.737  0.000    0.208    0.503
## 20   y6 ~~   y6   0.419 0.082  5.091  0.000    0.258    0.581
## 21   y7 ~~   y7   0.332 0.074  4.518  0.000    0.188    0.477
## 22   y8 ~~   y8   0.306 0.071  4.310  0.000    0.167    0.445
## 23 xi_1 ~~ xi_1   1.000 0.000     NA     NA    1.000    1.000
## 24 xi_2 ~~ xi_2   1.000 0.000     NA     NA    1.000    1.000
## 25 xi_1 ~~ xi_2   0.966 0.029 33.297  0.000    0.909    1.023
fitMeasures(fit)
##                npar                fmin               chisq                  df 
##              20.000               0.102              15.320              16.000 
##              pvalue      baseline.chisq         baseline.df     baseline.pvalue 
##               0.501             461.111              28.000               0.000 
##                 cfi                 tli                nnfi                 rfi 
##               1.000               1.003               1.003               0.942 
##                 nfi                pnfi                 ifi                 rni 
##               0.967               0.552               1.002               1.002 
##                logl   unrestricted.logl                 aic                 bic 
##           -1320.232           -1312.572            2680.464            2726.814 
##              ntotal                bic2               rmsea      rmsea.ci.lower 
##              75.000            2663.779               0.000               0.000 
##      rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean 
##               0.103               0.683               0.640               0.640 
##                srmr        srmr_bentler srmr_bentler_nomean                crmr 
##               0.058               0.058               0.058               0.042 
##         crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05 
##               0.042               0.045               0.045             129.734 
##               cn_01                 gfi                agfi                pgfi 
##             157.657               0.955               0.898               0.424 
##                 mfi                ecvi 
##               1.005               0.738

Visualizing CFA models using semPlot

We can visualize our model with semPlot::semPaths():

library(semPlot)
semPaths(fit, nCharNodes = 0, style = "lisrel", rotation = 2)

CFA Model using semPaths function

The semPaths() function takes the fitted lavaan model object as the main argument, but has a number of different options available to customize the path diagram. Here, we set nCharNodes = 0, so that the variable names are not abbreviated. We also set the styling to look like the “lisrel” software output, and set the rotation so that the path diagram flows horizontally. We can see in the above image that the specification of the model from lavaan is reflected in the diagram.

Still have questions? Contact us!

Other Resources

For further information on fitting CFA models in lavaan, check out the following:

Citations

Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.