SEM using the lavaan package in R

Caleb Scheidel

Posted on
R SEM lavaan

This tutorial shows how to estimate a full structural equation model (SEM) with latent variables using the lavaan package in R. The model consists of three latent variables and eleven manifest variables, as described here. To review, the model to be fit is the following:

The data can be accessed from the built-in Bollen dataset in the sem package.

library(sem)
library(lavaan)

data("Bollen")

All required packages are loaded first with the library() commands. The data are then loaded into the environment using the data() function.

The post on CFA using the lavaan package in R described the steps towards fitting and testing the measurement model for the two measures of democracy. Here we are going to move from fitting a measurement model to actually testing structural relationships between variables. The model will keep both latent variables from the measurement model, which represented democracy measured in 1960 (\(\eta_1\)) and democracy measured in 1965 (\(\eta_2\)). We will also add a latent variable measuring industrialization in 1960 (\(\xi_1\)). The model expects that democracy in 1965 will be associated with democracy in 1960 as well as industrialization in 1960.

The lavaan syntax to run the model is the following:

model <- '
  # measurement model
    eta_1  =~ y1 + l2*y2 + l3*y3 + l4*y4
    eta_2  =~ y5 + l2*y6 + l3*y7 + l4*y8
    xi_1   =~ x1 + x2 + x3
  # regressions
    eta_1 ~ xi_1
    eta_2 ~ eta_1 + xi_1
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

fit <- sem(model, data = Bollen)
summary(fit, fit.measures = TRUE, standardized = TRUE)

The syntax retains all of the constraints described in the tutorial on CFA using lavaan in R. That is, the respective loadings for the 1960 and 1965 democracy indicators are constrained to be equal, and certain covariances between the observed variable error terms are free parameters to be estimated. The equality constraints are specified with the labels l2, l3, and l4 before the observed variable is listed. The ~~ statements introduce the covariances.

The primary difference from the CFA example is that now there are structural relationships between the latent variables. These are captured with the ~ syntax, which is used to specify regression-type linear associations. The dependent variable is listed first, followed by ~, followed by the independent variables. We have the following latent variable regressions:

  • eta_1 ~ xi_1; specifies that 1960 democracy (\(\eta_1\)) is associated with 1960 industrialization (\(\xi_1\)).
  • eta_2 ~ eta_1 + xi_1; specifies that 1965 democracy (\(\eta_2\)) is associated with 1960 democracy (\(\eta_1\)) and 1960 industrialization (\(\xi_1\)).

The model object is then plugged into the lavaan::sem() function to fit the model. By default lavaan will use standard maximum likelihood ("ML") estimation. The default is also to report the conventional chi-square test and maximum likelihood standard errors. Alternative estimators, such as "MLM" and "MLR" are available in lavaan and are described here.

Finally, because latent variables are unobserved and hence have an arbitrary scaling, it is preferable to present standardized estimates rather than the unstandardized parameters. We can get this by adding the optional standardized = TRUE argument to the summary() command. Fit statistics of the model will be returned by adding the fit.measures = TRUE argument. See the full lavaan documentation to explore other optional parameters that are available.

## lavaan 0.6-3 ended normally after 60 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         31
##   Number of equality constraints                     3
## 
##   Number of observations                            75
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      40.179
##   Degrees of freedom                                38
##   P-value (Chi-square)                           0.374
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              730.654
##   Degrees of freedom                                55
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.997
##   Tucker-Lewis Index (TLI)                       0.995
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1548.818
##   Loglikelihood unrestricted model (H1)      -1528.728
## 
##   Number of free parameters                         28
##   Akaike (AIC)                                3153.636
##   Bayesian (BIC)                              3218.526
##   Sample-size adjusted Bayesian (BIC)         3130.277
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.028
##   90 Percent Confidence Interval          0.000  0.087
##   P-value RMSEA <= 0.05                          0.665
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.056
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   eta_1 =~                                                              
##     y1                1.000                               2.201    0.850
##     y2        (l2)    1.191    0.139    8.551    0.000    2.621    0.690
##     y3        (l3)    1.175    0.120    9.755    0.000    2.586    0.758
##     y4        (l4)    1.251    0.117   10.712    0.000    2.754    0.838
##   eta_2 =~                                                              
##     y5                1.000                               2.154    0.817
##     y6        (l2)    1.191    0.139    8.551    0.000    2.565    0.755
##     y7        (l3)    1.175    0.120    9.755    0.000    2.530    0.802
##     y8        (l4)    1.251    0.117   10.712    0.000    2.694    0.829
##   xi_1 =~                                                               
##     x1                1.000                               0.670    0.920
##     x2                2.180    0.138   15.751    0.000    1.460    0.973
##     x3                1.818    0.152   11.971    0.000    1.218    0.872
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   eta_1 ~                                                               
##     xi_1              1.471    0.392    3.750    0.000    0.448    0.448
##   eta_2 ~                                                               
##     eta_1             0.865    0.075   11.554    0.000    0.884    0.884
##     xi_1              0.600    0.226    2.660    0.008    0.187    0.187
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .y1 ~~                                                                 
##    .y5                0.583    0.356    1.637    0.102    0.583    0.281
##  .y2 ~~                                                                 
##    .y4                1.440    0.689    2.092    0.036    1.440    0.291
##    .y6                2.183    0.737    2.960    0.003    2.183    0.356
##  .y3 ~~                                                                 
##    .y7                0.712    0.611    1.165    0.244    0.712    0.169
##  .y4 ~~                                                                 
##    .y8                0.363    0.444    0.817    0.414    0.363    0.111
##  .y6 ~~                                                                 
##    .y8                1.372    0.577    2.378    0.017    1.372    0.338
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .y1                1.855    0.433    4.279    0.000    1.855    0.277
##    .y2                7.581    1.366    5.549    0.000    7.581    0.525
##    .y3                4.956    0.956    5.182    0.000    4.956    0.426
##    .y4                3.225    0.723    4.458    0.000    3.225    0.298
##    .y5                2.313    0.479    4.831    0.000    2.313    0.333
##    .y6                4.968    0.921    5.393    0.000    4.968    0.430
##    .y7                3.560    0.710    5.018    0.000    3.560    0.357
##    .y8                3.308    0.704    4.701    0.000    3.308    0.313
##    .x1                0.081    0.019    4.182    0.000    0.081    0.154
##    .x2                0.120    0.070    1.729    0.084    0.120    0.053
##    .x3                0.467    0.090    5.177    0.000    0.467    0.239
##    .eta_1             3.875    0.866    4.477    0.000    0.800    0.800
##    .eta_2             0.164    0.227    0.725    0.469    0.035    0.035
##     xi_1              0.449    0.087    5.175    0.000    1.000    1.000

Note that there were no warning messages returned from the sem function. If a positive-definite or other warning message was returned from the sem function, it is a signal that the model is not identified or suitable for the data. Output after this warning message may still say convergence was achieved, but should not ever be reported.

We are then presented with model fit information. We look for a non-significant \(\chi^2\) test, a RMSEA less than 0.05, CFI/TLI above 0.90 to 0.95, and SRMR less than 0.08. Consult Hu and Bentler (1999) for fuller details on interpretation.

The next section presents the parameter estimates. The standardized results are presented under the Std.lv and Std.all columns. The Std.lv column shows results that are standardized so that the latent variables have a variance of one. The Std.all column shows results that are standardized so that both the latent variables and the observed variables have a variance of one. The latter is reported more often and is akin to standardized regression coefficients, where x and y are both interpreted in terms of z-scores. Since the regressions here only involve latent variables, the Regressions results are the same under the Std.lv and Std.all columns. Note that the estimates for the loadings are the same for both latent democracy variables, which is what we imposed by labeling the respective parameters in the syntax. Note also that there are estimates corresponding to the error covariances, as we specified with our ~~ syntax.

Our interest is in the structural relationships between the latent variables. Since we do not know what a “unit” of democracy is, we should look at the standardized results under the Std.all heading. Here we see the following:

  1. A standard deviation increase in 1960 industrialization is associated with a .448 standard deviation increase in 1960 democracy, \(p < 0.001\).
  2. A standard deviation increase in 1960 industrialization is associated with a .187 standard deviation increase in 1965 democracy, \(p = 0.011\), holding 1960 democracy constant.
  3. A standard deviation increase in 1960 democracy is associated with a .884 standard deviation increase in 1965 democracy, \(p < 0.001\), holding 1960 industrialization constant.

Note that the lavaan package also provides a set of extractor functions to pull specific portions of the output to further process or analyze. The parameterEstimates(), standardizedSolution(), and fitMeasures() functions can be used to return only the unstandardized estimates, standardized estimates, and model fit statistics, respectively.

parameterEstimates(fit)
##      lhs op   rhs label   est    se      z pvalue ci.lower ci.upper
## 1  eta_1 =~    y1       1.000 0.000     NA     NA    1.000    1.000
## 2  eta_1 =~    y2    l2 1.191 0.139  8.551  0.000    0.918    1.464
## 3  eta_1 =~    y3    l3 1.175 0.120  9.755  0.000    0.939    1.411
## 4  eta_1 =~    y4    l4 1.251 0.117 10.712  0.000    1.022    1.480
## 5  eta_2 =~    y5       1.000 0.000     NA     NA    1.000    1.000
## 6  eta_2 =~    y6    l2 1.191 0.139  8.551  0.000    0.918    1.464
## 7  eta_2 =~    y7    l3 1.175 0.120  9.755  0.000    0.939    1.411
## 8  eta_2 =~    y8    l4 1.251 0.117 10.712  0.000    1.022    1.480
## 9   xi_1 =~    x1       1.000 0.000     NA     NA    1.000    1.000
## 10  xi_1 =~    x2       2.180 0.138 15.751  0.000    1.908    2.451
## 11  xi_1 =~    x3       1.818 0.152 11.971  0.000    1.521    2.116
## 12 eta_1  ~  xi_1       1.471 0.392  3.750  0.000    0.702    2.240
## 13 eta_2  ~ eta_1       0.865 0.075 11.554  0.000    0.718    1.012
## 14 eta_2  ~  xi_1       0.600 0.226  2.660  0.008    0.158    1.043
## 15    y1 ~~    y5       0.583 0.356  1.637  0.102   -0.115    1.280
## 16    y2 ~~    y4       1.440 0.689  2.092  0.036    0.091    2.790
## 17    y2 ~~    y6       2.183 0.737  2.960  0.003    0.738    3.628
## 18    y3 ~~    y7       0.712 0.611  1.165  0.244   -0.486    1.909
## 19    y4 ~~    y8       0.363 0.444  0.817  0.414   -0.508    1.233
## 20    y6 ~~    y8       1.372 0.577  2.378  0.017    0.241    2.502
## 21    y1 ~~    y1       1.855 0.433  4.279  0.000    1.005    2.704
## 22    y2 ~~    y2       7.581 1.366  5.549  0.000    4.904   10.259
## 23    y3 ~~    y3       4.956 0.956  5.182  0.000    3.081    6.830
## 24    y4 ~~    y4       3.225 0.723  4.458  0.000    1.807    4.642
## 25    y5 ~~    y5       2.313 0.479  4.831  0.000    1.375    3.251
## 26    y6 ~~    y6       4.968 0.921  5.393  0.000    3.163    6.774
## 27    y7 ~~    y7       3.560 0.710  5.018  0.000    2.169    4.951
## 28    y8 ~~    y8       3.308 0.704  4.701  0.000    1.929    4.687
## 29    x1 ~~    x1       0.081 0.019  4.182  0.000    0.043    0.120
## 30    x2 ~~    x2       0.120 0.070  1.729  0.084   -0.016    0.257
## 31    x3 ~~    x3       0.467 0.090  5.177  0.000    0.290    0.643
## 32 eta_1 ~~ eta_1       3.875 0.866  4.477  0.000    2.179    5.572
## 33 eta_2 ~~ eta_2       0.164 0.227  0.725  0.469   -0.280    0.609
## 34  xi_1 ~~  xi_1       0.449 0.087  5.175  0.000    0.279    0.619
standardizedSolution(fit)
##      lhs op   rhs est.std    se      z pvalue ci.lower ci.upper
## 1  eta_1 =~    y1   0.850 0.041 20.916  0.000    0.771    0.930
## 2  eta_1 =~    y2   0.690 0.060 11.583  0.000    0.573    0.806
## 3  eta_1 =~    y3   0.758 0.052 14.696  0.000    0.657    0.859
## 4  eta_1 =~    y4   0.838 0.042 20.115  0.000    0.756    0.919
## 5  eta_2 =~    y5   0.817 0.044 18.518  0.000    0.730    0.903
## 6  eta_2 =~    y6   0.755 0.054 14.011  0.000    0.649    0.860
## 7  eta_2 =~    y7   0.802 0.046 17.401  0.000    0.711    0.892
## 8  eta_2 =~    y8   0.829 0.042 19.787  0.000    0.747    0.911
## 9   xi_1 =~    x1   0.920 0.023 40.076  0.000    0.875    0.965
## 10  xi_1 =~    x2   0.973 0.016 59.139  0.000    0.941    1.005
## 11  xi_1 =~    x3   0.872 0.031 28.087  0.000    0.811    0.933
## 12 eta_1  ~  xi_1   0.448 0.103  4.334  0.000    0.245    0.650
## 13 eta_2  ~ eta_1   0.884 0.051 17.238  0.000    0.784    0.985
## 14 eta_2  ~  xi_1   0.187 0.071  2.642  0.008    0.048    0.325
## 15    y1 ~~    y5   0.281 0.143  1.969  0.049    0.001    0.561
## 16    y2 ~~    y4   0.291 0.114  2.547  0.011    0.067    0.515
## 17    y2 ~~    y6   0.356 0.096  3.714  0.000    0.168    0.543
## 18    y3 ~~    y7   0.169 0.135  1.259  0.208   -0.094    0.433
## 19    y4 ~~    y8   0.111 0.129  0.863  0.388   -0.141    0.363
## 20    y6 ~~    y8   0.338 0.110  3.078  0.002    0.123    0.554
## 21    y1 ~~    y1   0.277 0.069  4.002  0.000    0.141    0.412
## 22    y2 ~~    y2   0.525 0.082  6.389  0.000    0.364    0.685
## 23    y3 ~~    y3   0.426 0.078  5.447  0.000    0.273    0.579
## 24    y4 ~~    y4   0.298 0.070  4.276  0.000    0.162    0.435
## 25    y5 ~~    y5   0.333 0.072  4.617  0.000    0.191    0.474
## 26    y6 ~~    y6   0.430 0.081  5.292  0.000    0.271    0.590
## 27    y7 ~~    y7   0.357 0.074  4.841  0.000    0.213    0.502
## 28    y8 ~~    y8   0.313 0.069  4.508  0.000    0.177    0.449
## 29    x1 ~~    x1   0.154 0.042  3.635  0.000    0.071    0.236
## 30    x2 ~~    x2   0.053 0.032  1.671  0.095   -0.009    0.116
## 31    x3 ~~    x3   0.239 0.054  4.419  0.000    0.133    0.346
## 32 eta_1 ~~ eta_1   0.800 0.092  8.648  0.000    0.618    0.981
## 33 eta_2 ~~ eta_2   0.035 0.049  0.729  0.466   -0.060    0.131
## 34  xi_1 ~~  xi_1   1.000 0.000     NA     NA    1.000    1.000
fitMeasures(fit)
##                npar                fmin               chisq 
##              28.000               0.268              40.179 
##                  df              pvalue      baseline.chisq 
##              38.000               0.374             730.654 
##         baseline.df     baseline.pvalue                 cfi 
##              55.000               0.000               0.997 
##                 tli                nnfi                 rfi 
##               0.995               0.995               0.920 
##                 nfi                pnfi                 ifi 
##               0.945               0.653               0.997 
##                 rni                logl   unrestricted.logl 
##               0.997           -1548.818           -1528.728 
##                 aic                 bic              ntotal 
##            3153.636            3218.526              75.000 
##                bic2               rmsea      rmsea.ci.lower 
##            3130.277               0.028               0.000 
##      rmsea.ci.upper        rmsea.pvalue                 rmr 
##               0.087               0.665               0.477 
##          rmr_nomean                srmr        srmr_bentler 
##               0.477               0.056               0.056 
## srmr_bentler_nomean                crmr         crmr_nomean 
##               0.056               0.051               0.051 
##          srmr_mplus   srmr_mplus_nomean               cn_05 
##               0.050               0.050             100.647 
##               cn_01                 gfi                agfi 
##             115.167               0.920               0.861 
##                pgfi                 mfi                ecvi 
##               0.530               0.986               1.282

Visualizing SEMs using semPlot

We can visualize our model with semPlot::semPaths():

library(semPlot)
semPaths(fit, nCharNodes = 0, style = "lisrel", rotation = 2)

The semPaths() function takes the fitted lavaan model object as the main argument, but has a number of different options available to customize the path diagram. Here, we set nCharNodes = 0, so that the variable names are not abbreviated. We also set the styling to look like the “lisrel” software output, and set the rotation so that the path diagram flows horizontally. We can see in the above image that the specification of the model from lavaan is reflected in the diagram.

Other Resources

For further information on fitting SEM models in lavaan, check out the following:

Citations

Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.