This page describes how to set up code in Mplus to fit a confirmatory factor analysis (CFA) model. The model, which consists of two latent variables and eight manifest variables, is described in our previous post setting up a running CFA and SEM example. Mplus reads data only in text format; see this post for details on how to prepare a data file for Mplus. The data can be accessed from GitHub. To review, the model to be fit is the following:

We’ll start with a basic CFA model that has no additional constraints on the parameters and no correlated errors. The code for such a model is the following:

```
TITLE: Bollen's (1989, chapter 7) CFA Example;
DATA: FILE IS sem-bollen.dat;
VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;
MODEL:
xi_1 BY x1
x2
x3
x4;
xi_2 BY x5
x6
x7
x8;
```

The optional `TITLE` command labels the model. The title here indicates that we are replicating the model described in chapter 7 of Bollen’s (1989, p. 235) book. Note that each command name is followed by a colon, and the statements within a command are separated by semicolons. Also keep in mind that no line of the input file can exceed 80 characters.
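To catch overly long lines before running a model, a short script can scan the input file. The sketch below is a hypothetical helper (`long_lines` is our own name, not part of Mplus). It takes the 80-character limit from the text above as its default; adjust `max_width` if the User’s Guide for your Mplus version states a different limit.

```python
from typing import List, Tuple


# Hypothetical helper: flag lines in an Mplus input file that are wider than
# max_width characters. The default of 80 follows the limit stated in the text;
# other Mplus versions may allow a different width.
def long_lines(text: str, max_width: int = 80) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than max_width."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_width:
            problems.append((number, len(line)))
    return problems


if __name__ == "__main__":
    sample = "\n".join([
        "TITLE: Bollen's (1989, chapter 7) CFA Example;",
        "DATA: FILE IS " + "x" * 80 + ".dat;",  # deliberately too long
    ])
    for number, length in long_lines(sample):
        print(f"line {number} has {length} characters")
```

To check a real input file, pass its contents, for example `long_lines(open(path).read())`, where `path` points to your `.inp` file.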

The `DATA` command points to where the data are located. This example assumes that the data file is in the same folder as the input file. If it is not, give the full path to the data file, such as `"C:\Users\you\Documents\mplus-files\sem-bollen.dat"`.

The `VARIABLE` command lists the variables in the order in which they appear in the data file. The second statement, `USEVARIABLES`, lists the variables that will actually be used in the analysis. It is not needed if all of the variables in the data file will be used.

The `MODEL` command describes the model. To define a latent variable, give its name, followed by the word `BY`, followed by the observed variables that measure it; the whole list ends with a single semicolon. Here we define two latent variables. The first is \(\xi_1\) (the Greek letter pronounced “xi”) and is measured by the variables \(x_1\) through \(x_4\). The second is \(\xi_2\) and is measured by the variables \(x_5\) through \(x_8\).
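Because a `BY` statement spans several lines but ends with only one semicolon, it is easy to end it too early by putting a semicolon after every indicator; the indicators after the stray semicolon are then no longer listed as measures of that factor. The sketch below shows the intended layout with a hypothetical helper, `by_statement`, which is our own illustration and not part of Mplus.

```python
from typing import List


# Hypothetical helper (not part of Mplus): write one BY statement with one
# indicator per line and a single semicolon after the last indicator.
def by_statement(factor: str, indicators: List[str]) -> str:
    if not indicators:
        raise ValueError("a factor needs at least one indicator")
    lines = [f"{factor} BY {indicators[0]}"] + list(indicators[1:])
    lines[-1] += ";"  # only the final indicator closes the statement
    return "\n".join(lines)


print(by_statement("xi_1", ["x1", "x2", "x3", "x4"]))
print(by_statement("xi_2", ["x5", "x6", "x7", "x8"]))
```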

Note that there are no missing values in this file. If there were, we would add a line after the `USEVARIABLES` statement like the following:

`MISSING ARE ALL (-999);`

This assumes that all missing values have been recoded as -999. The choice of numeric value for missing is up to the user who prepares the data.
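If the data start out in another program, the recoding can be done while writing the text file. The sketch below assumes that rows are Python lists in which `None` or NaN marks a missing value; the function names are our own. It writes space-separated values with no header row, because the variable names come from the `NAMES` statement, and the code it writes must match the one given in the `MISSING` statement (and must not occur as a real data value).

```python
import math
from typing import Iterable, List, Sequence

MISSING_CODE = -999  # must match MISSING ARE ALL (-999); in the input file


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_mplus_lines(rows: Iterable[Sequence],
                   missing_code: int = MISSING_CODE) -> List[str]:
    """Convert each row to a space-separated line, replacing missing values."""
    return [
        " ".join(str(missing_code) if is_missing(v) else str(v) for v in row)
        for row in rows
    ]


def write_mplus_data(path: str, rows: Iterable[Sequence],
                     missing_code: int = MISSING_CODE) -> None:
    """Write rows to a plain-text file with no header line."""
    with open(path, "w") as handle:
        for line in to_mplus_lines(rows, missing_code):
            handle.write(line + "\n")


rows = [[5.4, None, 6.667], [2.5, float("nan"), 10.0]]
print("\n".join(to_mplus_lines(rows)))
```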

By default, Mplus uses standard maximum likelihood (ML) estimation; specifically, full information maximum likelihood (FIML), which remains valid when values are missing at random. The default is also to report the conventional chi-square test and ML standard errors. The optional `ANALYSIS` command can be used to change the estimator. For example, adding

`ANALYSIS: ESTIMATOR = MLM;`

to the input file requests ML parameter estimates together with standard errors and a mean-adjusted chi-square statistic (the Satorra-Bentler chi-square) that are robust to non-normality in the data. Alternatively,

`ANALYSIS: ESTIMATOR = MLR;`

requests ML parameter estimates with robust standard errors based on the sandwich estimator; combined with `TYPE = COMPLEX` for clustered data, these standard errors are also cluster-robust. `ESTIMATOR = ML` is the default. The full list of estimators can be found in the ANALYSIS COMMAND chapter of the Mplus User’s Guide.

The above syntax will be sufficient for many CFA models. However, it is also common to impose constraints on a CFA model, such as forcing factor loadings to be equal, or to relax default constraints, such as allowing errors to covary. Bollen’s model does both. First, because the two latent variables represent the same democracy construct measured at two points in time, it makes sense to constrain the corresponding factor loadings to be equal across the two factors. We can specify this constraint as follows:

```
TITLE: Bollen's (1989, chapter 7) CFA Example;
DATA: FILE IS sem-bollen.dat;
VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;
MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);
xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);
```

This syntax adds a label in parentheses after each loading that should be constrained. The label `l2`, short for \(\lambda_2\), has been added after both `x2` and `x6`; giving two parameters the same label forces them to be equal. Likewise, the `l3` label forces the `x3` and `x7` loadings to be equal, and the `l4` label forces the `x4` and `x8` loadings to be equal. We did not need to impose any constraint on `x1` and `x5`: by default, Mplus identifies the model by fixing the first loading of each factor to one, so these two loadings are already equal.
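Why is a constraint such as fixing the first loading needed at all? In a one-factor model, the implied covariance between indicators \(i\) and \(j\) is \(\lambda_i \lambda_j \psi\), plus the residual variance \(\theta_i\) on the diagonal, where \(\psi\) is the factor variance. Multiplying every loading by a constant \(c\) and dividing \(\psi\) by \(c^2\) leaves every implied covariance unchanged, so the data alone cannot fix the scale of the factor. The sketch below checks this numerically with made-up round values (not the Bollen estimates); fixing one loading to 1 picks a single member of this family.

```python
from typing import List

Matrix = List[List[float]]


def implied_cov(loadings: List[float], psi: float, thetas: List[float]) -> Matrix:
    """Model-implied covariance matrix of a one-factor model with uncorrelated errors."""
    n = len(loadings)
    return [
        [loadings[i] * loadings[j] * psi + (thetas[i] if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Illustrative values only; the first loading is fixed at 1.
loadings = [1.0, 1.2, 1.2, 1.3]
psi = 4.7
thetas = [1.9, 7.5, 5.0, 3.2]

c = 2.0  # rescale loadings by c and the factor variance by 1 / c**2
original = implied_cov(loadings, psi, thetas)
rescaled = implied_cov([c * lam for lam in loadings], psi / c**2, thetas)

max_diff = max(
    abs(a - b)
    for row_o, row_r in zip(original, rescaled)
    for a, b in zip(row_o, row_r)
)
print(f"largest difference: {max_diff:.2e}")  # essentially zero
```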

By default, Mplus assumes that the errors of the observed variables are uncorrelated with one another. We can relax this constraint with some additional syntax:

```
TITLE: Bollen's (1989, chapter 7) CFA Example;
DATA: FILE IS sem-bollen.dat;
VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;
MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);
xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);
x1 WITH x5;
x2 WITH x4;
x2 WITH x6;
x3 WITH x7;
x4 WITH x8;
x6 WITH x8;
```

The `WITH` statement specifies which pairs of errors covary. Most of these covariances capture the fact that, because the same measures were taken at two time points, any idiosyncrasies of a measure at the first time point may also be present at the second. We also allow the errors of the second and fourth observed variables to covary within each time point (`x2 WITH x4` and `x6 WITH x8`).
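As a check on the specification, the number of free parameters and the degrees of freedom can be counted by hand. The sketch below assumes Mplus’s defaults for this model: one intercept and one residual variance per indicator, free factor variances and covariance, and the first loading of each factor fixed at 1. The labels `l2`, `l3`, and `l4` leave three distinct free loadings, and there are six `WITH` covariances. The helper names are our own.

```python
def free_parameters(n_indicators: int, n_factors: int,
                    free_loadings: int, residual_covs: int) -> int:
    """Count free parameters in a CFA with a mean structure (defaults as assumed above)."""
    intercepts = n_indicators
    residual_variances = n_indicators
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    return (intercepts + residual_variances + factor_variances
            + factor_covariances + free_loadings + residual_covs)


def observed_moments(n_indicators: int) -> int:
    """Sample means plus the unique elements of the covariance matrix."""
    return n_indicators + n_indicators * (n_indicators + 1) // 2


p = 8
k = free_parameters(p, n_factors=2, free_loadings=3, residual_covs=6)
df = observed_moments(p) - k
print(k, df)
```

This gives 28 free parameters and 44 − 28 = 16 degrees of freedom, matching `Number of Free Parameters` and `Degrees of Freedom` in the output shown later on this page. With `free_loadings=6` and `residual_covs=0`, the same functions describe the first, unconstrained model (25 parameters, 19 degrees of freedom).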

Finally, it is common to present standardized estimates rather than the unstandardized parameters. We can include them in the output by adding one more line:

```
TITLE: Bollen's (1989, chapter 7) CFA Example;
DATA: FILE IS sem-bollen.dat;
VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;
MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);
xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);
x1 WITH x5;
x2 WITH x4;
x2 WITH x6;
x3 WITH x7;
x4 WITH x8;
x6 WITH x8;
OUTPUT: STANDARDIZED;
```

Requesting `STANDARDIZED` in the `OUTPUT` command produces three standardizations, which appear in the output file as `STDYX`, `STDY`, and `STD`. In most cases, `STDYX` will be the section of interest: it standardizes both the latent and the observed variables, so the estimates are interpreted in standard deviation units, just like standardized regression coefficients. `STDY` does not standardize covariates, and `STD` standardizes only the latent variables; these two may be of interest in full structural equation models (SEMs), especially when a categorical covariate is involved. For a CFA with continuous indicators and no covariates, `STDYX` and `STDY` are equivalent.

With our syntax ready, we can now save the file and click the red `Run` button in the toolbar to get the estimates. Doing so yields the following output:

```
Mplus VERSION 8
MUTHEN & MUTHEN
06/25/2019 9:54 AM
INPUT INSTRUCTIONS
TITLE: Bollens (1989, chapter 7) CFA Example;
DATA: FILE IS sem-bollen.dat;
VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;
MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);
xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);
x1 WITH x5;
x2 WITH x4;
x2 WITH x6;
x3 WITH x7;
x4 WITH x8;
x6 WITH x8;
OUTPUT: STANDARDIZED;
INPUT READING TERMINATED NORMALLY
Bollens (1989, chapter 7) CFA Example;
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 75
Number of dependent variables 8
Number of independent variables 0
Number of continuous latent variables 2
Observed dependent variables
Continuous
X1 X2 X3 X4 X5 X6
X7 X8
Continuous latent variables
XI_1 XI_2
Estimator ML
Information matrix OBSERVED
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
Input data file(s)
sem-bollen.dat
Input data format FREE
UNIVARIATE SAMPLE STATISTICS
UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS
Variable/ Mean/ Skewness/ Minimum/ % with Percentiles
Sample Size Variance Kurtosis Maximum Min/Max 20%/60% 40%/80% Median
X1 5.465 -0.093 1.250 10.67% 2.500 5.000 5.400
75.000 6.787 -1.104 10.000 6.67% 6.900 7.500
X2 4.256 0.325 0.000 34.67% 0.000 3.333 3.333
75.000 15.372 -1.426 10.000 21.33% 4.800 10.000
X3 6.563 -0.606 0.000 10.67% 3.333 6.667 6.667
75.000 10.621 -0.657 10.000 1.33% 6.667 10.000
X4 4.453 0.120 0.000 22.67% 0.000 3.333 3.333
75.000 11.069 -1.164 10.000 10.67% 6.667 6.667
X5 5.136 -0.233 0.000 6.67% 2.500 5.000 5.000
75.000 6.735 -0.718 10.000 2.67% 6.250 7.500
X6 2.978 0.911 0.000 40.00% 0.000 0.000 2.233
75.000 11.224 -0.400 10.000 10.67% 3.333 6.667
X7 6.196 -0.565 0.000 13.33% 3.333 6.667 6.667
75.000 10.655 -0.672 10.000 26.67% 6.667 10.000
X8 4.043 0.455 0.000 16.00% 0.368 3.333 3.333
75.000 10.393 -0.906 10.000 12.00% 3.333 6.667
THE MODEL ESTIMATION TERMINATED NORMALLY
MODEL FIT INFORMATION
Number of Free Parameters 28
Loglikelihood
H0 Value -1320.232
H1 Value -1312.572
Information Criteria
Akaike (AIC) 2696.464
Bayesian (BIC) 2761.354
Sample-Size Adjusted BIC 2673.105
(n* = (n + 2) / 24)
Chi-Square Test of Model Fit
Value 15.320
Degrees of Freedom 16
P-Value 0.5013
RMSEA (Root Mean Square Error Of Approximation)
Estimate 0.000
90 Percent C.I. 0.000 0.103
Probability RMSEA <= .05 0.683
CFI/TLI
CFI 1.000
TLI 1.003
Chi-Square Test of Model Fit for the Baseline Model
Value 461.111
Degrees of Freedom 28
P-Value 0.0000
SRMR (Standardized Root Mean Square Residual)
Value 0.046
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
XI_1 BY
X1 1.000 0.000 999.000 999.000
X2 1.213 0.146 8.309 0.000
X3 1.210 0.124 9.748 0.000
X4 1.273 0.127 10.038 0.000
XI_2 BY
X5 1.000 0.000 999.000 999.000
X6 1.213 0.146 8.309 0.000
X7 1.210 0.124 9.748 0.000
X8 1.273 0.127 10.038 0.000
XI_2 WITH
XI_1 4.461 0.972 4.591 0.000
X1 WITH
X5 0.577 0.371 1.556 0.120
X2 WITH
X4 1.390 0.685 2.030 0.042
X6 2.068 0.729 2.838 0.005
X3 WITH
X7 0.727 0.619 1.175 0.240
X4 WITH
X8 0.476 0.461 1.032 0.302
X6 WITH
X8 1.257 0.584 2.151 0.031
Intercepts
X1 5.465 0.296 18.440 0.000
X2 4.256 0.439 9.696 0.000
X3 6.563 0.398 16.504 0.000
X4 4.453 0.380 11.713 0.000
X5 5.136 0.306 16.781 0.000
X6 2.978 0.391 7.616 0.000
X7 6.196 0.364 17.026 0.000
X8 4.043 0.375 10.776 0.000
Variances
XI_1 4.708 1.044 4.510 0.000
XI_2 4.528 1.020 4.440 0.000
Residual Variances
X1 1.879 0.454 4.140 0.000
X2 7.530 1.338 5.630 0.000
X3 4.966 0.970 5.121 0.000
X4 3.214 0.728 4.414 0.000
X5 2.499 0.518 4.823 0.000
X6 4.809 0.908 5.296 0.000
X7 3.302 0.722 4.572 0.000
X8 3.227 0.719 4.486 0.000
STANDARDIZED MODEL RESULTS
STDYX Standardization
Two-Tailed
Estimate S.E. Est./S.E. P-Value
XI_1 BY
X1 0.845 0.044 19.272 0.000
X2 0.692 0.061 11.383 0.000
X3 0.762 0.049 15.515 0.000
X4 0.839 0.042 19.765 0.000
XI_2 BY
X5 0.803 0.046 17.472 0.000
X6 0.762 0.053 14.383 0.000
X7 0.817 0.048 17.008 0.000
X8 0.833 0.043 19.548 0.000
XI_2 WITH
XI_1 0.966 0.030 32.453 0.000
X1 WITH
X5 0.266 0.144 1.854 0.064
X2 WITH
X4 0.283 0.116 2.426 0.015
X6 0.344 0.100 3.435 0.001
X3 WITH
X7 0.180 0.141 1.271 0.204
X4 WITH
X8 0.148 0.133 1.113 0.266
X6 WITH
X8 0.319 0.117 2.725 0.006
Intercepts
X1 2.129 0.204 10.459 0.000
X2 1.120 0.141 7.924 0.000
X3 1.906 0.187 10.170 0.000
X4 1.352 0.156 8.680 0.000
X5 1.938 0.190 10.201 0.000
X6 0.879 0.135 6.507 0.000
X7 1.966 0.190 10.347 0.000
X8 1.244 0.151 8.259 0.000
Variances
XI_1 1.000 0.000 999.000 999.000
XI_2 1.000 0.000 999.000 999.000
Residual Variances
X1 0.285 0.074 3.847 0.000
X2 0.521 0.084 6.191 0.000
X3 0.419 0.075 5.587 0.000
X4 0.297 0.071 4.165 0.000
X5 0.356 0.074 4.822 0.000
X6 0.419 0.081 5.194 0.000
X7 0.332 0.078 4.235 0.000
X8 0.306 0.071 4.301 0.000
STDY Standardization
Two-Tailed
Estimate S.E. Est./S.E. P-Value
XI_1 BY
X1 0.845 0.044 19.272 0.000
X2 0.692 0.061 11.383 0.000
X3 0.762 0.049 15.515 0.000
X4 0.839 0.042 19.765 0.000
XI_2 BY
X5 0.803 0.046 17.472 0.000
X6 0.762 0.053 14.383 0.000
X7 0.817 0.048 17.008 0.000
X8 0.833 0.043 19.548 0.000
XI_2 WITH
XI_1 0.966 0.030 32.453 0.000
X1 WITH
X5 0.266 0.144 1.854 0.064
X2 WITH
X4 0.283 0.116 2.426 0.015
X6 0.344 0.100 3.435 0.001
X3 WITH
X7 0.180 0.141 1.271 0.204
X4 WITH
X8 0.148 0.133 1.113 0.266
X6 WITH
X8 0.319 0.117 2.725 0.006
Intercepts
X1 2.129 0.204 10.459 0.000
X2 1.120 0.141 7.924 0.000
X3 1.906 0.187 10.170 0.000
X4 1.352 0.156 8.680 0.000
X5 1.938 0.190 10.201 0.000
X6 0.879 0.135 6.507 0.000
X7 1.966 0.190 10.347 0.000
X8 1.244 0.151 8.259 0.000
Variances
XI_1 1.000 0.000 999.000 999.000
XI_2 1.000 0.000 999.000 999.000
Residual Variances
X1 0.285 0.074 3.847 0.000
X2 0.521 0.084 6.191 0.000
X3 0.419 0.075 5.587 0.000
X4 0.297 0.071 4.165 0.000
X5 0.356 0.074 4.822 0.000
X6 0.419 0.081 5.194 0.000
X7 0.332 0.078 4.235 0.000
X8 0.306 0.071 4.301 0.000
STD Standardization
Two-Tailed
Estimate S.E. Est./S.E. P-Value
XI_1 BY
X1 2.170 0.241 9.020 0.000
X2 2.631 0.346 7.608 0.000
X3 2.626 0.304 8.648 0.000
X4 2.761 0.299 9.234 0.000
XI_2 BY
X5 2.128 0.240 8.880 0.000
X6 2.580 0.327 7.887 0.000
X7 2.575 0.297 8.682 0.000
X8 2.708 0.294 9.203 0.000
XI_2 WITH
XI_1 0.966 0.030 32.453 0.000
X1 WITH
X5 0.577 0.371 1.556 0.120
X2 WITH
X4 1.390 0.685 2.030 0.042
X6 2.068 0.729 2.838 0.005
X3 WITH
X7 0.727 0.619 1.175 0.240
X4 WITH
X8 0.476 0.461 1.032 0.302
X6 WITH
X8 1.257 0.584 2.151 0.031
Intercepts
X1 5.465 0.296 18.440 0.000
X2 4.256 0.439 9.696 0.000
X3 6.563 0.398 16.504 0.000
X4 4.453 0.380 11.713 0.000
X5 5.136 0.306 16.781 0.000
X6 2.978 0.391 7.616 0.000
X7 6.196 0.364 17.026 0.000
X8 4.043 0.375 10.776 0.000
Variances
XI_1 1.000 0.000 999.000 999.000
XI_2 1.000 0.000 999.000 999.000
Residual Variances
X1 1.879 0.454 4.140 0.000
X2 7.530 1.338 5.630 0.000
X3 4.966 0.970 5.121 0.000
X4 3.214 0.728 4.414 0.000
X5 2.499 0.518 4.823 0.000
X6 4.809 0.908 5.296 0.000
X7 3.302 0.722 4.572 0.000
X8 3.227 0.719 4.486 0.000
R-SQUARE
Observed Two-Tailed
Variable Estimate S.E. Est./S.E. P-Value
X1 0.715 0.074 9.636 0.000
X2 0.479 0.084 5.692 0.000
X3 0.581 0.075 7.757 0.000
X4 0.703 0.071 9.882 0.000
X5 0.644 0.074 8.736 0.000
X6 0.581 0.081 7.191 0.000
X7 0.668 0.078 8.504 0.000
X8 0.694 0.071 9.774 0.000
QUALITY OF NUMERICAL RESULTS
Condition Number for the Information Matrix 0.254E-02
(ratio of smallest to largest eigenvalue)
DIAGRAM INFORMATION
Use View Diagram under the Diagram menu in the Mplus Editor to view the diagram.
If running Mplus from the Mplus Diagrammer, the diagram opens automatically.
Diagram output
c:\users\jeremy\documents\mplus-data\bollen-cfa.dgm
Beginning Time: 09:54:06
Ending Time: 09:54:06
Elapsed Time: 00:00:00
MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA 90066
Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com
Copyright (c) 1998-2017 Muthen & Muthen
```

The first part of the output repeats the input file. We then see `INPUT READING TERMINATED NORMALLY`. If there were syntax errors, Mplus would alert us at this point, and we would need to go back and check our syntax and data.

The next section describes the model and estimator, followed by a table of descriptive statistics for the observed variables.

Next, the output states `THE MODEL ESTIMATION TERMINATED NORMALLY`. This is important information: if the model were not identified, or if estimation did not converge within the default maximum number of iterations, Mplus would tell us here. Output that does *not* say that the estimation terminated normally should *never* be reported.

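We are then presented with model fit information. Commonly used guidelines for good fit are a non-significant \(\chi^2\) test, an RMSEA below about 0.05 to 0.06, CFI and TLI above 0.90 to 0.95, and an SRMR below 0.08; consult Hu and Bentler (1999) for fuller details on interpretation. For this model, \(\chi^2(16) = 15.32\) with \(p = .50\), RMSEA = 0.000, CFI = 1.000, TLI = 1.003, and SRMR = 0.046, so the model fits well by each of these criteria.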
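The fit statistics are linked by simple formulas, so they can be recomputed from the printed output as a sanity check. The sketch below uses the values reported in the MODEL FIT INFORMATION section and standard textbook formulas. Two assumptions: the RMSEA uses \(n\) in its denominator (some programs use \(n - 1\)), and because the printed inputs are rounded, results may differ from the output in the last digit.

```python
import math

# Values copied from the MODEL FIT INFORMATION section of the output.
H0_LOGLIK, H1_LOGLIK = -1320.232, -1312.572
N_PARAMS, N_OBS = 28, 75
DF, BASE_CHISQ, BASE_DF = 16, 461.111, 28


def lr_chisq(h0: float, h1: float) -> float:
    """Likelihood-ratio chi-square: twice the gap between the H1 and H0 loglikelihoods."""
    return 2.0 * (h1 - h0)


def information_criteria(loglik: float, k: int, n: int):
    """AIC, BIC, and sample-size adjusted BIC with n* = (n + 2) / 24."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * k
    bic = deviance + k * math.log(n)
    sabic = deviance + k * math.log((n + 2) / 24)
    return aic, bic, sabic


def rmsea(chisq: float, df: int, n: int) -> float:
    """Point estimate, truncated at zero when chi-square is below its df."""
    return math.sqrt(max(chisq - df, 0.0) / (df * n))


def cfi(chisq: float, df: int, base_chisq: float, base_df: int) -> float:
    denom = max(base_chisq - base_df, chisq - df, 0.0)
    if denom == 0.0:
        return 1.0
    return 1.0 - max(chisq - df, 0.0) / denom


def tli(chisq: float, df: int, base_chisq: float, base_df: int) -> float:
    base_ratio = base_chisq / base_df
    return (base_ratio - chisq / df) / (base_ratio - 1.0)


chisq = lr_chisq(H0_LOGLIK, H1_LOGLIK)
aic, bic, sabic = information_criteria(H0_LOGLIK, N_PARAMS, N_OBS)
print(f"chi-square {chisq:.3f}")
print(f"AIC {aic:.3f}  BIC {bic:.3f}  SABIC {sabic:.3f}")
print(f"RMSEA {rmsea(chisq, DF, N_OBS):.3f}")
print(f"CFI {cfi(chisq, DF, BASE_CHISQ, BASE_DF):.3f}  "
      f"TLI {tli(chisq, DF, BASE_CHISQ, BASE_DF):.3f}")
```

These reproduce the reported \(\chi^2\) = 15.320, AIC = 2696.464, BIC = 2761.354, adjusted BIC = 2673.105, RMSEA = 0.000, CFI = 1.000, and TLI = 1.003. Because \(\chi^2\) is smaller than its degrees of freedom, RMSEA is truncated at zero and TLI slightly exceeds one.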

The next section presents the parameter estimates. The unstandardized results are presented first, followed by the standardized results (we would focus on `STDYX` or `STDY` here, which are identical for this model). Note that the unstandardized loadings are the same for both latent variables, which is what we imposed by labeling the respective parameters in the syntax; the standardized loadings differ because the factor and residual variances differ. Note also that there are estimates for the error covariances, as we specified in our `WITH` statements.
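To see how the standardized columns relate to the unstandardized ones, the sketch below recomputes them for `x1` and `x5` from the printed estimates. It assumes an indicator that loads on a single factor: the indicator’s model-implied variance is \(\lambda^2\psi + \theta\); `STD` multiplies the loading by the factor’s standard deviation \(\sqrt{\psi}\); `STDYX` additionally divides by the indicator’s implied standard deviation; and R-square is the squared `STDYX` loading. Because the inputs are rounded to three decimals, other indicators may differ from the output in the last digit.

```python
import math
from typing import Tuple


def standardize(loading: float, factor_var: float,
                resid_var: float) -> Tuple[float, float, float]:
    """Return (STD loading, STDYX loading, R-square) for an indicator of one factor."""
    implied_var = loading**2 * factor_var + resid_var  # model-implied variance of the indicator
    std = loading * math.sqrt(factor_var)              # latent variable standardized
    stdyx = std / math.sqrt(implied_var)               # indicator standardized as well
    return std, stdyx, stdyx**2


# x1: loading fixed at 1, Var(XI_1) = 4.708, residual variance 1.879.
std, stdyx, r2 = standardize(1.000, 4.708, 1.879)
print(f"x1: STD {std:.3f}  STDYX {stdyx:.3f}  R-square {r2:.3f}")

# x5: same unstandardized loading, but Var(XI_2) = 4.528 and residual variance 2.499.
_, stdyx_x5, _ = standardize(1.000, 4.528, 2.499)
print(f"x5: STDYX {stdyx_x5:.3f}")

# The factor correlation follows the same idea: covariance over the product of SDs.
factor_corr = 4.461 / math.sqrt(4.708 * 4.528)
print(f"XI_2 WITH XI_1 (STDYX) {factor_corr:.3f}")
```

The results match the output (2.170, 0.845, and 0.715 for `x1`; 0.803 for `x5`; a factor correlation of 0.966), which shows why equal unstandardized loadings can still give different standardized loadings.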

To view a path diagram of the model, click on **Diagram** \(\rightarrow\) **View Diagram** in Mplus. This will open a new application that shows the model, such as the following:

The user can toggle between unstandardized parameter estimates (shown) and the different standardizations. In addition, some formatting can be performed to get the image in better shape for publication.


## Citations

Bollen, K. A. (1989). *Structural Equations with Latent Variables*. New York, NY: Wiley.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. *Structural Equation Modeling, 6*(1), 1–55.