# CFA in Mplus

Jeremy Albright


This page describes how to set up code in Mplus to fit a confirmatory factor analysis (CFA) model. The model, which consists of two latent variables and eight manifest variables, is described here. Mplus only reads data in text format; see this post for details on how to prepare a data file for Mplus. The data can be accessed from GitHub. To review, the model to be fit is the following:

We’ll start out with a basic CFA model that has neither constraints on the parameters nor correlated errors. The code for such a model would be the following:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
xi_1 BY x1
x2
x3
x4;

xi_2 BY x5
x6
x7
x8;


The optional TITLE command labels the model. The title here indicates that we are replicating the model described in chapter 7 of Bollen’s (1989, pg. 235) book. Note that every statement must end with a semicolon. Also keep in mind that the number of characters in any row of the input file is limited: recent versions of Mplus warn when a row exceeds 90 characters, while older versions limited rows to 80.

The DATA command points to where the data are located. In this example, it is assumed that the data are in the same folder as this input file. If not, fuller pathnames to the data file would need to be used, such as "C:\Users\you\Documents\mplus-files\sem-bollen.dat".

The VARIABLE command lists the variables in the order in which they appear in the data file. The second line specifies the USEVARIABLES, or the variables that will actually be used in the analysis. This line is not necessary if all of the variables in the data file will be used.

The MODEL command describes the model. The syntax for latent variables is to list the name of the latent variable, followed by the word BY, followed by a list of the observed variables. Here we say that we want two latent variables. The first is $$\xi_1$$ (Greek letter pronounced “xi”) and is measured with the variables $$x_1-x_4$$. The second is $$\xi_2$$ and is measured with the variables $$x_5-x_8$$.
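Written out as equations, the two BY statements correspond to the measurement model sketched below. The symbols $$\nu_i$$ (intercept), $$\lambda_i$$ (factor loading), and $$\delta_i$$ (error) are notation assumed here for illustration; they do not appear in the Mplus input.

$$
x_i = \nu_i + \lambda_i \xi_1 + \delta_i, \quad i = 1, \dots, 4
$$

$$
x_i = \nu_i + \lambda_i \xi_2 + \delta_i, \quad i = 5, \dots, 8
$$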

Note that there are no missing values in this file. If there were missing values, we would add a line after the USEVARIABLES statement like the following:

MISSING ARE ALL (-999);

This of course assumes missing values have all been recoded as -999. The choice of numeric value for missing is up to the user who prepares the data.
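As a rough illustration of the recoding step, the following Python sketch writes a small data set as a space-delimited text file with no header row, replacing missing entries with -999. The function name `write_mplus_data`, the file name `example.dat`, and the two example cases are hypothetical and are not part of the Bollen data.

```python
import math
import os
import tempfile

def write_mplus_data(rows, path, missing_code=-999):
    """Write rows of numbers as space-delimited text with no header row.

    None and NaN entries are replaced by missing_code, which should match
    the value named in the MISSING statement of the Mplus input file.
    """
    def recode(value):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return missing_code
        return value

    with open(path, "w") as handle:
        for row in rows:
            handle.write(" ".join(str(recode(v)) for v in row) + "\n")

# Hypothetical two-case example; the second case is missing its last value.
example_rows = [[2.5, 0.0, 3.333], [1.25, 7.5, None]]
out_path = os.path.join(tempfile.gettempdir(), "example.dat")
write_mplus_data(example_rows, out_path)

with open(out_path) as handle:
    print(handle.read())
```

The variable names are deliberately left out of the file, since they are supplied through the NAMES statement in the input file instead.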

Mplus will by default use standard maximum likelihood estimation (specifically, Full Information Maximum Likelihood, or FIML, which is robust to data that have values missing at random). The default is also to report the conventional chi-square test and maximum likelihood standard errors. The optional ANALYSIS command can be used to change the estimator for some or all statistics. For example, adding

ANALYSIS: ESTIMATOR = MLM;

to the input file will tell Mplus to still use maximum likelihood estimation for the model parameters but to report standard errors and a mean-adjusted chi-square statistic (the Satorra-Bentler chi-square) that are robust to non-normality in the data. Alternatively,

ANALYSIS: ESTIMATOR = MLR;

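will use maximum likelihood to estimate the parameters and will report standard errors based on the sandwich estimator, along with a chi-square statistic, that are robust to non-normality (and, when combined with TYPE = COMPLEX, to non-independence of observations such as clustering). The full list of estimators can be found in the ANALYSIS COMMAND chapter of the Mplus User’s Guide. ESTIMATOR = ML is the default.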

The above syntax for the input file will be sufficient for many CFA models. However, it is also common to impose constraints on a CFA model, such as forcing factor loadings to be equal or allowing errors to covary. Bollen’s model includes both of these. First, because the latent variables represent the same democracy construct measured at two points in time, it makes sense that the respective factor loadings would be the same for each factor. We can specify this constraint as follows:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);

xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);



This syntax adds labels after each loading to specify which should be equal. l2, which is short for lambda 2, has been added after x2 and x6. Having the same label forces these loadings to be equal. Likewise, the l3 label will force the x3 and x7 loadings to be equal, and the l4 label will force the x4 and x8 loadings to be equal. We did not need to impose any constraint for x1 and x5. By default, Mplus identifies the model by constraining the first loading for each factor to equal one.

By default, Mplus will assume that the errors (residuals) of the observed variables are uncorrelated with each other. We can relax this constraint with some additional syntax:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);

xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);

x1 WITH x5;
x2 WITH x4;
x2 WITH x6;
x3 WITH x7;
x4 WITH x8;
x6 WITH x8;


The WITH statement specifies which errors covary. Most of the covariances capture the fact that, having the same measures at two time points, any idiosyncrasies present at the first time point may also be present at the second time point. We also allow the errors for the second and fourth observed variables at each time point to covary.
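To see how the equality constraints and error covariances affect the size of the model, the free parameters can be tallied by hand. The sketch below is a bookkeeping exercise in Python rather than anything Mplus requires; the dictionary keys and the helper `count_moments` are names chosen for illustration. It assumes the Mplus defaults visible in the output reproduced later on this page, namely that intercepts are estimated and the two factors are allowed to covary.

```python
def count_moments(n_observed):
    """Number of sample means plus unique variances and covariances."""
    return n_observed + n_observed * (n_observed + 1) // 2

# Free parameters in the model with equal loadings and correlated errors.
free_parameters = {
    "loadings": 3,            # l2, l3, l4; the x1 and x5 loadings are fixed at one
    "factor variances": 2,
    "factor covariance": 1,   # estimated by default
    "intercepts": 8,          # estimated by default
    "residual variances": 8,
    "error covariances": 6,   # one per WITH statement
}

n_free = sum(free_parameters.values())
degrees_of_freedom = count_moments(8) - n_free
print(n_free, degrees_of_freedom)  # 28 16
```

These totals agree with the 28 free parameters and 16 degrees of freedom that Mplus reports for this model.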

Finally, it is common to present standardized estimates rather than the unstandardized parameters. We can get this included in the output by adding one more line:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);

xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);

x1 WITH x5;
x2 WITH x4;
x2 WITH x6;
x3 WITH x7;
x4 WITH x8;
x6 WITH x8;

OUTPUT: STANDARDIZED;


Requesting STANDARDIZED for the output will produce three types of standardization which appear in the output file as STDYX, STDY, and STD. In most cases, STDYX will be the section of interest, as it standardizes the output to be interpreted in standard deviation units (just like standardized regression coefficients). The other two may be of interest in full structural equation models (SEMs), especially when a categorical covariate is involved. For a CFA with continuous indicators, STDYX and STDY will be equivalent.

With our syntax ready we can now save the file and then click the red Run button in the toolbar to get the estimates. Doing so yields the following:


Mplus VERSION 8
MUTHEN & MUTHEN
06/25/2019   9:54 AM

INPUT INSTRUCTIONS

TITLE: Bollens (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
xi_1 BY x1
x2 (l2)
x3 (l3)
x4 (l4);

xi_2 BY x5
x6 (l2)
x7 (l3)
x8 (l4);

x1 WITH x5;
x2 WITH x4;
x2 WITH x6;
x3 WITH x7;
x4 WITH x8;
x6 WITH x8;

OUTPUT: STANDARDIZED;

Bollens (1989, chapter 7) CFA Example;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                          75

Number of dependent variables                                    8
Number of independent variables                                  0
Number of continuous latent variables                            2

Observed dependent variables

Continuous
X1          X2          X3          X4          X5          X6
X7          X8

Continuous latent variables
XI_1        XI_2

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20

Input data file(s)
sem-bollen.dat

Input data format  FREE

UNIVARIATE SAMPLE STATISTICS

UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS

Variable/         Mean/     Skewness/   Minimum/ % with                Percentiles
Sample Size      Variance    Kurtosis    Maximum  Min/Max      20%/60%    40%/80%    Median

X1                    5.465      -0.093       1.250   10.67%       2.500      5.000      5.400
75.000       6.787      -1.104      10.000    6.67%       6.900      7.500
X2                    4.256       0.325       0.000   34.67%       0.000      3.333      3.333
75.000      15.372      -1.426      10.000   21.33%       4.800     10.000
X3                    6.563      -0.606       0.000   10.67%       3.333      6.667      6.667
75.000      10.621      -0.657      10.000    1.33%       6.667     10.000
X4                    4.453       0.120       0.000   22.67%       0.000      3.333      3.333
75.000      11.069      -1.164      10.000   10.67%       6.667      6.667
X5                    5.136      -0.233       0.000    6.67%       2.500      5.000      5.000
75.000       6.735      -0.718      10.000    2.67%       6.250      7.500
X6                    2.978       0.911       0.000   40.00%       0.000      0.000      2.233
75.000      11.224      -0.400      10.000   10.67%       3.333      6.667
X7                    6.196      -0.565       0.000   13.33%       3.333      6.667      6.667
75.000      10.655      -0.672      10.000   26.67%       6.667     10.000
X8                    4.043       0.455       0.000   16.00%       0.368      3.333      3.333
75.000      10.393      -0.906      10.000   12.00%       3.333      6.667

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                       28

Loglikelihood

H0 Value                       -1320.232
H1 Value                       -1312.572

Information Criteria

Akaike (AIC)                    2696.464
Bayesian (BIC)                  2761.354
(n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                             15.320
Degrees of Freedom                    16
P-Value                           0.5013

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.000
90 Percent C.I.                    0.000  0.103
Probability RMSEA <= .05           0.683

CFI/TLI

CFI                                1.000
TLI                                1.003

Chi-Square Test of Model Fit for the Baseline Model

Value                            461.111
Degrees of Freedom                    28
P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.046

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

XI_1     BY
X1                 1.000      0.000    999.000    999.000
X2                 1.213      0.146      8.309      0.000
X3                 1.210      0.124      9.748      0.000
X4                 1.273      0.127     10.038      0.000

XI_2     BY
X5                 1.000      0.000    999.000    999.000
X6                 1.213      0.146      8.309      0.000
X7                 1.210      0.124      9.748      0.000
X8                 1.273      0.127     10.038      0.000

XI_2     WITH
XI_1               4.461      0.972      4.591      0.000

X1       WITH
X5                 0.577      0.371      1.556      0.120

X2       WITH
X4                 1.390      0.685      2.030      0.042
X6                 2.068      0.729      2.838      0.005

X3       WITH
X7                 0.727      0.619      1.175      0.240

X4       WITH
X8                 0.476      0.461      1.032      0.302

X6       WITH
X8                 1.257      0.584      2.151      0.031

Intercepts
X1                 5.465      0.296     18.440      0.000
X2                 4.256      0.439      9.696      0.000
X3                 6.563      0.398     16.504      0.000
X4                 4.453      0.380     11.713      0.000
X5                 5.136      0.306     16.781      0.000
X6                 2.978      0.391      7.616      0.000
X7                 6.196      0.364     17.026      0.000
X8                 4.043      0.375     10.776      0.000

Variances
XI_1               4.708      1.044      4.510      0.000
XI_2               4.528      1.020      4.440      0.000

Residual Variances
X1                 1.879      0.454      4.140      0.000
X2                 7.530      1.338      5.630      0.000
X3                 4.966      0.970      5.121      0.000
X4                 3.214      0.728      4.414      0.000
X5                 2.499      0.518      4.823      0.000
X6                 4.809      0.908      5.296      0.000
X7                 3.302      0.722      4.572      0.000
X8                 3.227      0.719      4.486      0.000

STANDARDIZED MODEL RESULTS

STDYX Standardization

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

XI_1     BY
X1                 0.845      0.044     19.272      0.000
X2                 0.692      0.061     11.383      0.000
X3                 0.762      0.049     15.515      0.000
X4                 0.839      0.042     19.765      0.000

XI_2     BY
X5                 0.803      0.046     17.472      0.000
X6                 0.762      0.053     14.383      0.000
X7                 0.817      0.048     17.008      0.000
X8                 0.833      0.043     19.548      0.000

XI_2     WITH
XI_1               0.966      0.030     32.453      0.000

X1       WITH
X5                 0.266      0.144      1.854      0.064

X2       WITH
X4                 0.283      0.116      2.426      0.015
X6                 0.344      0.100      3.435      0.001

X3       WITH
X7                 0.180      0.141      1.271      0.204

X4       WITH
X8                 0.148      0.133      1.113      0.266

X6       WITH
X8                 0.319      0.117      2.725      0.006

Intercepts
X1                 2.129      0.204     10.459      0.000
X2                 1.120      0.141      7.924      0.000
X3                 1.906      0.187     10.170      0.000
X4                 1.352      0.156      8.680      0.000
X5                 1.938      0.190     10.201      0.000
X6                 0.879      0.135      6.507      0.000
X7                 1.966      0.190     10.347      0.000
X8                 1.244      0.151      8.259      0.000

Variances
XI_1               1.000      0.000    999.000    999.000
XI_2               1.000      0.000    999.000    999.000

Residual Variances
X1                 0.285      0.074      3.847      0.000
X2                 0.521      0.084      6.191      0.000
X3                 0.419      0.075      5.587      0.000
X4                 0.297      0.071      4.165      0.000
X5                 0.356      0.074      4.822      0.000
X6                 0.419      0.081      5.194      0.000
X7                 0.332      0.078      4.235      0.000
X8                 0.306      0.071      4.301      0.000

STDY Standardization

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

XI_1     BY
X1                 0.845      0.044     19.272      0.000
X2                 0.692      0.061     11.383      0.000
X3                 0.762      0.049     15.515      0.000
X4                 0.839      0.042     19.765      0.000

XI_2     BY
X5                 0.803      0.046     17.472      0.000
X6                 0.762      0.053     14.383      0.000
X7                 0.817      0.048     17.008      0.000
X8                 0.833      0.043     19.548      0.000

XI_2     WITH
XI_1               0.966      0.030     32.453      0.000

X1       WITH
X5                 0.266      0.144      1.854      0.064

X2       WITH
X4                 0.283      0.116      2.426      0.015
X6                 0.344      0.100      3.435      0.001

X3       WITH
X7                 0.180      0.141      1.271      0.204

X4       WITH
X8                 0.148      0.133      1.113      0.266

X6       WITH
X8                 0.319      0.117      2.725      0.006

Intercepts
X1                 2.129      0.204     10.459      0.000
X2                 1.120      0.141      7.924      0.000
X3                 1.906      0.187     10.170      0.000
X4                 1.352      0.156      8.680      0.000
X5                 1.938      0.190     10.201      0.000
X6                 0.879      0.135      6.507      0.000
X7                 1.966      0.190     10.347      0.000
X8                 1.244      0.151      8.259      0.000

Variances
XI_1               1.000      0.000    999.000    999.000
XI_2               1.000      0.000    999.000    999.000

Residual Variances
X1                 0.285      0.074      3.847      0.000
X2                 0.521      0.084      6.191      0.000
X3                 0.419      0.075      5.587      0.000
X4                 0.297      0.071      4.165      0.000
X5                 0.356      0.074      4.822      0.000
X6                 0.419      0.081      5.194      0.000
X7                 0.332      0.078      4.235      0.000
X8                 0.306      0.071      4.301      0.000

STD Standardization

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

XI_1     BY
X1                 2.170      0.241      9.020      0.000
X2                 2.631      0.346      7.608      0.000
X3                 2.626      0.304      8.648      0.000
X4                 2.761      0.299      9.234      0.000

XI_2     BY
X5                 2.128      0.240      8.880      0.000
X6                 2.580      0.327      7.887      0.000
X7                 2.575      0.297      8.682      0.000
X8                 2.708      0.294      9.203      0.000

XI_2     WITH
XI_1               0.966      0.030     32.453      0.000

X1       WITH
X5                 0.577      0.371      1.556      0.120

X2       WITH
X4                 1.390      0.685      2.030      0.042
X6                 2.068      0.729      2.838      0.005

X3       WITH
X7                 0.727      0.619      1.175      0.240

X4       WITH
X8                 0.476      0.461      1.032      0.302

X6       WITH
X8                 1.257      0.584      2.151      0.031

Intercepts
X1                 5.465      0.296     18.440      0.000
X2                 4.256      0.439      9.696      0.000
X3                 6.563      0.398     16.504      0.000
X4                 4.453      0.380     11.713      0.000
X5                 5.136      0.306     16.781      0.000
X6                 2.978      0.391      7.616      0.000
X7                 6.196      0.364     17.026      0.000
X8                 4.043      0.375     10.776      0.000

Variances
XI_1               1.000      0.000    999.000    999.000
XI_2               1.000      0.000    999.000    999.000

Residual Variances
X1                 1.879      0.454      4.140      0.000
X2                 7.530      1.338      5.630      0.000
X3                 4.966      0.970      5.121      0.000
X4                 3.214      0.728      4.414      0.000
X5                 2.499      0.518      4.823      0.000
X6                 4.809      0.908      5.296      0.000
X7                 3.302      0.722      4.572      0.000
X8                 3.227      0.719      4.486      0.000

R-SQUARE

Observed                                        Two-Tailed
Variable        Estimate       S.E.  Est./S.E.    P-Value

X1                 0.715      0.074      9.636      0.000
X2                 0.479      0.084      5.692      0.000
X3                 0.581      0.075      7.757      0.000
X4                 0.703      0.071      9.882      0.000
X5                 0.644      0.074      8.736      0.000
X6                 0.581      0.081      7.191      0.000
X7                 0.668      0.078      8.504      0.000
X8                 0.694      0.071      9.774      0.000

QUALITY OF NUMERICAL RESULTS

Condition Number for the Information Matrix              0.254E-02
(ratio of smallest to largest eigenvalue)

DIAGRAM INFORMATION

Use View Diagram under the Diagram menu in the Mplus Editor to view the diagram.
If running Mplus from the Mplus Diagrammer, the diagram opens automatically.

Diagram output
c:\users\jeremy\documents\mplus-data\bollen-cfa.dgm

Beginning Time:  09:54:06
Ending Time:  09:54:06
Elapsed Time:  00:00:00

MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA  90066

Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com

Copyright (c) 1998-2017 Muthen & Muthen


The first part of the output reiterates the code. We then see that INPUT READING TERMINATED NORMALLY. If there were syntax errors, Mplus would alert us at this point, and we would want to go back and check our syntax and data.

The next section describes the model and estimator, followed by a table of descriptive statistics for the observed variables.

Next, the output states THE MODEL ESTIMATION TERMINATED NORMALLY. This is important information. If the model were not identified or estimation did not converge within the default maximum number of iterations, Mplus would tell us here. Results from output that does not say the estimation terminated normally should never be reported.
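When many models are run in a batch, this check can be automated. The following is a minimal Python sketch that looks for the termination message in the text of an output file; the function name and the sample text are illustrative assumptions, and the check does not replace reading any warnings Mplus prints.

```python
def estimation_terminated_normally(output_text):
    """Return True if the output contains the normal-termination message."""
    return "THE MODEL ESTIMATION TERMINATED NORMALLY" in output_text

# A short stand-in for the contents of an output file.
sample_output = """
THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION
"""

print(estimation_terminated_normally(sample_output))  # True
```

In practice the text would be read from the output file that Mplus writes, whatever it is named on your system.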

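We are then presented with model fit information. We look for a non-significant $$\chi^2$$ test, an RMSEA less than 0.05, CFI and TLI above 0.90 (or, by stricter standards, 0.95), and an SRMR less than 0.08. Here the $$\chi^2$$ of 15.320 on 16 degrees of freedom (p = 0.5013), the RMSEA of 0.000, the CFI of 1.000, and the SRMR of 0.046 all point to good fit. Consult Hu and Bentler (1999) for fuller details on interpretation.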
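Several of these fit statistics can be reproduced by hand from the loglikelihoods and the baseline chi-square in the output, which is a useful way to see how they relate to one another. The Python sketch below uses common textbook formulas; it is an assumption that these match Mplus's internal calculations in every detail (for example, some programs use N - 1 rather than N in the RMSEA), although they do reproduce the values reported here.

```python
import math

# Values copied from the MODEL FIT INFORMATION section of the output.
loglik_h0 = -1320.232    # fitted model
loglik_h1 = -1312.572    # unrestricted model
n_free = 28
n_obs = 75
df = 16
chi2_baseline = 461.111
df_baseline = 28

chi2 = 2 * (loglik_h1 - loglik_h0)
aic = -2 * loglik_h0 + 2 * n_free
bic = -2 * loglik_h0 + n_free * math.log(n_obs)

# Misfit beyond what the degrees of freedom would lead us to expect.
excess = max(chi2 - df, 0.0)
rmsea = math.sqrt(excess / (df * n_obs))
cfi = 1 - excess / max(chi2_baseline - df_baseline, excess)
tli = (chi2_baseline / df_baseline - chi2 / df) / (chi2_baseline / df_baseline - 1)

print(round(chi2, 3), round(aic, 3), round(bic, 3))   # 15.32 2696.464 2761.354
print(round(rmsea, 3), round(cfi, 3), round(tli, 3))  # 0.0 1.0 1.003
```

Because the chi-square is smaller than its degrees of freedom, the RMSEA is truncated at zero and the TLI exceeds one.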

The next section presents the parameter estimates. The unstandardized results are presented first, followed by the standardized results (we would focus on STDYX or STDY here). Note that the unstandardized estimates for the loadings are the same for both latent variables, which is what we imposed by labeling the respective parameters in the syntax; the standardized loadings differ across the two factors because the factor and residual variances are not constrained to be equal. Note also that there are estimates corresponding to the error covariances, as we specified in our WITH statements.
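The link between the unstandardized and standardized estimates can be checked with a little arithmetic. The sketch below, written in Python with function names chosen for illustration, rescales the x1 loading and the factor covariance using the unstandardized estimates from MODEL RESULTS. It assumes each indicator loads on a single factor, so that its model-implied variance is the squared loading times the factor variance plus the residual variance.

```python
import math

def std_loading(loading, factor_variance):
    """STD: loading rescaled so that the factor has unit variance."""
    return loading * math.sqrt(factor_variance)

def stdyx_loading(loading, factor_variance, residual_variance):
    """STDYX: loading rescaled so factor and indicator both have unit variance."""
    indicator_variance = loading ** 2 * factor_variance + residual_variance
    return std_loading(loading, factor_variance) / math.sqrt(indicator_variance)

# Unstandardized estimates copied from MODEL RESULTS.
var_xi1, var_xi2, cov_xi = 4.708, 4.528, 4.461
loading_x1, resid_x1 = 1.000, 1.879

print(round(std_loading(loading_x1, var_xi1), 3))              # 2.17
print(round(stdyx_loading(loading_x1, var_xi1, resid_x1), 3))  # 0.845
print(round(cov_xi / math.sqrt(var_xi1 * var_xi2), 3))         # 0.966
```

These match the STD loading of 2.170, the STDYX loading of 0.845, and the standardized factor covariance (that is, the factor correlation) of 0.966 in the output. Results for other indicators may differ in the third decimal place because the inputs are themselves rounded.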

To view a path diagram of the model, click on Diagram $$\rightarrow$$ View Diagram in Mplus. This will open a new application that shows the model, such as the following:

The user can toggle between unstandardized parameter estimates (shown) and the different standardizations. In addition, some formatting can be performed to get the image in better shape for publication.

## Citations

Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.