CFA in Mplus

Jeremy Albright

This page describes how to set up code in Mplus to fit a confirmatory factor analysis (CFA) model. The model, which consists of two latent variables and eight manifest variables, is described here. Mplus only reads data in text format; see this post for details on how to prepare a data file for Mplus. The data can be accessed from GitHub. To review, the model to be fit is the following:

We’ll start with a basic CFA model that has no constraints on the parameters and no correlated errors. The code for such a model is the following:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
          USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
    xi_1 BY x1  
            x2 
            x3 
            x4;

    xi_2 BY x5
            x6
            x7
            x8;

The optional TITLE command labels the model. The title here indicates that we are replicating the model described in chapter 7 of Bollen’s (1989, pg. 235) book. Note that every statement must end with a semicolon. Also keep in mind that lines in the input file are limited in length: no more than 90 characters per line in recent versions of Mplus (80 in older versions).
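Because an over-long line is easy to miss in a text editor, it can help to check the input file before running it. The following is a minimal sketch in Python, not part of Mplus itself; the function name `find_long_lines` and the example input are hypothetical, and `max_width` defaults to the conservative value of 80 so it can be adjusted to whatever limit the installed Mplus version enforces.

```python
def find_long_lines(text, max_width=80):
    """Return (line_number, length) pairs for lines longer than max_width."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Measure each line exactly as it is stored in the file.
        if len(line) > max_width:
            problems.append((number, len(line)))
    return problems


# Hypothetical input: a short TITLE line and a NAMES list that runs too long.
names = " ".join(f"x{i}" for i in range(1, 40))
example = "TITLE: demo;\n" + "VARIABLE: NAMES ARE " + names + ";\n"

for number, length in find_long_lines(example):
    print(f"line {number} has {length} characters")
```

A line flagged this way can simply be broken across several lines, as the USEVARIABLES statement is in the input above.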

The DATA command points to where the data are located. In this example, it is assumed that the data are in the same folder as this input file. If not, fuller pathnames to the data file would need to be used, such as "C:\Users\you\Documents\mplus-files\sem-bollen.dat".

The VARIABLE command lists the variables in the order in which they appear in the data file. The second line specifies the USEVARIABLES, or the variables that will actually be used in the analysis. This line is not necessary if all of the variables in the data file will be used.

The MODEL command describes the model. The syntax for latent variables is to list the name of the latent variable, followed by the word BY, followed by a list of the observed variables. Here we say that we want two latent variables. The first is \(\xi_1\) (Greek letter pronounced “xi”) and is measured with the variables \(x_1-x_4\). The second is \(\xi_2\) and is measured with the variables \(x_5-x_8\).

Note that there are no missing values in this file. If there were missing values, we would add a line after the USEVARIABLES statement like the following:

MISSING ARE ALL (-999);

This of course assumes missing values have all been recoded as -999. The choice of numeric value for missing is up to the user who prepares the data.
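As an illustration of that recoding step, here is a small Python sketch that writes rows of numbers as space-delimited text with missing values replaced by -999. The function name `to_mplus_rows`, the use of `None` to mark a missing value, and the example numbers are all assumptions made for this sketch; the only requirement taken from the text above is that the code written to the file matches the one declared in the MISSING statement.

```python
MISSING_CODE = -999  # must match the value declared in MISSING ARE ALL (...)


def to_mplus_rows(rows, missing_code=MISSING_CODE):
    """Format rows as space-delimited lines, writing missing_code for None."""
    lines = []
    for row in rows:
        cells = [str(missing_code) if value is None else str(value)
                 for value in row]
        lines.append(" ".join(cells))
    # No header row is written: the variable names are supplied in the
    # NAMES ARE statement of the input file instead.
    return "\n".join(lines) + "\n"


# Made-up example data; the second row has one missing value.
rows = [[2.5, 0.0, 3.333], [1.25, None, 6.667]]
print(to_mplus_rows(rows), end="")
```

The returned string can then be saved to a plain text file with any name the DATA command points to.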

Mplus will by default use standard maximum likelihood estimation (specifically, Full Information Maximum Likelihood, or FIML, which is robust to data that have values missing at random). The default is also to report the conventional chi-square test and maximum likelihood standard errors. The optional ANALYSIS command can be used to change the estimator for some or all statistics. For example, adding

ANALYSIS: ESTIMATOR = MLM;

to the input file will tell Mplus to still use maximum likelihood estimation for the model parameters but to report standard errors and a mean-adjusted chi-square statistic (the Satorra-Bentler chi-square) that are robust to non-normality in the data. Alternatively,

ANALYSIS: ESTIMATOR = MLR;

will use maximum likelihood to estimate the parameters and will report standard errors, computed with a sandwich estimator, and a chi-square statistic that are robust to non-normality. When combined with TYPE = COMPLEX, the MLR standard errors are also robust to clustering of observations. The full list of estimators can be found in the ANALYSIS COMMAND chapter of the Mplus User’s Guide. ESTIMATOR = ML is the default.

The above syntax for the input file will be sufficient for many CFA models. However, it is also common to impose constraints on a CFA model, such as forcing factor loadings to be equal or allowing errors to covary. Bollen’s model includes both of these. First, because the latent variables represent the same democracy construct measured at two points in time, it makes sense that the respective factor loadings would be the same for each factor. We can specify this constraint as follows:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
          USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
    xi_1 BY x1  
            x2 (l2) 
            x3 (l3) 
            x4 (l4);

    xi_2 BY x5
            x6 (l2) 
            x7 (l3) 
            x8 (l4);

This syntax adds labels after each loading to specify which should be equal. l2, which is short for lambda 2, has been added after x2 and x6. Having the same label forces these loadings to be equal. Likewise, the l3 label will force the x3 and x7 loadings to be equal, and the l4 label will force the x4 and x8 loadings to be equal. We did not need to impose any constraint for x1 and x5. By default, Mplus identifies the model by constraining the first loading for each factor to equal one.

By default, Mplus assumes that the errors (residuals) of the observed variables are uncorrelated with one another. We can relax this constraint with some additional syntax:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
          USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
    xi_1 BY x1  
            x2 (l2) 
            x3 (l3) 
            x4 (l4);

    xi_2 BY x5  
            x6 (l2) 
            x7 (l3) 
            x8 (l4);

    x1 WITH x5;
    x2 WITH x4;
    x2 WITH x6;
    x3 WITH x7;
    x4 WITH x8;
    x6 WITH x8;

The WITH statement specifies which errors covary. Most of the covariances capture the fact that, because the same measures are used at two time points, any idiosyncrasies present at the first time point may also be present at the second time point. We also allow the errors for the second and fourth observed variables at each time point to covary.
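With the equality constraints and error covariances in place, it is worth counting how many parameters the model estimates and how many degrees of freedom remain. The Python sketch below does this bookkeeping by hand; it assumes, as the output further down confirms, that Mplus also estimates an intercept for each observed variable and the covariance between the two factors by default. The totals should match the Number of Free Parameters and the Degrees of Freedom that Mplus reports.

```python
p = 8  # observed variables x1-x8

# Sample moments the model has to reproduce:
# p(p + 1)/2 variances and covariances, plus p means.
n_moments = p * (p + 1) // 2 + p

free_parameters = {
    "loadings": 3,              # l2, l3, l4, each shared by both factors
    "factor variances": 2,      # xi_1 and xi_2
    "factor covariance": 1,     # xi_1 WITH xi_2, estimated by default
    "residual variances": 8,    # one per observed variable
    "residual covariances": 6,  # one per WITH statement
    "intercepts": 8,            # one per observed variable
}

n_free = sum(free_parameters.values())
df = n_moments - n_free

print(n_moments, n_free, df)  # 44 28 16
```

Without the equality labels there would be six free loadings instead of three, so the constraints are what buy the three extra degrees of freedom.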

Finally, it is common to present standardized estimates rather than the unstandardized parameters. We can get this included in the output by adding one more line:


TITLE: Bollen's (1989, chapter 7) CFA Example;

DATA: FILE IS sem-bollen.dat;

VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
          USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

MODEL:
    xi_1 BY x1  
            x2 (l2) 
            x3 (l3) 
            x4 (l4);

    xi_2 BY x5  
            x6 (l2) 
            x7 (l3) 
            x8 (l4);

    x1 WITH x5;
    x2 WITH x4;
    x2 WITH x6;
    x3 WITH x7;
    x4 WITH x8;
    x6 WITH x8;
    
OUTPUT: STANDARDIZED;

Requesting STANDARDIZED for the output will produce three types of standardization, which appear in the output file as STDYX, STDY, and STD. In most cases, STDYX will be the section of interest, as it standardizes the estimates so that they can be interpreted in standard deviation units (just like standardized regression coefficients). The other two may be of interest in full structural equation models (SEMs), especially when a categorical covariate is involved. For a CFA such as this one, which has no observed covariates, STDYX and STDY will be equivalent.

With our syntax ready we can now save the file and then click the red Run button in the toolbar to get the estimates. Doing so yields the following:


Mplus VERSION 8
MUTHEN & MUTHEN
06/25/2019   9:54 AM

INPUT INSTRUCTIONS

  TITLE: Bollens (1989, chapter 7) CFA Example;

  DATA: FILE IS sem-bollen.dat;

  VARIABLE: NAMES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11;
            USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8;

  MODEL:
      xi_1 BY x1
              x2 (l2)
              x3 (l3)
              x4 (l4);

      xi_2 BY x5
              x6 (l2)
              x7 (l3)
              x8 (l4);

      x1 WITH x5;
      x2 WITH x4;
      x2 WITH x6;
      x3 WITH x7;
      x4 WITH x8;
      x6 WITH x8;

  OUTPUT: STANDARDIZED;


INPUT READING TERMINATED NORMALLY



Bollens (1989, chapter 7) CFA Example;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                          75

Number of dependent variables                                    8
Number of independent variables                                  0
Number of continuous latent variables                            2

Observed dependent variables

  Continuous
   X1          X2          X3          X4          X5          X6
   X7          X8

Continuous latent variables
   XI_1        XI_2


Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20

Input data file(s)
  sem-bollen.dat

Input data format  FREE



UNIVARIATE SAMPLE STATISTICS


     UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS

         Variable/         Mean/     Skewness/   Minimum/ % with                Percentiles
        Sample Size      Variance    Kurtosis    Maximum  Min/Max      20%/60%    40%/80%    Median

     X1                    5.465      -0.093       1.250   10.67%       2.500      5.000      5.400
              75.000       6.787      -1.104      10.000    6.67%       6.900      7.500
     X2                    4.256       0.325       0.000   34.67%       0.000      3.333      3.333
              75.000      15.372      -1.426      10.000   21.33%       4.800     10.000
     X3                    6.563      -0.606       0.000   10.67%       3.333      6.667      6.667
              75.000      10.621      -0.657      10.000    1.33%       6.667     10.000
     X4                    4.453       0.120       0.000   22.67%       0.000      3.333      3.333
              75.000      11.069      -1.164      10.000   10.67%       6.667      6.667
     X5                    5.136      -0.233       0.000    6.67%       2.500      5.000      5.000
              75.000       6.735      -0.718      10.000    2.67%       6.250      7.500
     X6                    2.978       0.911       0.000   40.00%       0.000      0.000      2.233
              75.000      11.224      -0.400      10.000   10.67%       3.333      6.667
     X7                    6.196      -0.565       0.000   13.33%       3.333      6.667      6.667
              75.000      10.655      -0.672      10.000   26.67%       6.667     10.000
     X8                    4.043       0.455       0.000   16.00%       0.368      3.333      3.333
              75.000      10.393      -0.906      10.000   12.00%       3.333      6.667


THE MODEL ESTIMATION TERMINATED NORMALLY



MODEL FIT INFORMATION

Number of Free Parameters                       28

Loglikelihood

          H0 Value                       -1320.232
          H1 Value                       -1312.572

Information Criteria

          Akaike (AIC)                    2696.464
          Bayesian (BIC)                  2761.354
          Sample-Size Adjusted BIC        2673.105
            (n* = (n + 2) / 24)

Chi-Square Test of Model Fit

          Value                             15.320
          Degrees of Freedom                    16
          P-Value                           0.5013

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000
          90 Percent C.I.                    0.000  0.103
          Probability RMSEA <= .05           0.683

CFI/TLI

          CFI                                1.000
          TLI                                1.003

Chi-Square Test of Model Fit for the Baseline Model

          Value                            461.111
          Degrees of Freedom                    28
          P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

          Value                              0.046



MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 XI_1     BY
    X1                 1.000      0.000    999.000    999.000
    X2                 1.213      0.146      8.309      0.000
    X3                 1.210      0.124      9.748      0.000
    X4                 1.273      0.127     10.038      0.000

 XI_2     BY
    X5                 1.000      0.000    999.000    999.000
    X6                 1.213      0.146      8.309      0.000
    X7                 1.210      0.124      9.748      0.000
    X8                 1.273      0.127     10.038      0.000

 XI_2     WITH
    XI_1               4.461      0.972      4.591      0.000

 X1       WITH
    X5                 0.577      0.371      1.556      0.120

 X2       WITH
    X4                 1.390      0.685      2.030      0.042
    X6                 2.068      0.729      2.838      0.005

 X3       WITH
    X7                 0.727      0.619      1.175      0.240

 X4       WITH
    X8                 0.476      0.461      1.032      0.302

 X6       WITH
    X8                 1.257      0.584      2.151      0.031

 Intercepts
    X1                 5.465      0.296     18.440      0.000
    X2                 4.256      0.439      9.696      0.000
    X3                 6.563      0.398     16.504      0.000
    X4                 4.453      0.380     11.713      0.000
    X5                 5.136      0.306     16.781      0.000
    X6                 2.978      0.391      7.616      0.000
    X7                 6.196      0.364     17.026      0.000
    X8                 4.043      0.375     10.776      0.000

 Variances
    XI_1               4.708      1.044      4.510      0.000
    XI_2               4.528      1.020      4.440      0.000

 Residual Variances
    X1                 1.879      0.454      4.140      0.000
    X2                 7.530      1.338      5.630      0.000
    X3                 4.966      0.970      5.121      0.000
    X4                 3.214      0.728      4.414      0.000
    X5                 2.499      0.518      4.823      0.000
    X6                 4.809      0.908      5.296      0.000
    X7                 3.302      0.722      4.572      0.000
    X8                 3.227      0.719      4.486      0.000


STANDARDIZED MODEL RESULTS


STDYX Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 XI_1     BY
    X1                 0.845      0.044     19.272      0.000
    X2                 0.692      0.061     11.383      0.000
    X3                 0.762      0.049     15.515      0.000
    X4                 0.839      0.042     19.765      0.000

 XI_2     BY
    X5                 0.803      0.046     17.472      0.000
    X6                 0.762      0.053     14.383      0.000
    X7                 0.817      0.048     17.008      0.000
    X8                 0.833      0.043     19.548      0.000

 XI_2     WITH
    XI_1               0.966      0.030     32.453      0.000

 X1       WITH
    X5                 0.266      0.144      1.854      0.064

 X2       WITH
    X4                 0.283      0.116      2.426      0.015
    X6                 0.344      0.100      3.435      0.001

 X3       WITH
    X7                 0.180      0.141      1.271      0.204

 X4       WITH
    X8                 0.148      0.133      1.113      0.266

 X6       WITH
    X8                 0.319      0.117      2.725      0.006

 Intercepts
    X1                 2.129      0.204     10.459      0.000
    X2                 1.120      0.141      7.924      0.000
    X3                 1.906      0.187     10.170      0.000
    X4                 1.352      0.156      8.680      0.000
    X5                 1.938      0.190     10.201      0.000
    X6                 0.879      0.135      6.507      0.000
    X7                 1.966      0.190     10.347      0.000
    X8                 1.244      0.151      8.259      0.000

 Variances
    XI_1               1.000      0.000    999.000    999.000
    XI_2               1.000      0.000    999.000    999.000

 Residual Variances
    X1                 0.285      0.074      3.847      0.000
    X2                 0.521      0.084      6.191      0.000
    X3                 0.419      0.075      5.587      0.000
    X4                 0.297      0.071      4.165      0.000
    X5                 0.356      0.074      4.822      0.000
    X6                 0.419      0.081      5.194      0.000
    X7                 0.332      0.078      4.235      0.000
    X8                 0.306      0.071      4.301      0.000


STDY Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 XI_1     BY
    X1                 0.845      0.044     19.272      0.000
    X2                 0.692      0.061     11.383      0.000
    X3                 0.762      0.049     15.515      0.000
    X4                 0.839      0.042     19.765      0.000

 XI_2     BY
    X5                 0.803      0.046     17.472      0.000
    X6                 0.762      0.053     14.383      0.000
    X7                 0.817      0.048     17.008      0.000
    X8                 0.833      0.043     19.548      0.000

 XI_2     WITH
    XI_1               0.966      0.030     32.453      0.000

 X1       WITH
    X5                 0.266      0.144      1.854      0.064

 X2       WITH
    X4                 0.283      0.116      2.426      0.015
    X6                 0.344      0.100      3.435      0.001

 X3       WITH
    X7                 0.180      0.141      1.271      0.204

 X4       WITH
    X8                 0.148      0.133      1.113      0.266

 X6       WITH
    X8                 0.319      0.117      2.725      0.006

 Intercepts
    X1                 2.129      0.204     10.459      0.000
    X2                 1.120      0.141      7.924      0.000
    X3                 1.906      0.187     10.170      0.000
    X4                 1.352      0.156      8.680      0.000
    X5                 1.938      0.190     10.201      0.000
    X6                 0.879      0.135      6.507      0.000
    X7                 1.966      0.190     10.347      0.000
    X8                 1.244      0.151      8.259      0.000

 Variances
    XI_1               1.000      0.000    999.000    999.000
    XI_2               1.000      0.000    999.000    999.000

 Residual Variances
    X1                 0.285      0.074      3.847      0.000
    X2                 0.521      0.084      6.191      0.000
    X3                 0.419      0.075      5.587      0.000
    X4                 0.297      0.071      4.165      0.000
    X5                 0.356      0.074      4.822      0.000
    X6                 0.419      0.081      5.194      0.000
    X7                 0.332      0.078      4.235      0.000
    X8                 0.306      0.071      4.301      0.000


STD Standardization

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 XI_1     BY
    X1                 2.170      0.241      9.020      0.000
    X2                 2.631      0.346      7.608      0.000
    X3                 2.626      0.304      8.648      0.000
    X4                 2.761      0.299      9.234      0.000

 XI_2     BY
    X5                 2.128      0.240      8.880      0.000
    X6                 2.580      0.327      7.887      0.000
    X7                 2.575      0.297      8.682      0.000
    X8                 2.708      0.294      9.203      0.000

 XI_2     WITH
    XI_1               0.966      0.030     32.453      0.000

 X1       WITH
    X5                 0.577      0.371      1.556      0.120

 X2       WITH
    X4                 1.390      0.685      2.030      0.042
    X6                 2.068      0.729      2.838      0.005

 X3       WITH
    X7                 0.727      0.619      1.175      0.240

 X4       WITH
    X8                 0.476      0.461      1.032      0.302

 X6       WITH
    X8                 1.257      0.584      2.151      0.031

 Intercepts
    X1                 5.465      0.296     18.440      0.000
    X2                 4.256      0.439      9.696      0.000
    X3                 6.563      0.398     16.504      0.000
    X4                 4.453      0.380     11.713      0.000
    X5                 5.136      0.306     16.781      0.000
    X6                 2.978      0.391      7.616      0.000
    X7                 6.196      0.364     17.026      0.000
    X8                 4.043      0.375     10.776      0.000

 Variances
    XI_1               1.000      0.000    999.000    999.000
    XI_2               1.000      0.000    999.000    999.000

 Residual Variances
    X1                 1.879      0.454      4.140      0.000
    X2                 7.530      1.338      5.630      0.000
    X3                 4.966      0.970      5.121      0.000
    X4                 3.214      0.728      4.414      0.000
    X5                 2.499      0.518      4.823      0.000
    X6                 4.809      0.908      5.296      0.000
    X7                 3.302      0.722      4.572      0.000
    X8                 3.227      0.719      4.486      0.000


R-SQUARE

    Observed                                        Two-Tailed
    Variable        Estimate       S.E.  Est./S.E.    P-Value

    X1                 0.715      0.074      9.636      0.000
    X2                 0.479      0.084      5.692      0.000
    X3                 0.581      0.075      7.757      0.000
    X4                 0.703      0.071      9.882      0.000
    X5                 0.644      0.074      8.736      0.000
    X6                 0.581      0.081      7.191      0.000
    X7                 0.668      0.078      8.504      0.000
    X8                 0.694      0.071      9.774      0.000


QUALITY OF NUMERICAL RESULTS

     Condition Number for the Information Matrix              0.254E-02
       (ratio of smallest to largest eigenvalue)


DIAGRAM INFORMATION

  Use View Diagram under the Diagram menu in the Mplus Editor to view the diagram.
  If running Mplus from the Mplus Diagrammer, the diagram opens automatically.

  Diagram output
    c:\users\jeremy\documents\mplus-data\bollen-cfa.dgm

     Beginning Time:  09:54:06
        Ending Time:  09:54:06
       Elapsed Time:  00:00:00



MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA  90066

Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com

Copyright (c) 1998-2017 Muthen & Muthen

The first part of the output reiterates the code. We then see that INPUT READING TERMINATED NORMALLY. If there were syntax errors, Mplus would alert us at this point, and we would want to go back and check our syntax and data.

The next section describes the model and estimator, followed by a table of descriptive statistics for the observed variables.

Next, the output states THE MODEL ESTIMATION TERMINATED NORMALLY. This is important information. If the model were not identified and/or convergence did not occur after the default maximum number of iterations, Mplus would tell us here. Output that does not say that the estimation terminated normally should never be reported.
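When many models are run, these two status messages can be checked programmatically before any estimates are read. The following Python sketch simply searches the text of an output file for both messages quoted above. The function name `missing_messages` and the stand-in strings are hypothetical, and the check is only a first screen: it does not replace reading any warnings that Mplus prints alongside the messages.

```python
REQUIRED_MESSAGES = (
    "INPUT READING TERMINATED NORMALLY",
    "THE MODEL ESTIMATION TERMINATED NORMALLY",
)


def missing_messages(output_text):
    """Return the required status messages that do not appear in the output."""
    return [msg for msg in REQUIRED_MESSAGES if msg not in output_text]


# Short stand-ins for the text of real output files.
good_output = (
    "INPUT READING TERMINATED NORMALLY\n\n"
    "THE MODEL ESTIMATION TERMINATED NORMALLY\n"
)
bad_output = (
    "INPUT READING TERMINATED NORMALLY\n\n"
    "PLACEHOLDER: some other estimation message\n"
)

print(missing_messages(good_output))  # []
print(missing_messages(bad_output))   # ['THE MODEL ESTIMATION TERMINATED NORMALLY']
```

For a real run, `output_text` would be the contents of the output file read from disk; an empty list means both messages were found.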

We are then presented with model fit information. We look for a non-significant \(\chi^2\) test, an RMSEA less than 0.05, CFI and TLI above 0.90 (or, by a stricter standard, 0.95), and SRMR less than 0.08. Consult Hu and Bentler (1999) for fuller details on interpretation.
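Several of these fit statistics can be reproduced by hand from the loglikelihoods and the baseline chi-square in the output, which is a useful check on how they relate to one another. The Python sketch below uses the common textbook formulas; that Mplus computes each index in exactly this way is an assumption, although the results agree with the reported values to three decimals.

```python
import math

# Values copied from the MODEL FIT INFORMATION section of the output.
n = 75                 # number of observations
k = 28                 # number of free parameters
ll_h0 = -1320.232      # loglikelihood of the fitted model
ll_h1 = -1312.572      # loglikelihood of the unrestricted model
df_model = 16
chi2_base, df_base = 461.111, 28

# Likelihood-ratio chi-square: fitted versus unrestricted model.
chi2_model = 2 * (ll_h1 - ll_h0)

# Information criteria.
aic = -2 * ll_h0 + 2 * k
bic = -2 * ll_h0 + k * math.log(n)
sabic = -2 * ll_h0 + k * math.log((n + 2) / 24)

# Approximate and incremental fit indices.
excess_model = max(chi2_model - df_model, 0)
excess_base = max(chi2_base - df_base, excess_model)
rmsea = math.sqrt(excess_model / (df_model * n))
cfi = 1 - excess_model / excess_base
tli = (chi2_base / df_base - chi2_model / df_model) / (chi2_base / df_base - 1)

print(f"chi-square = {chi2_model:.3f}")  # chi-square = 15.320
print(f"AIC = {aic:.3f}")                # AIC = 2696.464
print(f"BIC = {bic:.3f}")                # BIC = 2761.354
print(f"adjusted BIC = {sabic:.3f}")     # adjusted BIC = 2673.105
print(f"RMSEA = {rmsea:.3f}")            # RMSEA = 0.000
print(f"CFI = {cfi:.3f}")                # CFI = 1.000
print(f"TLI = {tli:.3f}")                # TLI = 1.003
```

Because the model chi-square is smaller than its degrees of freedom, the RMSEA is truncated at zero, the CFI at one, and the TLI ends up slightly above one.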

The next section presents the parameter estimates. The unstandardized results are presented first, followed by the standardized results (we would focus on STDYX or STDY here). Note that the unstandardized estimates for the loadings are the same for both latent variables, which is what we imposed by labeling the respective parameters in the syntax. The standardized loadings differ across the two factors because the factor variances and residual variances are not constrained to be equal. Note also that there are estimates corresponding to the error covariances, as we specified in our WITH statements.
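To see how the standardized columns relate to the unstandardized ones, the values for x1, x5, and the factor covariance can be recomputed by hand. The Python sketch below applies the usual formulas for a single-factor indicator: the STD loading rescales by the factor standard deviation, and the STDYX loading additionally divides by the model-implied standard deviation of the indicator. The function names are hypothetical, and the inputs are the rounded estimates from the output, so agreement is only expected to about three decimals.

```python
import math

# Unstandardized estimates copied from MODEL RESULTS.
var_xi1, var_xi2 = 4.708, 4.528   # factor variances
cov_xi = 4.461                    # XI_2 WITH XI_1
lam_x1, theta_x1 = 1.000, 1.879   # loading and residual variance of x1
lam_x5, theta_x5 = 1.000, 2.499   # loading and residual variance of x5


def std_loading(lam, factor_var):
    """Loading when only the factor is standardized (STD)."""
    return lam * math.sqrt(factor_var)


def stdyx_loading(lam, factor_var, resid_var):
    """Loading when factor and indicator are both standardized (STDYX)."""
    implied_var = lam ** 2 * factor_var + resid_var
    return lam * math.sqrt(factor_var) / math.sqrt(implied_var)


std_x1 = std_loading(lam_x1, var_xi1)
stdyx_x1 = stdyx_loading(lam_x1, var_xi1, theta_x1)
stdyx_x5 = stdyx_loading(lam_x5, var_xi2, theta_x5)
r2_x1 = stdyx_x1 ** 2
factor_corr = cov_xi / math.sqrt(var_xi1 * var_xi2)

print(f"{std_x1:.3f}")       # 2.170
print(f"{stdyx_x1:.3f}")     # 0.845
print(f"{stdyx_x5:.3f}")     # 0.803
print(f"{r2_x1:.3f}")        # 0.715
print(f"{factor_corr:.3f}")  # 0.966
```

The x1 and x5 results show why identical unstandardized loadings (both fixed at 1) yield different standardized loadings: the two indicators have different factor and residual variances.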

To view a path diagram of the model, click on Diagram \(\rightarrow\) View Diagram in Mplus. This will open a new application that shows the model, such as the following:

The user can toggle between unstandardized parameter estimates (shown) and the different standardizations. In addition, some formatting can be performed to get the image in better shape for publication.

Citations

Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.