

 










                                                                                                                                              



























                                                        Michael C. Neale




Mx:

Statistical Modeling







by


Michael C. Neale

Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics,

Virginia Commonwealth University, Richmond, Virginia, U.S.A.


Steven M. Boker

Department of Psychology,

University of Notre Dame, Notre Dame, Indiana, U.S.A.


Gary Xie

Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics,

Virginia Commonwealth University, Richmond, Virginia, U.S.A.


Hermine H. Maes

Department of Human Genetics, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, U.S.A.












Virginia Institute for Psychiatric and Behavioral Genetics

Virginia Commonwealth University

Department of Psychiatry












First Published 1991

Second Edition 1994

Third Edition 1995

Fourth Edition 1997

Fifth Edition 1999

Sixth Edition 2003

last revised on April 15, 2004

























Refer to Mx manual as:


Neale MC, Boker SM, Xie G, Maes HH (2003). Mx: Statistical Modeling. VCU Box 900126, Richmond, VA 23298: Department of Psychiatry. 6th Edition.




All rights reserved

© 2003 Michael C. Neale


Table of Contents


 

List of Tables

List of Figures

Preface

 

1 Introduction to Structural Equation Modeling

1.1 Guidelines for good Script Style

1.2 Matrix Algebra

1.3 Structural Equation Modeling

RAM Approach

Simplified Mx Approach

Fully Multivariate Approach

1.4 Other Types of Statistical Modeling

 

2 Introduction to the Mx Graphical User Interface

2.1 Using Mx GUI

2.2 Fitting a Simple Model

Preparing the Data

Editing Dat Files

Drawing the Diagram

Fitting the Model

Viewing Results

Saving Diagrams

2.3 Revising a Model

Adding a Causal Path

Adding a Covariance Path

Changing Path Attributes

Fixing a Parameter

Confidence Intervals

Equating Paths

Moving Variables and Paths

2.4 Extending the Model

Multiple Groups: Using Cut and Paste

Selecting Different Variables for Analysis

Modeling Means

2.5 Output Options

Zooming in and out

Copying Matrices to the Clipboard

Comparing Models

Setting Job Options

Printing

Exporting Diagrams to other Applications

Files and Filename Extensions

2.6 Running Jobs

Running Scripts

Editing Mx Header Files

Using Networked Unix Workstations

2.7 Advanced Features

Adding Non-linear Constraints to Diagrams

Moderator Variables: Observed Variables as Paths

 

3 Outline of Mx Scripts and Data Input

3.1 Preparing Input Scripts

Comments, Commands and Numeric Input

Syntax Conventions

Job Structure

Single Group Example

3.2 Group Types

#NGroups

Title Line

Group-type Line

3.3 Commands for Reading Data

Covariance and Correlation Matrices

Asymptotic Variances and Covariances

Variable Length, Rectangular and Ordinal Files

Contingency Tables

Means

Higher Moment Matrices

3.4 Label and Select Variables

Labeling Input Variables

Select Variables

Select If

3.5 Calculation and Constraint Groups

Calculation Groups

Constraint Groups

3.6 Commands for Declaring Variable Options

Missing Command

Highest Command

Definition Variables

3.7 Advanced Commands for Script Writing

#Define Command

#If, #Elseif, #Else and #Endif Commands

#Repeat Command

#Include Command

System Command

Matrices Declaration

Matrix Algebra

 

4 Building Models with Matrices

4.1 Commands for Declaring Matrices

Matrices Command

Matrix Types

Equating Matrices across Groups

Free Keyword

4.2 Building Matrix Formulae

Matrix Operations

Matrix Functions

4.3 Using Matrix Formulae

Covariances, Compute Command

Means Command

Threshold Command

Weight command

Frequency Command

4.4 Putting Numbers in Matrices

Matrix Command

Start and Value Commands

4.5 Putting Parameters in Matrices

Pattern Command

Fix and Free Commands

Equate Command

Specification Command

Boundary Command

4.6 Label Matrices and Select Variables

Labeling Matrices

Identification Codes

 

5 Options for Fit Functions and Output

5.1 Options and End Commands

5.2 Fit Functions: Defaults and Alternatives

Standard Fit Functions

Maximum Likelihood Analysis of Raw Continuous Data

Maximum Likelihood Analysis of Raw Ordinal Data

Contingency Table Analysis

User-defined Fit Functions

5.3 Statistical Output and Optimization Options

Standard goodness-of-fit output

RMSEA

Suppressing Output

Appearance

Residuals

Adjusting Degrees of Freedom

Power Calculations

Confidence Intervals on Parameter Estimates

Standard Errors

Bootstrap estimates

Randomizing Starting Values

Automatic Cold Restart

Jiggling Parameter Starting Values

Confidence Intervals on Fit Statistics

Comparative Fit Indices

Computing Likelihood-Ratio Statistics of Submodels

Check Identification of Model

Changing Optimization Parameters

Setting Optimization Parameters

5.4 Fitting Submodels: Saving Matrices and Files

Fitting Submodels using Multiple Fit Option

Dropping Parameters from Model

Reading and Writing Binary Files

Writing Matrices to Files

Formatting and Appending Matrices Written to Files

Writing Individual Likelihood Statistics to Files

Creating RAMpath Graphics Files

 

6 Example Scripts

6.1 Using Calculation Groups

General Matrix Algebra

Assortative Mating ‘D’ Matrix

Pearson-Aitken Selection Formula

6.2 Model Fitting with Genetically Informative Data

ACE Genetic Model for Twin Data

Power Calculation for the Classical Twin Study

RAM Specification of Model for Twin Data

Cholesky Decomposition for Multivariate Twin Data

PACE Model: Reciprocal Interaction between Twins

Scalar, Non-scalar Sex Limitation and Age Regression

Multivariate Assortative Mating: Modeling D

6.3 Fitting Models with Non-linear Constraints

Principal Components

Analysis of Correlation Matrices

Fitting a PACE Model to Contingency Table Data

Twins and Parents: Cultural and Genetic Transmission

6.4 Fitting Models to Raw Data

Estimating Means and Covariances

Variable Pedigree Sizes

Definition Variables

Using NModel to Assess Heterogeneity

Using #if and #repeat Commands

6.5 User-Defined Fit Functions

Least Squares

Correction for Ascertainment

6.6 Using Mx Header and Template Files

Factor Models for Twin Data

Alternative Genetic Models for Twin Data

 

A Using Mx under different operating systems

A.1 Obtaining Mx

A.2 System Requirements

A.3 Installing the Mx GUI

A.4 Using Mx

 

B Error Messages

B.1 General Input Reading

B.2 Error Codes

 

C Introduction to Matrices

C.1 The Concept of a Matrix

C.2 Matrix Algebra

Transposition

Matrix Addition and Subtraction

Matrix Multiplication

C.3 Equations in Matrix Algebra

C.4 Calculation of Covariance Matrix from Data Matrix

Transformations of Data Matrices

Determinant of a Matrix

Inverse of a Matrix

 

D Reciprocal Causation

 

References

 

Index





List of Tables


 

2.1 Correspondence between optimization codes and IFAIL parameters

2.2 Summary of filename extensions used by Mx

3.2 Parameters of the group-type line

4.1 Matrix types

4.2 Syntax for constraining matrices to special quantities

4.3 Examples of use of the Matrices command

4.4 Matrix operators

4.5 Matrix functions

5.1 Default fit functions

6.1 Summary of parameter estimates for a variety of models of heterogeneity






List of Figures


 

1.1 Example path diagram

1.2 Multivariate path diagram

2.1 Mx GUI with Project Manager Window

2.2 The Results Panel

2.3 The Results Box Panel

2.4 Mx Path Inspector

2.5 Starting values for an ACE twin model for MZ twins

2.6 Parameter estimates from fitting the ACE model

2.7 The Job Option Panel

2.8 The Host Options Panel

2.9 Higher order factor model

2.10 Linear regression with interaction model

2.11 Linear moderated regression with interaction model

3.1 Factor model for two variables

5.2 Contour plot showing a bivariate normal distribution

6.1 ACE genetic model for twin data

6.2 Cholesky or triangular factor model

6.3 PACE model for phenotypic interaction

6.4 Model for sex limitation and age regression

6.5 Three factor model of cognitive ability tests

6.6 Model of mixed genetic and cultural transmission

6.7 Definition variable example

C.1 Graphical representation of the inner product

C.2 Geometric representation of the determinant

D.1 Feedback loop between two variables

D.2 Structural equation model for x variables

D.3 Structural equation model for y variables





Preface


What Mx does


Mx is a structural equation modeling package, but it is flexible enough to fit a variety of other mathematical models. At its heart is a matrix algebra processor, which can be used by itself. There are many built-in fit functions to enable structural equation modeling and other experiments in matrix algebra and statistical modeling. It offers the fitting functions found in commercial software such as LISREL, LISCOMP, EQS, SEPATH, AMOS and CALIS, including facilities for maximum likelihood estimation of parameters from missing data structures, under normal theory. Complex ‘nonstandard’ models are easy to specify. For further general applicability, it allows the user to define their own fit functions, and optimization may be performed subject to linear and nonlinear equality or boundary constraints.


How to Read this Manual


The bad news is that this manual is quite long; the good news is that you don't need to read it all! Chapter 1 contains an introduction to multivariate path modeling. Chapter 2 introduces the Mx graphical interface, Mx GUI, which relieves the user of having to get to grips with the details of scripts. Even with this graphical interface, knowledge of the script language, described in Chapters 3 through 5, is necessary to use advanced features and methods. The "how to" part of the manual starts in Chapter 3, which lays out general syntax conventions and job structure, followed by a description of the commands needed to read data. Chapter 4 deals with the heart of Mx: how to define matrices and matrix algebra formulae for model fitting, and ways of estimating and constraining parameters. Chapter 5 describes methods for changing the default fit function, for decreasing and increasing the quantity (and quality) of the output, and for fitting submodels efficiently. The last chapter supplies and briefly describes a number of example scripts. The Appendices describe the use of Mx under different operating systems, error codes, introductory matrix algebra and reciprocal causation.


Origin


The development of Mx owes much to LISREL, and I acknowledge the pioneering effort put in by Karl Jöreskog and Dag Sörbom. There are many who have supported and encouraged this effort in many different ways. I thank them all, and especially Lindon Eaves, Ken Kendler and John Hewitt, since they also provided grant support, and David Fulker for allowing modification of his notes on matrix algebra to be supplied as an appendix to this manual. Jack McArdle and Steve Boker provided excellent path diagram drawing software (RAMPath), which was the basis for the development of Mx GUI; Luther Atkinson suggested the binary file save option; Buz Brown programmed the Rectangular file read; and Karen Kenny and John Fritz organized the interactive website; these efforts were part of the excellent software, hardware and consultancy support supplied by University Computing Services at the Medical Campus of Virginia Commonwealth University. The Mx team includes my colleagues Drs. Steve Boker, Hermine Maes, Mr. Gary Xie and Wayne Hadady.


New Features


Several features have been added to the Mx graphical user interface. First, in an Mx path diagram it is possible to subscript and superscript labels for both latent and observed variables. The caret character ^ denotes a superscript and an underscore _ denotes a subscript. This change is cosmetic only. Note that as ever, it is possible to copy diagrams to other programs such as MS Word and to edit the diagrams in these other packages. Second, changes to the path inspector allow for a larger variety of options when it comes to changing aspects of paths. These options allow for multiple paths within the same diagram or across diagrams to be changed simultaneously. Support for reading and editing 'header' files (see below) has been added. There is also a new Data Edit function to create or edit .dat files.


Bootstrapping options have been added to the normal theory maximum likelihood analysis of raw data. It is now possible to repeat analyses multiple times to obtain empirical estimates of parameters and of the goodness-of-fit statistics associated with them. It is also possible to reuse the same bootstrap samples so that, e.g., likelihood ratio fit statistics may be compared when fitting submodels.
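To make the idea of reusing bootstrap samples concrete, here is a small Python sketch; it is not Mx's implementation. It assumes a hypothetical single-variable data set and compares a model with a free mean against a submodel with the mean fixed at zero. The resampled row indices are drawn once, with a fixed seed, and each resampled data set is used to fit both models, so every bootstrap likelihood-ratio statistic compares the two models on identical data.

```python
import math
import random

def bootstrap_indices(n_rows, n_samples, seed):
    """Draw each bootstrap sample once, as row indices sampled with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_rows) for _ in range(n_rows)] for _ in range(n_samples)]

def lr_mean_zero(xs):
    """-2 log-likelihood difference (normal theory) between a submodel with the
    mean fixed at 0 and a model with the mean free; variances are ML estimates."""
    n = len(xs)
    m = sum(xs) / n
    var_free = sum((x - m) ** 2 for x in xs) / n   # ML variance, mean estimated
    var_fixed = sum(x * x for x in xs) / n         # ML variance, mean fixed at 0
    if var_free == 0.0:
        return math.inf                            # degenerate resample: all values equal
    return n * math.log(var_fixed / var_free)

data = [1.2, 0.4, 2.2, 1.9, 0.7, 1.5, 2.8, 0.9]    # hypothetical raw scores
samples = bootstrap_indices(len(data), n_samples=200, seed=2004)

observed_lr = lr_mean_zero(data)
boot_lr = [lr_mean_zero([data[i] for i in idx]) for idx in samples]
print(f"observed LR = {observed_lr:.3f}")
print(f"bootstrap LR range: {min(boot_lr):.3f} to {max(boot_lr):.3f}")
```

Because the index lists depend only on the seed, the same samples can be regenerated (or saved) and reused when a further submodel is fitted later.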


Some features to enable manual implementation of more efficient marginal maximum likelihood methods for ordinal data have been developed. Although largely invisible to the user except for the run time, multidimensional numerical integration of covariance structures that can be partitioned into distinct blocks of covariance that do not covary with each other has been significantly enhanced. Numerical integration is done separately over the blocks and the product of the sub-integrals computed to estimate the full integral. Further extensions to this approach are planned.
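The block-wise idea can be illustrated outside Mx. In this Python sketch, which illustrates the factorization and is not Mx's integration routine, three standard normal liabilities have a block-diagonal correlation matrix: the first two correlate 0.5 and the third is uncorrelated with both. The three-dimensional integral up to the thresholds is then the product of a two-dimensional integral, computed here by Simpson's rule, and a one-dimensional one. The thresholds and the correlation are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a1, a2, rho, n=2000, lower=-8.0):
    """P(X1 < a1, X2 < a2) for standard normals with correlation rho.

    Integrates phi(x) * P(X2 < a2 | X1 = x) over x from `lower` to a1
    with Simpson's rule (n must be even)."""
    s = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return phi(x) * Phi((a2 - rho * x) / s)

    h = (a1 - lower) / n
    total = integrand(lower) + integrand(a1)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(lower + i * h)
    return total * h / 3.0

# Block-diagonal correlation matrix: block 1 = {X1, X2} with rho = 0.5, block 2 = {X3}.
thresholds = (0.0, 0.0, 1.0)                 # hypothetical thresholds
p_block1 = bvn_cdf(thresholds[0], thresholds[1], rho=0.5)
p_block2 = Phi(thresholds[2])
p_joint = p_block1 * p_block2                # product of the sub-integrals
print(f"P(all below thresholds) = {p_joint:.6f}")
```

The saving grows with the number of blocks, since several low-dimensional integrals are much cheaper than one high-dimensional integral.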


Features for selective compilation of parts of Mx script have been added. There are now #if, #else, #elseif and #endif metacommands that allow the advanced user to construct scripts that are easy for the novice or intermediate user to use. Scripts can be split into 'header' and 'template' parts, where the template part is simply #include'd at the end of the header to form a complete script. There is also support for the use of these headers in the Mx GUI. Similarly, there is a #repeat function that allows either repeated running of the same script or the generation of a large number of similar groups within a script.


The line length limit for reading rectangular/ordinal files has been raised from 2,000 to 20,000, at some cost of efficiency when reading data. A common problem with ordinal data is error 61, indicating that the number of elements in the threshold matrix is incorrect. The debugging information for this type of problem has been improved, and now lists all the relevant information about the required size of this matrix.


Three new matrix functions have been added: \rprod, \cprod and \incrow. The first two compute the products of the elements of a matrix, row-wise or column-wise. The function \incrow forces element i+1,j to be greater than element i,j by a constant amount that is user-configurable with option rinc. This rather unusual matrix function is useful for certain ordinal data threshold problems.
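As a plain illustration of what row-wise and column-wise products compute, here is a Python sketch. The function names mirror the Mx functions, but the output shapes (a column vector with one product per row for \rprod, a row vector with one product per column for \cprod) are an assumption of this sketch; \incrow is not modeled.

```python
from math import prod

def rprod(M):
    """One product per row, returned as a column vector (assumed shape)."""
    return [[prod(row)] for row in M]

def cprod(M):
    """One product per column, returned as a row vector (assumed shape)."""
    return [[prod(col) for col in zip(*M)]]

M = [[1, 2, 3],
     [4, 5, 6]]
print(rprod(M))  # [[6], [120]]
print(cprod(M))  # [[4, 10, 18]]
```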


Internet Support


Mx is public domain; it is available from the internet at http://www.vcu.edu/mx/. With a suitable browser, you can obtain the program, documentation and examples, send comments, see the latest version available for your platform, and so on. E-mail bug reports, requests for further information and, most importantly, your comments and suggestions for improvements to neale@hsc.vcu.edu; it is hard to overemphasize the importance of constructive criticism.


Technical Support


A number of users have been most helpful finding errors in the documentation or software or both, and for suggesting new features that would make Mx easier to use. Thank you! I hope that all users will forward any comments, bug reports, or wish-lists to me. My current address is:


address             Department of Psychiatry

                          Virginia Institute for Psychiatric and Behavioral Genetics

                          Box 126 MCV

                          Richmond VA 23298-0126, USA

phone                804 828 3369

fax                     804 828 1471

E-mail               neale@hsc.vcu.edu (internet)

 

and my order of preference for communication is E-mail, fax, phone and snail mail. When reporting problems, E-mail is especially useful because the problem file can be included.


To find                                       Go to

Matrix Algebra: Learn basic Syntax            Appendix C
SEM Path Analysis                             Neale & Cardon (1992) chapter 5; Loehlin (1987);
                                              McArdle & Boker (1990); Everitt (1984)
How to do basic SEM                           Chapter 1
How to recast basic SEM more efficiently      Chapter 1
How to use the Mx GUI                         Chapter 2
Job Structure                                 Chapter 3
Reading Data                                  Chapter 3
Declare Matrices; Use Matrix Formulae         Chapter 4
Use different Fit Functions; Write Output     Chapter 5
  to Files; Change Options
Look through Example Scripts                  Chapter 6
Quick Check of Syntax                         Index; Quick Reference Guide
Operating Systems                             Appendix A




Icons

Meaning

Caution

Note

Efficiency tip



1 Introduction to Structural Equation Modeling


What you will find in this chapter

 

           Guidelines for building your own scripts

           A brief introduction to the capabilities of Mx.

           Three different ways to implement a structural equation model in Mx


1.1 Guidelines for good Script Style


Programming, like much of life, requires compromises. We must balance the time taken to do things against their value. There are both short-term considerations (“how do I get this working as soon as possible?”) and long-term ones (“how can I save time in what I’m going to be doing next week?”). In practice, this means choosing a method based on the following factors:

 

           Time taken to get the script working properly

           Clarity, which can affect time to debug and modify

           Efficiency of the script - how fast it runs

           Flexibility - how easy it is to alter.


Normally, we would choose a method that will solve our problem in the shortest time. If we expect to use the same basic model but with a varying number of observed and latent variables, then it is worth spending the extra time to write a script in which these changes can be made easily.


Part of writing good scripts is to write them so that you or your colleagues can understand them. Sometimes readability comes at the expense of efficiency, and it is up to you to decide on the balance between the two. One of the most important things to remember is to put plenty of comments in your scripts. Doing so can seem like a waste of time, but it usually pays off handsomely when the scripts are read by yourself or others at a later date.


1.2 Matrix Algebra


Mx will evaluate matrix algebra expressions. It has a simple language, which uses single letters to represent matrices, certain characters to represent matrix operations, and a special syntax to invoke matrix functions. Thus the program can be used as a matrix algebra calculator, which is helpful in a variety of research and educational settings, and which provides a powerful way to specify structural equation and other mathematical models. Most users of multivariate statistics need to know some matrix algebra, and Appendix C gives a brief introduction to the subject, along with examples and exercises which use Mx. Even those familiar with matrix algebra should review the “How to do it in Mx” sections in the appendix as that is where elementary principles of writing Mx scripts are introduced.


1.3 Structural Equation Modeling


One of the most common uses of Mx is to fit Structural Equation Models (SEM) to data. A nice aspect of SEM is that the models can be represented as a path diagram. Mx GUI incorporates path diagram drawing software directly and is described in Chapter 2. We concentrate on translating path diagrams into models ‘by hand’. This approach has the advantage of giving greater understanding of the modeling process, and can yield highly efficient scripts which are easy to change when, for example, the number of variables changes.


There are many accounts of SEM, which vary widely in complexity and clarity, and which are aimed at different fields of study or different software packages (Jöreskog & Sörbom, 1991; Bentler, 1989; Everitt, 1984; Loehlin, 1987; McArdle & Boker, 1990; Bollen, 1992; Neale & Cardon, 1992; Steiger, 1994). The brief account given here is intended to provide a practical guide to setting up models in Mx for those with some familiarity with path analysis or SEM. We begin with a simple, foolproof method called RAM (McArdle & Boker, 1990), which would be ideal except that it is inefficient for the computer to fit. More efficient approaches follow.


RAM Approach


A path diagram consists of four basic types of object: circles, squares, one-headed arrows and two-headed arrows. Circles represent latent (not measured) variables, and squares represent observed (or measured) variables. In a path diagram, two types of relationship between variables are possible: causal and correlational. Causal relationships are shown with a one-headed arrow going from the variable that is doing the causing to the variable being caused. Correlational or covariance relationships are shown with two-headed arrows. A special type of covariance path is one that goes from a variable to itself: it represents the variation in that variable which is not due to causal effects of other variables in the diagram. This is sometimes called ‘residual variance’ or ‘error variance’.


Figure 1.1 shows a sample path diagram with two latent variables and four observed variables. The RAM model specification involves three matrices: F, A and S. S is for the symmetric paths, or two-headed arrows, and is symmetric. A is for the asymmetric paths, or one-headed arrows, and F is for filtering the observed variables out of the whole set. The dimensions of these matrices are fixed by the number of variables in the model: A and S are both $m \times m$, and F is $m_O \times m$, where $m = m_O + m_L$ is the total number of variables in the model, $m_O$ is the number of observed variables and $m_L$ is the number of latent variables. In our example we have $m_O = 4$, $m_L = 2$ and $m = 6$.



Figure 1.1         Example path diagram with two latent variables (P and Q) and four observed variables (R, S, T, U)


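Note how F is an elementary matrix of 1's and 0's, with a 1 wherever the row variable is the same as the column variable. In A, element (i, j) holds the path from variable j to variable i; with the variables ordered P, Q, R, S, T, U, the script below therefore places the four one-headed paths in rows 3 to 6 of the first two columns.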


Now that we have defined these matrices, computing the predicted covariance matrix under this model is relatively simple. The formula is:

$$\Sigma = F\,(I-A)^{-1}\,S\,\left[(I-A)^{-1}\right]'\,F'$$

where $I$ is the $m \times m$ identity matrix. This formula is easy to program in Mx and is quite general. So, suppose that we have measured R, S, T and U on a sample of 100 subjects and computed their covariance matrix. How would we fit the model in Figure 1.1 to these data, using the above formula? A sample script might look like this:


!

! Simple RAM approach to fitting models

!

#NGroups 1

#define latent 2 ! Number of latent variables

#define meas 4 ! Number of measured variables

#define m 6 ! Total number of variables, measured + latent


Title Ram approach to fitting models ! Title

 Data NInput=meas NObserved=100 ! Number of variables,subjects

 CMatrix File=ramfit.cov ! Reads observed covariance matrix


 Begin Matrices; ! Declares matrices

  A Full m m ! One-headed paths

  S Symm m m ! Two-headed paths

  F Full meas m ! Filter matrix

  I Iden m m ! Identity matrix

 End Matrices; ! End of matrix declarations


 Specify A ! Set certain elements of A as free parameters

  0 0 0 0 0 0

  0 0 0 0 0 0

  1 0 0 0 0 0

  2 0 0 0 0 0

  0 3 0 0 0 0

  0 4 0 0 0 0

 Specify S ! Set the free parameters in S

  0

  5 0

  0 0 6

  0 0 0 7

  0 0 0 0 8

  0 0 0 0 0 9

 Value 1.0 S 1 1 S 2 2 ! Put 1's into certain elements of S

 Matrix F ! Do the same for Matrix F but a different way

  0 0 1 0 0 0 ! Note - this could be omitted if F had

  0 0 0 1 0 0 ! been declared ZI instead of full.

  0 0 0 0 1 0

  0 0 0 0 0 1

 Start .5 All ! Supply .5 starting value for all parameters


 Covariance F & ((I-A)~ & S); ! Formula for model

 Options RSiduals ! Print observed, expected and residual matrices

End group


This script is organized into seven sections: (i) defines, (ii) title and data reading, (iii) declaring matrices, (iv) putting parameters into matrices, (v) putting numbers into matrices (vi) the formula for the model, and (vii) options. More detail on all these components can be found in the body of the manual, but let’s look at some of the basic features.

 

           Anything after ! is interpreted as a comment. Blank lines are inactive but serve to visually separate the sections of the script.

 

           #NGroups indicates the number of groups in the script.

 

           The #define statement is used to preassign numbers to certain strings of letters. After a command like #define latent 2, Mx will interpret ‘latent’ as 2 whenever it is trying to read a number.

 

           The Title line is required.

 

           The group type line is required. For Data groups, this line supplies essential information about the number of variables to be analyzed (NInput_vars) and the number of subjects measured (NObservations).

 

           The CMatrix statement reads in the observed covariance matrix from a file, in lower triangular format. The file ramfit.cov might look like this:

      *

       1.51

       .31 1.17

       .22 .19 1.46

       .11 .23 .34 1.56

             where the * indicates free format.

 

           The Begin Matrices; line is required and starts the declaration of matrices that will be used in the covariance statement. We make use of the #define’d words to get them the right size. This section ends with the End Matrices; line.

 

           Specify puts free parameters into matrices. All the usable elements of the matrix are listed (i.e. only the lower triangle for symmetric matrices, or only the diagonal elements for diagonal matrices). A zero indicates that the element is fixed, and a positive integer indicates that it’s free. Different positive integers represent different free parameters; if we wished to have parameters 1 and 2 set equal, we would replace the 2 with a 1.

 

           The fixed values of 1 for the variances of the latent variables are given with a Value statement.

 

           Start .5 all sets all the free parameters to .5 as an initial guess of the parameter estimates.



 

           The Covariance statement supplies the formula for the model. We have used the & operator for quadratic matrix multiplication (A&B = A*B*A'), to make the script more efficient. It would work equally well, and only slightly more slowly, with the full expression F*(I-A)~*S*(I-A)~'*F' given above.

 

           End group marks the end of the script.
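The lower-triangular layout can be made concrete with a short Python sketch that reads text laid out like ramfit.cov and expands it into the full symmetric matrix. This is an illustration only: read_lower_triangle is a hypothetical helper, not part of Mx, and it assumes the text holds exactly one complete lower triangle, read row by row, after the optional * free-format marker.

```python
def read_lower_triangle(text):
    """Expand a free-format lower triangle into a full symmetric matrix."""
    tokens = text.split()
    if tokens and tokens[0] == "*":          # '*' marks free format
        tokens = tokens[1:]
    values = [float(t) for t in tokens]
    p = 0                                    # find p with p(p+1)/2 == number of values
    while p * (p + 1) // 2 < len(values):
        p += 1
    if p * (p + 1) // 2 != len(values):
        raise ValueError("not a complete lower triangle")
    full = [[0.0] * p for _ in range(p)]
    k = 0
    for i in range(p):                       # row i holds elements (i, 0) .. (i, i)
        for j in range(i + 1):
            full[i][j] = full[j][i] = values[k]
            k += 1
    return full

ramfit_cov = """*
 1.51
 .31 1.17
 .22 .19 1.46
 .11 .23 .34 1.56
"""
cov = read_lower_triangle(ramfit_cov)
for row in cov:
    print(row)
```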



What are the advantages and disadvantages of setting up models with the RAM method? On the positive side, it is extremely simple and general. It doesn’t matter if there are feedback loops, everything will be specified correctly (see Appendix D). Of course, some care may be required with the choice of starting values, but we do have a practical method. On the negative side, the covariance statement involves inverting the (I-A) matrix, which will be slow when we have many variables or a slow computer. Many models do not need to use matrix inversion in the covariance statement. In fact, it is only feedback loops which make this necessary; we can therefore seek a simpler, more efficient specification of the model. There are many of these, but we shall be aiming for one that is systematic and straightforward.
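To see what the Covariance statement computes, here is a plain Python sketch (standard library only) of the RAM formula for Figure 1.1. The parameter values are hypothetical, chosen only for illustration, and the matrix helpers are written out by hand rather than taken from any library. The sketch checks that the quadratic-product form F & ((I-A)~ & S) gives the same matrix as the full expression and that, because this model has no feedback loops, (I-A)~ here reduces to I + A.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def subtract(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def inverse(M):
    """Gauss-Jordan elimination with partial pivoting (plays the role of Mx's ~)."""
    n = len(M)
    aug = [[float(v) for v in row] + e for row, e in zip(M, identity(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [v - f * w for v, w in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def quad(P, Q):
    """Quadratic product, like Mx's P & Q = P * Q * P'."""
    return matmul(matmul(P, Q), transpose(P))

# Variables ordered P, Q (latent), then R, S, T, U (observed); m = 6, mO = 4.
a, b, c, d, r = 0.8, 0.7, 0.6, 0.5, 0.3       # hypothetical parameter values
A = [[0.0] * 6 for _ in range(6)]              # one-headed arrows: row = effect, column = cause
A[2][0], A[3][0], A[4][1], A[5][1] = a, b, c, d
S = [[0.0] * 6 for _ in range(6)]              # two-headed arrows
S[0][0] = S[1][1] = 1.0                        # latent variances fixed at 1
S[0][1] = S[1][0] = r                          # covariance of P and Q
for i, e in zip(range(2, 6), (0.4, 0.5, 0.6, 0.7)):
    S[i][i] = e                                # residual variances of R, S, T, U
F = [[1.0 if j == i + 2 else 0.0 for j in range(6)] for i in range(4)]  # filter

B = inverse(subtract(identity(6), A))          # (I-A)~
sigma_full = matmul(matmul(matmul(matmul(F, B), S), transpose(B)), transpose(F))
sigma_quad = quad(F, quad(B, S))               # F & ((I-A)~ & S)
for row in sigma_quad:
    print(" ".join(f"{v:6.3f}" for v in row))
```

For a two-level model like this one, A multiplied by itself is zero, which is why the inverse collapses to I + A; the inversion only does real work when the diagram contains feedback loops.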


Simplified Mx Approach for Models without Feedback Loops


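Consider Figure 1.1 again. It has two levels of variables: P and Q at level 1, and R, S, T and U at level 2. We could put all the two-headed arrows at the first level in one matrix, all the level 1 to level 2 arrows in a second matrix, and all the two-headed level 2 arrows in a third matrix. Letting these matrices be X, Y and Z respectively, we would get:

$$X = \begin{pmatrix} 1 & x_{21} \\ x_{21} & 1 \end{pmatrix}, \quad Y = \begin{pmatrix} y_{11} & 0 \\ y_{21} & 0 \\ 0 & y_{32} \\ 0 & y_{42} \end{pmatrix}, \quad Z = \begin{pmatrix} z_{11} & 0 & 0 & 0 \\ 0 & z_{22} & 0 & 0 \\ 0 & 0 & z_{33} & 0 \\ 0 & 0 & 0 & z_{44} \end{pmatrix}$$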


It so happens that all the observed variables are at the same level (2) in this model, which makes life easy for us. Although it may seem that we have artificially contrived the model to have this desirable feature, many structural equation models can be written this way. The covariance formula for this model is:

$$\Sigma = YXY' + Z$$


and this has a very simple multivariate path diagram to represent it, as shown in Figure 1.2. To get from Figure 1.1 to Figure 1.2 all we did was to collapse the vector of variables within each level to form a single vector of variables at each level. The paths are collapsed into matrices of paths.



Figure 1.2         Multivariate path diagram for the system shown in Figure 1.1.




Exercises:

 

1. Fit the model using the simpler X, Y and Z specification.

2. Find the change in chi-squared when the parameters b and c are set equal.

3. Pick a simple published model and data set, and fit it with Mx using the RAM approach.

4. Find a more efficient method to fit the model in 3.


To best learn how to use Mx, readers should attempt the exercises themselves before reading the next section, which describes the answer to the first exercise.




!

! Mx partly simplified approach to fitting models

!

#NGroups 1

#define top 2 ! Number of variables in top level

#define bottom 4 ! Number of variables in bottom level


Title Mx simplified approach to fitting model ! Title

 Data NInput=bottom NObserved=100 ! Number of variables,subjects


 CMatrix File=ramfit.cov ! Reads observed covariance matrix

 Begin Matrices; ! Declares matrices

  X Stan top top Free ! Two-headed, top level

  Y Full bottom top ! From top to bottom arrows

  Z Diag bottom bottom Free ! Two-headed, bottom level

 End Matrices; ! End of matrix declarations


 Specify Y ! Declare certain elements of Y as free parameters

  31 0

  32 0

  0 33

  0 34

 Start .5 All ! Supply .5 starting value for all parameters


 Covariance Y*X*Y' + Z; ! Formula for covariance model

 Options RSiduals

End group


What tricks have we used here? First, the keyword Free in the matrix declaration section makes elements of matrices X and Z free. Matrix X is standardized, which means that it is symmetric with 1's fixed on the diagonal, so free parameter number 1 goes in the lower off-diagonal element (the upper off-diagonal element is automatically assigned this free parameter as well, because standardized matrices are symmetric). Matrix Z is diagonal, so it will have parameters 2 through 5 assigned to its diagonal elements. We could put parameters 6 through 9 in matrix Y, but 31 to 34 are used instead, just to emphasize that we don’t want our specification numbers to overlap with specifications automatically supplied by Mx when the free keyword is encountered at matrix declaration time.
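The parameter bookkeeping described above can be mimicked in a few lines of Python. This sketch assumes, as the text describes for X and Z, that the Free keyword numbers free elements consecutively in declaration order; the per-type counting rules are this sketch's own summary (Stan: off-diagonal elements only, Diag: diagonal elements only). It also counts the distinct observed variances and covariances to obtain the degrees of freedom of the model.

```python
def n_free(mtype, rows, cols):
    """Number of free elements created by the Free keyword (sketch, assumed rules)."""
    if mtype == "Full":
        return rows * cols
    if mtype == "Diag":
        return rows                      # diagonal elements only
    if mtype == "Stan":
        return rows * (rows - 1) // 2    # symmetric, diagonal fixed at 1
    raise ValueError(f"type {mtype!r} not covered by this sketch")

# Matrices declared with Free in the simplified script, in declaration order.
declared_free = [("X", "Stan", 2, 2), ("Z", "Diag", 4, 4)]

numbering, next_number = {}, 1
for name, mtype, rows, cols in declared_free:
    k = n_free(mtype, rows, cols)
    numbering[name] = list(range(next_number, next_number + k))
    next_number += k
numbering["Y"] = [31, 32, 33, 34]        # supplied by the Specify Y statement

n_params = len({p for nums in numbering.values() for p in nums})
n_observed = 4
n_moments = n_observed * (n_observed + 1) // 2   # distinct variances and covariances
df = n_moments - n_params
print(numbering)
print(f"{n_params} free parameters, {n_moments} observed statistics, df = {df}")
```

The nine free parameters match the nine specified in the RAM script, as they should for the same model.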


Note how this script is much shorter than the original, because of the reduced need for specification statements to put parameters into matrices. This illustrates a valuable feature of programming with Mx: with appropriate matrix formulation of the model, specification statements can be eliminated. The advantage of setting up models in this way is that modifying the model to cater for a different number of observed or latent variables becomes trivially simple. The more complex the model, the greater the value of this approach. Another advantage is that the computer time required to evaluate the model can be greatly reduced. We have not only eliminated the need for matrix inversion when the predicted covariance matrix is being calculated, but also reduced the size of the matrices that are being multiplied.
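The simplified formula can be checked by hand with a short Python sketch. With hypothetical values for the loadings, the P-Q correlation and the residual variances, Y*X*Y' + Z involves only 4×2 and 2×2 products and no matrix inversion; with the same parameter values, the RAM formula yields the same predicted covariance matrix.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

a, b, c, d = 0.8, 0.7, 0.6, 0.5             # hypothetical loadings (P->R, P->S, Q->T, Q->U)
r = 0.3                                      # hypothetical P-Q correlation
X = [[1.0, r], [r, 1.0]]                     # Stan: unit diagonal
Y = [[a, 0.0], [b, 0.0], [0.0, c], [0.0, d]]
residuals = [0.4, 0.5, 0.6, 0.7]             # hypothetical residual variances
Z = [[residuals[i] if i == j else 0.0 for j in range(4)] for i in range(4)]

sigma = add(matmul(matmul(Y, X), transpose(Y)), Z)   # Y*X*Y' + Z
for row in sigma:
    print(" ".join(f"{v:6.3f}" for v in row))
```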



Fully Multivariate Approach


We now turn to a third implementation of the same model to show how the matrix algebra features can be used to make an efficient script that can be easily modified. Take another look at Figure 1.1. The first latent factor, P, causes the first two observed variables, R and S, whereas the second factor, Q, affects only the other two observed variables, T and U. Perhaps we expect to change the number of observed variables in one or other of these sets. If so, we might want to split the causal paths into two matrices, one for each factor. So, what was matrix Y in the simplified Mx approach will be partitioned into four pieces:

 

           the effects of P on S and T

           the effects of P on U and V (zero)

           the effects of Q on S and T (zero)

           the effects of Q on U and V


We’ll use a separate matrix for each of these, and use #define statements to make the changes in their dimensions automatic.

!
! Mx multivariate approach to fitting models
!
#NGroups 1
#define top 2 ! Number of variables in top level (P,Q)
#define left 2 ! Number of variables in bottom left level (R,S)
#define right 2 ! Number of variables in bottom right level (T,U)
#define meas 4 ! Number of measured variables (left + right)
!
Title Mx multivariate approach to fitting model
 Data NInput=meas NObservations=100 ! Number of variables & subjects
 CMatrix File=ramfit.cov ! Reads observed covariance matrix

 Begin Matrices; ! Declares matrices
  X Stan top top free ! Two-headed, top level
  J Full left 1 free ! From P to R,S arrows
  K Zero left 1 ! From Q to R,S (zeroes)
  L Zero right 1 ! From P to T,U (zeroes)
  M Full right 1 free ! From Q to T,U arrows
  Z Diag meas meas free ! Two-headed, bottom level
 End Matrices; ! End of matrix declarations
  Start .5 All ! Supply .5 starting value for all parameters

 Begin Algebra;
  Y = J|K _
      L|M ;
 End Algebra;

 Covariance Y*X*Y' + Z; ! Formula for model
End group


So, the major change here is to use the algebra section to compute matrix Y. We have eliminated the need for a specification statement by applying the keyword free to matrices J and M. If we thought that we might expand the model to have more than one factor for each side, then we could further generalize the script by changing the matrix dimensions from 1 to #define’d variables.
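The partitioning of Y can be mimicked outside Mx to see why resizing J or M is all that is needed. In Mx, | joins matrices side by side and _ stacks them vertically. The Python sketch below reproduces that behaviour with hypothetical helpers (hcat, vcat, build_y) and made-up loadings. It builds Y = J|K _ L|M with the zero blocks K and L sized automatically from J and M.

```python
# Sketch of Mx-style adhesion: '|' joins side by side, '_' stacks vertically.
# Helper names are hypothetical; loadings are made-up values.

def hcat(a, b):
    """a|b: join two matrices with the same number of rows side by side."""
    if len(a) != len(b):
        raise ValueError("| needs equal row counts")
    return [ra + rb for ra, rb in zip(a, b)]

def vcat(a, b):
    """a_b: stack two matrices with the same number of columns."""
    if len(a[0]) != len(b[0]):
        raise ValueError("_ needs equal column counts")
    return [list(r) for r in a] + [list(r) for r in b]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def build_y(j, m):
    """Y = J|K _ L|M, with K and L fixed zero blocks sized from J and M."""
    left, right = len(j), len(m)
    k = zeros(left, len(m[0]))    # effects of Q on the left-hand variables
    l = zeros(right, len(j[0]))   # effects of P on the right-hand variables
    return vcat(hcat(j, k), hcat(l, m))

Y = build_y([[0.8], [0.7]], [[0.6], [0.9]])                # left = right = 2
Y_bigger = build_y([[0.8], [0.7]], [[0.6], [0.9], [0.5]])  # right = 3
```

Adding a third right-hand indicator only lengthens M; Y grows to 5×2 without any other change. This is the same economy the #define'd dimensions give the script.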


1.4 Other Types of Statistical Modeling


The example in this chapter only deals with fitting a structural equation model to covariance matrices, but Mx will do much more than this! There are many types of fit function built in to handle different types of data for structural equation modeling, including:

 

           Means and covariance matrices

           Correlation matrices with weight matrices

           Contingency tables

           Raw data


Also, the program’s multigroup and algebra capabilities cater for tests of heterogeneity, nonlinear equality and inequality constraints, and many other aspects of advanced structural modeling.


Mx has a powerful set of matrix functions and a state-of-the-art numerical optimizer, which make it suitable to implement many other types of mathematical model. One crucial feature makes this possible: user-defined fit functions. The program will optimize almost anything. Given familiarity with matrix algebra and the basics of Mx syntax, it is often much quicker to implement a new model with Mx than to write a FORTRAN or C program specifically for the task. A slight drawback is that the Mx script may run more slowly than a purpose-built program, although this is usually well worth the saving in development time.



2 Introduction to the Mx Graphical User Interface


What you will find in this chapter


How to use the Mx Graphical User Interface (GUI) to:

 

           Draw path diagrams

           Automatically create and run scripts from diagrams

           View & print results on diagrams

           Run Mx scripts

           View output in Project Manager, HTML or text formats

           Edit and debug Mx scripts

           Compare results and export them to other programs.


2.1 Using Mx GUI



After installation, Mx GUI may be started by double-clicking the Mx icon, either in the program group window or from the Start|Programs menu in Windows. You may create a shortcut on the desktop to simplify starting the program.



Figure 2.1         Mx GUI with Project Manager Window and two diagram windows open


Figure 2.1 shows the layout of the Mx GUI when the Project Manager window is active. The button bar icons are grouped into filing, editing, printing, running, and drawing. As with any GUI, you are free to click on buttons in any order. There are, however, some logical ways to proceed that will save time. The purpose of this chapter is to demonstrate the capabilities of the interface and how to use it efficiently.


You can draw path diagrams at any time during an Mx session. A diagram which is either visible in a window or minimized is called open. An Mx script can be automatically created from all open diagrams, sent to the Project Manager, and run. Parameter estimates will be displayed in the diagrams.


Path diagrams are models of latent variables (circles) and observed variables (squares), which are related by causal (one-headed) and covariance (two-headed) paths. While diagrams can be drawn and printed in the abstract, to fit models we must attach (or ‘map’) our data to the squares. Mapping data is the best starting point for drawing a diagram.


2.2 Fitting a Simple Model


Preparing the Data


We start with a simple dataset: a covariance matrix based on a sample of 123 subjects measured on two variables, X and Y. This information is entered in a .dat file which, for those familiar with Mx notation, contains the Data, CMatrix and Labels parts of an Mx script:

Data Ninput=2 Nobservations=123
CMatrix
.95
.55 1.23
Labels X Y

This file, biv.dat, is supplied with Mx GUI and is installed in the examples subdirectory of the Mx installation directory. For details on how to use other types of data, see chapter 3. To create the file yourself, any text editor, such as Microsoft's Edit program or Notepad, will do. There is also a text editor built into the Mx GUI: choose the menu item File|New, or click the new file icon, to start a new file, which can then be saved from the File menu or by clicking the save file icon. If the file is created with a word processor such as WordPerfect or Word, it must be saved as plain ASCII text.
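The CMatrix lines in biv.dat give the lower triangle of the covariance matrix, row by row (.95 on the first row; .55 and 1.23 on the second). The sketch below is a hypothetical helper, not part of Mx or Mx GUI. It assumes that free-format, row-wise lower-triangle layout and expands the numbers into the full symmetric matrix they stand for.

```python
def expand_lower(values, n):
    """Fill a full symmetric n x n matrix from its lower triangle, read row by row."""
    if len(values) != n * (n + 1) // 2:
        raise ValueError("expected n(n+1)/2 values for an n x n matrix")
    full = [[0.0] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        for j in range(i + 1):            # row i holds elements (i, 0) .. (i, i)
            full[i][j] = full[j][i] = next(it)
    return full

def read_cmatrix(text, n):
    """Hypothetical helper: parse the free-format numbers that follow CMatrix."""
    return expand_lower([float(tok) for tok in text.split()], n)

biv = read_cmatrix(""".95
.55 1.23""", 2)
```

Because the numbers are read in order regardless of line breaks, the same helper would accept .95 .55 1.23 on a single line.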


Editing Dat Files


Mx GUI includes a way to prepare data for analysis with either Mx scripts or diagrams. It will read existing .dat files, or write new ones. To see how this works, read the example file ozbmiodz.dat, from the examples subdirectory of the Mx GUI installation directory, into the dat file editor. Click on MxProject and select data edit. In the data edit window, click Load, and then select the ozbmiodz.dat file. The number of input variables (NI=2) appears in the top left box, and the number of observations (NO=380) appears in the next box to the right. Third is the filename, and in the last box at the top is the type of data. Finally, the labels from the Labels command appear in the largest panel. All these fields may be edited to create a new dat file. Editing the filename is best done when saving the file.


Depending on the type of data being read, there may be one or two additional folder tabs visible below the large window containing the labels. Clicking on these tabs lets you edit the data or name an external file from which they are read. In the ozbmiodz.dat case, both means and covariances are supplied as data, and both are read from external files.


Drawing the Diagram


To start a new diagram, click on the ‘new drawing’ icon, then click the button marked DataMap. Next, select the biv.dat file and open it. The program then shows a list of the variables in this file. You can highlight one or more of these variables by using click, shift-click, click and drag, or control-click, following the usual Microsoft Windows conventions. Get both X and Y highlighted by positioning the pointer over the X variable, pressing the left mouse button down, dragging it to the Y variable, and then releasing the mouse button. X and Y should now be highlighted in blue. Hit New and two new observed variables will appear in the diagram ready for analysis (they may have appeared behind the data map window). Click Close to close the data map window.


Note that the variables are created with variance paths (small double-headed arrows). These paths represent residual variance; they are sometimes called autocorrelational paths. This model is called a ‘null model’ because it has only variances and no covariances.


Fitting the Model


Click Run to run this job. You will have to supply a job name and a file name. Enter null for both, without any file extension. Mx GUI will then build, save and run the script file null.mx. In addition, Mx automatically saves the diagram in the file null.mxd, which can be reloaded later.


While the job is running, a counter appears. The numbers it displays show that the Mx engine is still trying to solve the problem. When it has finished the message ‘Parsing to Core’ may appear, indicating that the graphical interface is busy interpreting the results. Often this step is so fast that it is invisible.


Viewing Results


Results Panel

After the job has run, the Results Panel appears (see Figure 2.2). It contains information about the status of the optimization; in this example, the words ‘Appears OK’ should be on the top line, meaning that the solution it found is very likely to be a global minimum.


Figure 2.2         The Results Panel to view the results

 

Table 2.1          Correspondence between optimization codes and IFAIL parameters


Optimization Code            IFAIL    Serious     Action
Failed! Incomputable         -1       Yes         Check output & script for errors
Appears OK                   0 or 1   No          Carefully accept results
Failed! Constraint Error     3        Yes         Check output & script for constraint errors
Failed! Too few iterations   4        Yes         Restart from estimates
Possibly Failed              6        Sometimes   Restart from estimates
Failed! Boundary Error       9        Yes         Send script & data to neale@hsc.vcu.edu


The next line indicates the type of fitting function used, ML ChiSq, which is the usual Maximum Likelihood fit function for covariance matrices, scaled to yield a χ2 goodness-of-fit statistic for the model. The χ2 is 39.546 in this example, with a 90% confidence interval running from 21.564 to 62.957. There are two free parameters estimated (the two variance parameters) and three observed statistics (the two variances and the covariance). This leaves one degree of freedom, and the model fits very poorly (p=.000). Akaike's Information Criterion (AIC) is greater than zero, reflecting poor fit. This impression is supported by the RMSEA statistic, which should be .05 or less for very good fit, or between .05 and .10 for good fit. The high RMSEA value of .538, together with its 90% confidence interval, indicates that the model does not fit well: the interval does not overlap the region of good fit, because its lower bound of 0.393 is greater than .10. Click OK to remove the Results Panel. The Results Panel can be reviewed later by selecting the Output|Fit Results option.
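The degrees-of-freedom bookkeeping behind these numbers can be written out in a few lines. The Python sketch below counts the p(p+1)/2 distinct variances and covariances among p observed variables and subtracts the free parameters. For AIC it assumes the χ2-based form χ2 − 2df. That assumption is consistent with reading a positive AIC as poor fit, but the function is an illustration, not Mx's own code, and its names are hypothetical.

```python
def n_statistics(p):
    """Distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2

def degrees_of_freedom(p, n_free):
    return n_statistics(p) - n_free

def aic_chisq(chisq, df):
    """AIC in the chi-square form chi2 - 2*df (an assumed definition)."""
    return chisq - 2 * df

df_null = degrees_of_freedom(p=2, n_free=2)   # two variances, no covariance
aic_null = aic_chisq(39.546, df_null)         # positive, signalling poor fit
```

With df = 1 this gives an AIC of 37.546. A model with three free parameters for these two variables would have zero degrees of freedom.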


Viewing Results in the Diagram

When the Results Panel closes, the estimates of the variance parameters for this model become visible in the diagram, on the double-headed arrows. The Results Panel information has also been copied into the diagram. These results can be deleted entirely (click on the results box in the diagram and hit delete or ctrl-x), or specific elements may be selected for viewing and printing. To display only the fit and p-value, double-click the Results box to bring up the ‘Check items to be displayed’ box and change the selections as shown in Figure 2.3. If the null option in the Preferences|Job options panel (see p 25) had been used with these data, the grayed-out fit statistics would be available for display in the diagram.



Figure 2.3         The Results Box Panel to Change the results displayed in the diagram


Project Manager

More information about this model can be found in the Project Manager: click the Manager button (or the Project Manager toolbar icon) to open this window. The script file name is highlighted in the left panel, the group name in the middle panel, and the first matrix in this group in the right-hand panel. The values in this matrix are shown in the Matrix Spreadsheet at the bottom of the Project Manager window.


Fit statistics for the model are shown in the left-hand panel of the manager; F: 39.546 is the value reported in the Results Panel. The degrees of freedom, df: 1, also appear in the left-hand panel, but depending on your display you may have to use the slider at the bottom of the panel or resize the window to see them. More information on the fit of the model can be seen in the matrix spreadsheet at the bottom of the Project Manager by clicking the Statistics button. Click Statistics again to toggle the view back to the highlighted matrix.


In the middle panel is a list of the groups in the job; there is only one group in this case. In the right-hand panel is a list of matrices used to define the model (I, A, F and S), along with the observed covariance matrix (ObsCov), the expected covariance matrix (ExpCov) and the residual, ObsCov-ExpCov (ResCov). If you click on the ObsCov matrix, you can see the data matrix in the matrix spreadsheet at the bottom of the Project Manager. This view of the selected matrix can be turned on and off with the View button on the right of the manager. As described below, these matrices can be copied to the clipboard with ctrl-c.


The matrix spreadsheet can show not only the values of the matrix (and its labels) but also the parameter specifications. If you click on the Value button, the parameter specifications will be shown. Try this out for the S matrix. This is the matrix of Symmetric arrows (two-headed). There are two of these, one going from X to X and one going from Y to Y. The free parameters are numbered 1 and 2 in the specs view of the S matrix. A parameter numbered zero is fixed. The A matrix contains the Asymmetric paths (single-headed, causal arrows), which run from column variable to row variable. There are no causal paths in this model, so all of the elements of A are zero.


Click on ExpCov in the right-hand panel. To the right is the formula used for this model. Models built from diagrams currently use one general formula for the covariance:

$$\Sigma = \mathbf{F}(\mathbf{I}-\mathbf{A})^{-1}\mathbf{S}\left[(\mathbf{I}-\mathbf{A})^{-1}\right]'\mathbf{F}'$$

In the Mx matrix language this is written F&((I-A)~&S), where ~ denotes inversion and & is the quadratic operator (P&Q means P*Q*P'). Beginners don't need to know how these formulae are used to fit the model. Details are given in Chapter 1, or see McArdle & Boker (1990) for a more complete description of this formulation.

Click on the ResCov matrix in the right-hand panel. Notice that the diagonal elements of this matrix are very small. They are presented in scientific notation, so 1.23e-08 means .0000000123, and values this small indicate a good fit of the model to these elements. The model does not fit the off-diagonal elements at all well. It predicts no covariance between these variables, but .55 is a substantial covariance with this sample size, as shown by the fit statistic of χ2=39.55 for 1 df. The model should be revised.
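This residual pattern can be reproduced by hand. When a model has only variance paths, the maximum likelihood estimates of those variances are simply the observed variances. ExpCov is therefore diagonal, and ResCov retains the whole observed covariance. The Python sketch below evaluates the RAM formula F(I−A)⁻¹S[(I−A)⁻¹]'F' for the null model. Its helper names are hypothetical, not Mx functions. It computes (I−A)⁻¹ as the power series I + A + A² + ..., which is exact only for diagrams without feedback loops (an assumption of this sketch).

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def ram_expected_cov(F, A, S):
    """F (I-A)^-1 S [(I-A)^-1]' F' for a diagram without feedback loops.

    For such diagrams A is nilpotent, so (I-A)^-1 = I + A + ... + A^(n-1).
    """
    n = len(A)
    B, term = identity(n), identity(n)
    for _ in range(n - 1):
        term = matmul(term, A)
        B = [[x + y for x, y in zip(rb, rt)] for rb, rt in zip(B, term)]
    FB = matmul(F, B)
    return matmul(matmul(FB, S), transpose(FB))

obs_cov = [[0.95, 0.55], [0.55, 1.23]]   # biv.dat
F = identity(2)                          # both variables are observed
A = [[0.0, 0.0], [0.0, 0.0]]             # null model: no causal paths
S = [[0.95, 0.0], [0.0, 1.23]]           # variances at their ML values
exp_cov = ram_expected_cov(F, A, S)
res_cov = [[o - x for o, x in zip(ro, rx)] for ro, rx in zip(obs_cov, exp_cov)]
```

Here res_cov has zeros on the diagonal and .55 off the diagonal. In Mx the diagonal values are tiny rather than exactly zero, because the optimizer stops within a tolerance.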


Resizing the Project Manager

The Project Manager window may be resized by pulling its side, top, bottom or corner to a new position. It is also possible to change the proportion of the window that displays jobs by dragging the bottom of the group panel up or down to a new position. Also, the View button will switch the matrix spreadsheet on and off.


Saving Diagrams


All open diagrams are automatically saved to file when the job is run, but sometimes it is useful to save diagrams manually. The null model diagram could be saved directly (without running it) using the following steps:

 

           Click on the diagram to select it

           Click on the save-to-disk icon (or use the File|Save menu item)

           Enter a filename such as null.mxd (.mxd is the default extension for Mx diagrams, which will be added automatically if you enter null without .mxd at the end). Note that all active (minimized or displayed) diagram windows are saved to the file.


See page 29 for details on running and saving scripts.


2.3 Revising a Model


Revising models is easy with the graphical tools.


Adding a Causal Path


Returning to the null path diagram, a linear regression model can be devised by adding a causal path from the independent variable, X, to the dependent variable, Y. It may clarify the path estimates to put more space between the variables. Click on an open space to de-select all the variables. Then click on Y and move it a little to the right (if you want to keep it aligned with X, press shift throughout the operation). Now click on the arrow tool icon on the icon bar. In the diagram window, click on X, hold the mouse button down and drag it to Y, and release the button. The diagram should now have an arrow from X to Y. Usually we want these arrows to be straight, but sometimes it is useful to make them curved, which can be done by dragging the little blue square in the middle.


You can now hit Run in the diagram window. Enter regress for the job name. Note that if you enter null as the job name instead, it will overwrite the previous Mx script and diagram files. This overwriting approach is useful when trying to get a model correctly specified initially, but it is better to keep substantively different models in different diagram and script files. Doing so also allows comparison between them.


The model fits perfectly, as seen by the ML ChiSq of zero in the Results Panel. It also has zero degrees of freedom, because it has the same number of parameters (the variance of X, the path from X to Y and the residual variance of Y) as it does observed statistics. Such a model is often called ‘saturated’. Click OK to view the new estimates in the diagram.


Adding a Covariance Path


The procedure to add a covariance path is essentially the same as for adding a causal path, but you use the covariance drawing tool instead. Note that there are two types of covariance path: variance, which appears as a little loop from a variable to itself, and covariance, which connects two different variables. We'll add the covariance type to the diagram.


First, delete the causal path: select the pointer tool (the white arrow), click on the path once (a blue dot will appear in the middle of the path to show that it is selected) and press delete or ctrl-x (cut). Note that you can undo a mistake with the undo tool, and that tools can also be changed by clicking the right mouse button on a diagram.


Second, add the covariance path by selecting the covariance tool. Then click on X, drag the pointer to Y, and release. The path is automatically curved a certain amount. The curvature can be increased or decreased by dragging the blue dot in the middle of the path. Single-headed arrows can be made to curve in the same way, but by default they follow the convention of being straight lines, and we recommend keeping them that way if possible. Reciprocal interaction between two variables (A→B and B→A) is the exception: it requires some curvature to stop the lines lying on top of each other.


Third, hit Run to rerun the model. Enter covar as the name of the job and script. Again this model fits perfectly, with zero degrees of freedom. The parameter estimates are not all the same as those of the regression model we fitted earlier. These two models may be called ‘equivalent’ because they always explain the data equally well, and a transformation can be used to obtain the parameter estimates of one model from the other.
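The transformation between these two equivalent models is short enough to write out. Both models are saturated, so their estimates should reproduce the biv.dat covariance matrix exactly. That gives the covariance model's estimates as the observed values .95, 1.23 and .55. The Python sketch below maps these to the regression parameters and back. The function names are hypothetical.

```python
def regression_from_covariance(var_x, var_y, cov_xy):
    """Covariance-model parameters -> regression-model parameters."""
    b = cov_xy / var_x               # causal path from X to Y
    resid_y = var_y - b * cov_xy     # residual variance of Y (= var_y - b**2 * var_x)
    return var_x, b, resid_y

def covariance_from_regression(var_x, b, resid_y):
    """Regression-model parameters -> covariance-model parameters."""
    return var_x, b * b * var_x + resid_y, b * var_x

# Saturated models reproduce the observed moments of biv.dat exactly.
reg = regression_from_covariance(0.95, 1.23, 0.55)
cov = covariance_from_regression(*reg)
```

With these inputs, the regression path is about .579 and the residual variance of Y about .912. The variance of X is the same in both models, which is why only some of the estimates change.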




Changing Path Attributes


A variety of characteristics of paths can be changed and made visible in the diagram with the Path Inspector. Double-click the covariance path that we just created in the diagram to bring up the Path Inspector. Using the Inspector a path can be fixed, bounded, or equated to other paths. Confidence intervals can be requested, and the display of labels, start values and other information can be switched on or off. These changes can be made to several paths at once by selecting them all and checking the relevant line of the ‘Apply to this name’ pull-down menu in the Path Inspector.



Figure 2.4         Mx Path Inspector with parameter F fixed at .2


Fixing a Parameter


For illustration, we will test the hypothesis that the covariance between X and Y is equal to .2. In the Path Inspector panel for the covariance arrow, check “Fix This Parameter.” Double-click the start value field and type in .2 to give the fixed value for this path. One useful way to remember that a path is fixed is to display only the start value and not the path label: uncheck the “Display Label” box and check the “Display Start Value” box. Your Path Inspector panel should now look like Figure 2.4. Click OK and then click Run in the diagram window to rerun the model. Enter a new job name such as fixed.


If you now look at the Project Manager and click Statistics, you can see the fit of this model and compare it with the other models so far. Note that the Path Inspector also allows you to change the boundaries to restrict path estimates to lie in a particular interval. To constrain a parameter to be non-negative, we would simply change the lower bound to zero.


Confidence Intervals


For any free parameter you can request confidence intervals. Just double-click on the path, and check the “Calculate CI” and the “Display CI” boxes in the inspector. Run the model again, but this time just click OK without entering a new job name, so that the job overwrites the existing one in the manager. After all, we are fitting the same model and simply calculating a few more statistics. Mx computes likelihood-based confidence intervals, which have superior statistical properties to the more common type based on derivatives. Chapter 5 describes the method used, and Neale & Miller (1997) discuss the advantages of using this type of confidence interval. The main disadvantage is that they are relatively slow to compute, so we suggest computing them only when the model is finally correctly specified.
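The idea behind a likelihood-based interval can be shown in one dimension. The sketch below is an illustration under stated assumptions, not the procedure described in Chapter 5. It takes the variance of X in biv.dat (.95, N = 123) and uses the usual (N−1)-scaled maximum likelihood discrepancy for a single variance. It then searches for the two values at which the fit worsens by 3.841, which gives a 95% interval for one parameter. For comparison, it also computes the symmetric, derivative-based (Wald) interval. All function names are hypothetical.

```python
import math

def fit(sigma2, s2, n):
    """(N-1)-scaled ML discrepancy for one variance; zero when sigma2 == s2."""
    return (n - 1) * (math.log(sigma2) + s2 / sigma2 - math.log(s2) - 1.0)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f between lo and hi, assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def likelihood_ci(s2, n, crit=3.841):
    """Parameter values at which the fit worsens by crit (3.841: 95%, 1 df)."""
    g = lambda v: fit(v, s2, n) - crit
    return bisect(g, s2 * 1e-3, s2), bisect(g, s2, s2 * 1e3)

def wald_ci(s2, n, z=1.96):
    """Symmetric, derivative-based interval using SE = s2 * sqrt(2 / (N-1))."""
    se = s2 * math.sqrt(2.0 / (n - 1))
    return s2 - z * se, s2 + z * se

lik_lo, lik_hi = likelihood_ci(0.95, 123)   # variance of X in biv.dat, N = 123
wald_lo, wald_hi = wald_ci(0.95, 123)
```

The likelihood-based interval is asymmetric and extends further above .95 than below it, as the skewed sampling distribution of a variance suggests. The derivative-based interval is symmetric by construction.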


Equating Paths


Mx uses the labels of the paths to decide whether or not they are constrained to be equal. To illustrate, add a latent variable to the diagram, draw causal paths from it to both X and Y, and constrain the two paths to be equal. First, click on the Circle tool, and click on the diagram to add the circle. Second, click on the causal path tool and add the two paths from the new latent variable to X and Y. Third, click on one of the paths and give it the same label as the other. Finally, to make the model identified we should delete the covariance (double-headed) path between X and Y. On running it, we should find the same perfect fit (χ2=0) of the model. This time the estimate for each of the two paths is the square root of the covariance of X and Y.


Note that the latent variable we added had a variance path with the fixed value of 1.00 on it. This is different from the observed variables, which come with free variance paths corresponding to residual error variance.


Having a fixed variance of 1.00 makes our latent variables standardized by default. Of course, we could make a latent variable unstandardized by fixing its variance to some other value, or (if there is enough information in the model) estimate its variance as a free parameter.
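The square-root result follows from the fixed unit variance. The latent variable has variance 1 and the two paths share one value λ, so the model implies cov(X, Y) = λ × 1 × λ. The free residual variances then absorb the rest of each observed variance. A minimal sketch of that arithmetic for biv.dat follows. It assumes the fitted model reproduces the observed matrix exactly, which holds here because there are three parameters for three statistics.

```python
import math

obs = [[0.95, 0.55], [0.55, 1.23]]   # biv.dat covariance matrix
lam = math.sqrt(obs[0][1])           # equal paths, latent variance 1: lam * 1 * lam = cov(X, Y)
resid_x = obs[0][0] - lam ** 2       # free residual variance of X
resid_y = obs[1][1] - lam ** 2       # free residual variance of Y
implied = [[lam ** 2 + resid_x, lam ** 2],
           [lam ** 2, lam ** 2 + resid_y]]
```

This gives λ ≈ .742, with residual variances of .40 for X and .68 for Y. The sign of λ is not identified, because −λ fits equally well. The construction also needs a positive covariance, since λ² cannot be negative.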


Moving Variables and Paths


It is easy to modify the appearance of a diagram by moving one or more variables. To select a variable, first de-select everything by clicking on the selection tool and then clicking on some open space in the diagram. Then click on the variable, and drag it to its new position. To move several variables together, click on one of them, then press the shift key and click on another variable. Alternatively, you can click on the background of the diagram and drag a rectangle around the variables you wish to select. When all the variables to be moved are selected, you can drag them to their new location.


2.4 Extending the Model


Multiple Groups: Using Cut and Paste


A valuable feature of graphical interfaces is the ability to rapidly duplicate objects by means of cut and paste. Here we go through a simple multi-group example, the classical twin study, to illustrate these actions.


Fitting the ACE Genetic Model

Structural equation modeling of data from twins has been described in detail elsewhere. In summary, twin pairs are classified as either monozygotic (MZ) or dizygotic (DZ). Each pair is treated as a case, and the MZ pairs are analyzed in a separate group from the DZ pairs. The structural equation model is configured with three latent variables, which model the possible effects of additive genes (correlated 1.0 in MZ twins and 0.5 in DZ pairs), shared environment (correlated 1.0 in both types of twin pair) and individual-specific environment (uncorrelated between twins). This is a two-group example, so we will draw two diagrams.
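Writing the paths from the A, C and E latent variables to each twin's phenotype as a, c and e (the labels used when the diagram is drawn below) and giving each latent variable unit variance, the correlations just listed imply the following expected covariance matrices for a twin pair. This is the standard path-tracing result for this model, written out for reference rather than copied from Mx output:

$$
\Sigma_{MZ} = \begin{pmatrix} a^2 + c^2 + e^2 & a^2 + c^2 \\ a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix},
\qquad
\Sigma_{DZ} = \begin{pmatrix} a^2 + c^2 + e^2 & 0.5a^2 + c^2 \\ 0.5a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}
$$

Only the off-diagonal elements differ between the two groups, so both groups are needed to separate a from c.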


Drawing the MZ Diagram

To begin modeling, open the Mx GUI and click on the new drawing icon. Then click the DataMap button and the Open button, and select the file ozbmiomz.dat from the examples subdirectory. Select only the variable BMI-T1 and click New to drop it into the drawing. Move the data map window out of the way or close it, and start working on the drawing.


We need to add A1, C1 and E1 latent variables. Click on the latent variable icon and draw three circles above the BMI-T1 variable. Relabel the variables to read A1, C1 and E1 by double-clicking inside the circles and typing in the new text.


Next we need to add the causal paths from A1, C1, and E1 to BMI-T1. Click on the causal arrow icon, then click and drag from A1 to BMI-T1, and release. Do the same for C1 to BMI-T1 and E1 to BMI-T1. Mx automatically labels arrows and variables for us, but we want to use specific names for our paths: a, c and e. Therefore, we double-click on each path in turn and rename it in the label field of the Path Inspector. Care is needed here! Depending on the order in which the latent variables were drawn, there may already be a path called a, c or e on one of the latent variables. If so, relabelling the causal paths will inadvertently create an equality constraint that we don't want. Relabel any of the latent variable variance paths as necessary to make them different from a, c and e. Finally, because we are going to model individual-specific variation with e, we can remove the variance path on BMI-T1. Click inside it so that its blue select button appears and hit delete or ctrl-x.


We now have a model for Twin 1, and we need to replicate it for Twin 2. Either press ctrl-a or go to the Edit menu and click Select All. Press ctrl-c for copy and ctrl-v for paste (or use the copy and paste icons or the Edit menu equivalents) and you have a new copy of the model for an individual. Use the mouse to drag it to the right of the existing model. You may have to resize the window to give yourself space for this. Alternatively, you can zoom out of the drawing with the zoom-out button (see below).


A very important step comes next. We have duplicated the model for twin 1, including both the A, C and E part and the phenotype BMI-T1, but we do not want to model the covariance between BMI-T1 and itself. When we duplicated the model for twin 1, the new BMI-T1 box was black rather than blue, because it is not mapped to data. To map it, we select the new BMI-T1 variable (and only this variable) in the diagram. Then hit DataMap, click on BMI-T2 in the variable list, and then click Map. The variable in the diagram turns blue and the label is revised to say BMI-T2. Mx now knows what data we are analyzing.


To complete the model for MZ twins, we need to do two things. First, change the labels of the latent variables causing BMI-T2 to A2, C2 and E2 by double-clicking on the circles and typing in the new names. This step is for cosmetic purposes; Mx will still fit the correct model even if the latent variables have incorrect names. Second, we must add covariances between A1 and A2 and between C1 and C2, to be fixed at one. Click on the covariance path tool. Click on A1, drag to A2 and release. Do the same for C1 and C2. Note that if you drag from right to left, the arrows curve downwards rather than upwards. The curvature can be adjusted by clicking on the arrow and dragging the blue selection button in the middle.


You must now fix the A1-A2 and C1-C2 covariances to one. Double-click on each path in turn to open the Path Inspector, check the “Fix This Parameter” box, make the starting value 1, and select “Display Start Value”. At this stage the diagram should look something like Figure 2.5. It would be possible to run this model, but the parameters a and c are confounded when we have only MZ twins. To identify the model we must add the DZ group.
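A numerical check shows why a and c are confounded when only MZ pairs are available. The sketch below uses two hypothetical parameter sets, both with e = .8. In the first, all familial variance is genetic (a = .6, c = 0). In the second, all familial variance is shared (a = 0, c = .6). twin_cov is a hypothetical helper that applies the correlations given earlier for the A, C and E latent variables, which all have unit variance.

```python
def twin_cov(a, c, e, r_a):
    """Expected 2x2 covariance of a twin pair.

    a, c, e: paths from unit-variance A, C, E factors to each twin's phenotype.
    r_a: correlation of the A factors (1.0 for MZ pairs, 0.5 for DZ pairs);
    C factors correlate 1.0 and E factors 0 in both types of pair.
    """
    var = a * a + c * c + e * e
    cov = r_a * a * a + c * c
    return [[var, cov], [cov, var]]

ae_model = dict(a=0.6, c=0.0, e=0.8)   # hypothetical: familial variance all genetic
ce_model = dict(a=0.0, c=0.6, e=0.8)   # hypothetical: familial variance all shared

mz_ae, mz_ce = twin_cov(r_a=1.0, **ae_model), twin_cov(r_a=1.0, **ce_model)
dz_ae, dz_ce = twin_cov(r_a=0.5, **ae_model), twin_cov(r_a=0.5, **ce_model)
```

Both sets give the same MZ matrix (variance 1.0, covariance .36), so MZ data alone cannot choose between them. The DZ covariances differ (.18 versus .36), which is why adding the DZ group identifies the model.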


Drawing the DZ Diagram

Adding the DZ twin group is easy. Click on the MZ diagram and hit ctrl-a (select all) and ctrl-c (copy). Then press the new drawing icon. Click on the new diagram and press ctrl-v (paste) to copy the MZ model into the new drawing window. Two steps remain. First, double-click on the covariance between A1 and A2 and change its starting value to 0.5, the value specified by genetic theory. Second, map the observed variables to data. Hit the DataMap button and select the file ozbmiodz.dat. Highlight BMI-T1 and BMI-T2 in the variable list and click AutoMap. The variable labels in the ozbmiodz.dat file are the same as those in the ozbmiomz.dat file, so the AutoMap function maps the variables from the list to the diagram correctly.




Figure 2.5         Starting values for an ACE twin model for MZ twins


Fitting the Model

Finally, run the model by clicking the Run button in either diagram. Enter ace as the filename for the script and diagrams. The Results Panel should report a fit of 2.3781, and the estimates in the diagram should look like those in Figure 2.6.



Figure 2.6         Parameter estimates from fitting the ACE model to MZ and DZ twin data


Note that in this example, two Mx errors appeared in the error window. They warn us that although we supplied both means and covariances as data (in the .dat files), we supplied a model only for the covariances. See page 22 below for details on how to model means graphically.


Selecting Different Variables for Analysis


To unmap a variable, select one and only one variable in the diagram, go to the data map window, select only that variable in the list, and then press the Unmap button. You can then remap the variable in your diagram to another variable by selecting that variable in the list and pressing Map.


The AutoMap feature lets you automatically map boxes to variables in a dataset by name. If you have a series of unmapped boxes in your diagram, and a series of unmapped variables in your dataset, then pressing AutoMap will map them by name. This is very useful when you have run an analysis on one dataset and then wish to fit the same model to a different dataset. It also comes in handy when you have multiple groups, with variables of the same names being analyzed in different groups, as we did with the twin study example above.


Modeling Means


The Mx GUI allows the user to draw and fit models to means as well as to covariances. This is simplified with a new type of variable in a path diagram, the triangle. Let's add means to the twin model we developed earlier. If you do not still have the MZ and DZ drawings open, load them from the file ace.mxd.


Select the MZ diagram and click on the triangle tool. Point the mouse somewhere below the rectangles and click once to create a triangle. Then use the causal path tool to draw paths from the triangle to the variables BMI-T1 and BMI-T2. Do the same in the DZ group. Mx automatically puts new, free parameters on the paths, and we can run the job.


The output for this job should give exactly the same goodness-of-fit as we had before, because the model for the means is saturated: it has one free parameter for each mean. Let's test the hypothesis that the Twin 1 means are equal to the Twin 2 means. Go to the MZ diagram and make the label on the path from the triangle to BMI-T1 the same as the label on the path from the triangle to BMI-T2. Do the same in the DZ diagram, keeping these labels different from those on the paths from the triangle in the MZ diagram. Run the job again, and give it a new name, like t1eqt2. In the Project Manager window we see that the χ2 (F:) has only slightly increased, from 2.38 to 2.55. This is an increase of less than .2 for two degrees of freedom, which is non-significant. The hypothesis that the means of twin 1 and twin 2 are equal is therefore not rejected.
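The χ2 difference test just described can be sketched in Python. This is an illustration, not Mx code. chi2_sf is a hypothetical helper that computes the exact upper-tail probability of a χ2 with integer degrees of freedom, using the standard recurrence for the regularized incomplete gamma function. The inputs are the rounded values 2.38 and 2.55 quoted above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail p-value of a chi-square with integer df (exact recurrence)."""
    y = x / 2.0
    if df % 2:
        p, s = math.erfc(math.sqrt(y)), 0.5   # Q(1/2, y)
    else:
        p, s = math.exp(-y), 1.0              # Q(1, y)
    while s < df / 2.0:
        p += y ** s * math.exp(-y) / math.gamma(s + 1.0)   # Q(s+1) = Q(s) + ...
        s += 1.0
    return p

def chi2_difference_test(chisq_restricted, chisq_general, ddf):
    """Likelihood-ratio test of nested models: delta chi-square on ddf df."""
    delta = chisq_restricted - chisq_general
    return delta, chi2_sf(delta, ddf)

delta, p = chi2_difference_test(2.55, 2.38, ddf=2)   # t1eqt2 vs saturated means
```

Here p is about .92, far above .05. The same function reproduces the familiar .05 critical values: 3.841 for 1 df, 5.991 for 2 df and 7.815 for 3 df.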


To continue the example, we can test whether the MZ means are equal to the DZ means. Go back to the DZ diagram (ctrl-tab is a shortcut for switching between Mx windows) and change the labels on the paths from the triangle so that they match those in the MZ group. Run the model again and call it mzeqdz. The χ² of 6.24 has increased by about 3.7 over the t1eqt2 model, for one degree of freedom, which is not significant at the .05 level (the critical value is 3.84). The hypothesis that the MZ means equal the DZ means is therefore not rejected. The sample sizes here (637 MZ and 380 DZ pairs) are quite large, so the chance that this result is a type II error (failure to detect a true effect) is small; the observed MZ-DZ mean difference must be small relative to the variance of body mass index in these data.

We can check this result in the Project Manager window. Select the t1eqt2 job and examine the predicted means in the ExpMean matrix, alternately selecting the MZ and DZ groups to compare them. The DZ mean is .45 and the MZ mean is .34; because the expected variance (see ExpCov) is about .97 for this model, they differ by approximately .11 of a standard deviation. The standard error of the difference between two independent means is given by the formula $SE = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. This formula isn't entirely appropriate here, because the two samples are made up of correlated observations (twins). If we pretend that the twins are uncorrelated, treating the 760 DZ and 1274 MZ individuals as independent and taking the variance to be approximately 1, the standard error would be approximately $\sqrt{1/760 + 1/1274} \approx .0458$. If we pretend that the twins are perfectly correlated, so that each pair counts as a single observation, we would have $\sqrt{1/380 + 1/637} \approx .0648$. The first estimate of the standard error gives a z-score for the difference of .11/.0458 = 2.40 (significant at the .05 level), whereas the second gives 1.70 (not significant at the .05 level).

The truth lies somewhere in between, and a very nice property of maximum likelihood testing is that it handles these complications with ease, providing appropriate tests for both independent and correlated observations. The χ² difference test above showed that the difference was not quite significant at the .05 level. Better still, we can obtain confidence intervals on this χ² test and on the parameter estimate itself.
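The two bounding standard errors, their z-scores, and the likelihood-ratio test can all be recomputed from the figures quoted in the text. The sketch assumes a variance of 1.0, as the text's approximation does; the helper name `se_difference` is hypothetical. It also uses the fact that a χ² on 1 df is the square of a z statistic.

```python
import math

# Figures quoted in the text.
mean_dz, mean_mz = 0.45, 0.34
n_dz_pairs, n_mz_pairs = 380, 637


def se_difference(n1, n2, var1=1.0, var2=1.0):
    """Standard error of a difference between two independent sample means."""
    return math.sqrt(var1 / n1 + var2 / n2)


diff = mean_dz - mean_mz
# Lower bound on the SE: treat all twins as independent individuals.
se_independent = se_difference(2 * n_dz_pairs, 2 * n_mz_pairs)
# Upper bound on the SE: treat each pair as one observation (perfect twin correlation).
se_perfect = se_difference(n_dz_pairs, n_mz_pairs)

z_low, z_high = diff / se_perfect, diff / se_independent
print(f"SE between {se_independent:.4f} and {se_perfect:.4f}")
print(f"z between {z_low:.2f} and {z_high:.2f}")

# A likelihood-ratio chi-square on 1 df is the square of a z statistic,
# so the ML test in the text (6.24 - 2.55 on 1 df) corresponds to:
z_ml = math.sqrt(6.24 - 2.55)
p_ml = math.erfc(z_ml / math.sqrt(2))  # upper-tail p of chi2(1) = two-sided p of z
print(f"ML test: z = {z_ml:.2f}, p = {p_ml:.3f}")
```

This reproduces the bounds of about .0458 and .0648 and the z-scores of about 2.40 and 1.70. The maximum likelihood test corresponds to z ≈ 1.92 (p ≈ .055), which indeed lies between the two crude bounds.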


The Mx Model for Means

When computing a predicted mean, Mx traces the paths from an observed variable (rectangle) to a mean variable (triangle) and multiplies the path coefficients together. If there are several triangles, or several pathways from a triangle to an observed variable, it sums their contributions to the mean. Note that, unlike covariances, there is no changing of direction when traversing paths, and only single-headed arrows are used. In matrix form, the predicted means of all the variables are

$$(I - A)^{-1} M U$$

where A contains the single-headed paths between the circles and squares, M contains the paths from the triangles to the circles and squares, and U is a unit matrix (every element equal to 1.0). The elements for the observed variables are the predicted means shown in ExpMean in the Project Manager.
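Path tracing and its matrix form can be checked against each other numerically. The sketch below assumes the RAM-style expression (I − A)⁻¹ M U for the means of all variables. In it, A[i][j] is the path from variable j to variable i, M holds the paths from the triangles, and U is a column of ones, one per triangle. The three-variable diagram is hypothetical. (I − A)⁻¹ is computed as the series I + A + A² + ..., which is exact for recursive models without feedback loops because A^k collects the products of paths of length k.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def total_effects(a):
    """(I - A)^-1 for a recursive (feedback-free) A, via I + A + A^2 + ...

    For such models A is nilpotent, so the series stops after at most n terms.
    """
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(n):
        power = matmul(power, a)  # A^k: products of paths of length k
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    return total


# Hypothetical diagram, variables ordered [F, y1, y2]; A[i][j] is the path j -> i.
A = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],   # F -> y1
     [0.8, 0.0, 0.0]]   # F -> y2
M = [[0.5],             # triangle -> F
     [0.0],
     [0.2]]             # triangle -> y2 (direct)
U = [[1.0]]             # one triangle, whose "value" is 1

means = matmul(matmul(total_effects(A), M), U)
print([round(row[0], 6) for row in means])   # [0.5, 0.5, 0.6]
```

Tracing by hand gives the same result: y2 receives .8 × .5 through F plus .2 directly from the triangle, for a mean of .6.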


2.5 Output Options


Zooming in and out


To zoom in on part of a diagram, click on the zoom in tool, then click on the diagram workspace and drag a rectangle around the part of the figure that you wish to enlarge.


To zoom out, select the zoom out tool, click on the diagram and drag a square inside it. Note that this feature works proportionately, so it is possible to get a very tiny and unreadable figure if you drag a very small square by mistake.


Sometimes zooming can make a diagram so big or so small that it disappears altogether. A click on the zoom undo button will shrink or expand the diagram to roughly fit the window size.


Copying Matrices to the Clipboard


A matrix may be copied to the Windows clipboard by selecting it in the right-hand panel of the Project Manager window and pressing ctrl-c or clicking the copy icon. The contents of the clipboard may then be pasted into word-processing or spreadsheet applications, usually by pressing ctrl-v or using the appropriate paste tool or menu item. By default, matrices are copied with a tab character between columns and a carriage return at the end of each row, which suits many applications. These defaults may be changed using Preference|Matrix Options. For example, to obtain output suitable for a LaTeX table, change the user-defined delimiters to & for columns and \\ for rows. The number of decimal places may also be changed. Diagrams may be copied to the clipboard as described below.
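The effect of the delimiter and decimal-place settings can be mimicked in a few lines. This is not Mx code: the function `format_matrix` and the 2×2 matrix are hypothetical. The sketch only shows how the same matrix renders with the default tab/newline delimiters and with LaTeX-style `&` and `\\` delimiters.

```python
def format_matrix(rows, decimals=4, col_sep="\t", row_end="\n"):
    """Render a matrix as text with chosen column and row delimiters.

    Mimics the idea of the Matrix Options delimiter settings; it is not Mx code.
    """
    lines = [col_sep.join(f"{value:.{decimals}f}" for value in row) for row in rows]
    return row_end.join(lines) + row_end


cov = [[0.97, 0.80], [0.80, 0.97]]  # hypothetical 2x2 expected covariance matrix
print(format_matrix(cov))                                                # tab-separated
print(format_matrix(cov, decimals=2, col_sep=" & ", row_end=" \\\\\n"))  # LaTeX rows
```

The second call produces lines such as `0.97 & 0.80 \\`, ready to paste inside a LaTeX `tabular` environment.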


Comparing Models


When several models have been fitted to the same data, Mx can automatically generate a table of parameter estimates and goodness-of-fit statistics. The menu item Output|Job Compare builds a file of comparisons, which you can view with a text editor. The first column of this file lists all the paths in the model, followed by the fit statistics; the remaining columns give the estimates and fit statistics for each of the models in the Project Manager. The table may then be copied into other software for publication. Its format depends on the Preference|Matrix Options settings, in the same way as when copying matrices to the clipboard.


To compare only some of the models in the Project Manager, delete the jobs that should be excluded from the comparison by selecting them and pressing the Project Manager Delete button.



Setting Job Options


Mx uses a default set of job options suitable for most general-purpose model-fitting, but there may be times when other settings are desired. The Job Option panel (menu Preference-Job Option) is used to change these settings. Figure 2.7 shows the default settings: text output with 4 decimal places of precision and an 80-column width will be generated; neither debug statistics nor individual pedigree likelihood statistics will be generated. Confidence intervals (90%) on the fit statistics will be computed, but null model and power statistics will not. Parameter estimates will not be standardized. New Mx jobs created from diagrams will start from the starting values in the diagram, not from the current estimates.



Figure 2.7         The Job Option Panel.


Text Output

Having run an Mx job, you may wish to view the regular text output. If so, simply click the output tool. The Mx GUI comes with a shareware editor, notebook.exe, which you can select as the viewer; it can edit and view much larger files than the Microsoft Windows Notepad editor. You can select an alternative text viewer via Preferences (though we do not recommend Notepad, because it cannot edit large files).


HTML Output

Flexview is supplied with Mx to simplify viewing HTML output. To use it, you must first tell Mx to produce HTML output before running the job; this is done via the Preferences-Job Option menu item. Other HTML viewers may be chosen instead: Netscape could be used, but earlier versions start up slowly every time. With Internet Explorer, choosing explorer (typically found in c:\windows\explorer.exe) as the HTML viewer works quite well. Flexview does not work well with large output files, for which text output or another viewer should be used. Flexview is shareware, and you should register it if you decide to use it regularly.


HTML and Text Appearance

You can change the number of decimal places and the width of Mx output by entering different values in the decimals and width fields.


Debug Output

Auxiliary output about optimization may be printed to the file NAGDUMP.OUT by setting NpSol to a value greater than 0 (up to 30). Debug output will also go to this file if Debug is set to 1. Debug prints the values of the parameter estimates and the fit function for each group at every iteration during optimization. Such files can be both large and slow to write to disk, so we recommend using these features only in an emergency.


Individual Likelihood Files

If you are using raw data, it is possible to save the individual likelihood statistics (see p. 106) to a file by entering a filename in the text box “Ind. Likelihood File”.


Additional Statistical Output

Certain ‘comparative’ fit indices require the fit of a null model. By default, the null model has free parameters for the variances and zero covariances. Mx fits this model automatically and computes the statistics if the Null model radio button is set to Auto. Sometimes a different null model is required; the user should then fit it, note its χ² and degrees of freedom, select the Manual radio button, and enter these values in the Null ChiSq and Null Df fields. The additional statistics will be visible in the Results Panel.
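To show why the null model's χ² and degrees of freedom are needed, here is a sketch of two widely used comparative indices: Bentler's CFI and the Tucker-Lewis index. The exact set of indices Mx reports may differ, so treat this as an assumption. The helper names are hypothetical. The null-model df helper assumes that only the covariance matrix is analyzed, with no means.

```python
def null_model_df(n_variables: int) -> int:
    """df of the default null model (free variances, zero covariances), covariances only."""
    statistics = n_variables * (n_variables + 1) // 2
    return statistics - n_variables


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Bentler's comparative fit index."""
    misfit_model = max(chi2_model - df_model, 0.0)
    misfit_null = max(chi2_null - df_null, misfit_model)
    return 1.0 if misfit_null == 0 else 1.0 - misfit_model / misfit_null


def tli(chi2_model, df_model, chi2_null, df_null):
    """Tucker-Lewis index (non-normed fit index)."""
    null_ratio = chi2_null / df_null
    return (null_ratio - chi2_model / df_model) / (null_ratio - 1.0)


# Hypothetical values: 4 observed variables, so the default null model has 6 df.
print(null_model_df(4))
print(round(cfi(30.0, 10, 210.0, 15), 4), round(tli(30.0, 10, 210.0, 15), 4))
```

When a non-default null model is entered manually, its χ² and df simply replace `chi2_null` and `df_null` in such formulas.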


Power Calculation

To compute statistical power, the “Power Calculation” checkbox should be checked, and the alpha level and degrees of freedom should be entered. See p. 114 for information on how to fit models that assess statistical power.
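The power calculation can be sketched as follows. The sketch assumes, as is common in structural equation modeling, that the χ² obtained by fitting a false model to data implied by the true model serves as the noncentrality parameter λ of a noncentral χ² distribution. Power is then the probability that this distribution exceeds the critical value for the chosen alpha and degrees of freedom. The helper names are hypothetical, and this is not Mx's own code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    q, nu = (math.exp(-half), 2) if df % 2 == 0 else (math.erfc(math.sqrt(half)), 1)
    while nu < df:  # Q(nu + 2) = Q(nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
        q += math.exp((nu / 2) * math.log(half) - half - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def chi2_critical(alpha, df):
    """Critical value c with P(chi2_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def power(noncentrality, df, alpha=0.05):
    """P(noncentral chi2 > critical value), as a Poisson mixture of central chi-squares.

    Suitable for moderate noncentrality values; exp(-lambda/2) underflows for huge lambda.
    """
    crit = chi2_critical(alpha, df)
    half_lam = noncentrality / 2.0
    weight = math.exp(-half_lam)  # Poisson(lambda/2) probability of j = 0
    total, covered = 0.0, 0.0
    for j in range(2000):
        if j > 0:
            weight *= half_lam / j
        total += weight * chi2_sf(crit, df + 2 * j)
        covered += weight
        if covered > 1.0 - 1e-12:
            break
    return total


print(round(power(7.85, df=1, alpha=0.05), 3))
```

For example, with 1 df and alpha = .05, a noncentrality of about 7.85 gives power of about .80. With λ = 0, the "power" equals alpha.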


Confidence Intervals on Fit

By default the Mx GUI requests 90% confidence intervals on fit. If an alternative interval is required, it can be entered in this text field. If CIs are not required, the check box can be cleared. Note that these are not the same as confidence intervals on the parameter estimates, which must be requested for paths using the Path Inspector.


Standardize

By default, Mx produces unstandardized parameter estimates. This default may be changed by selecting the “Standardize” check box. The graphical interface then generates different Mx scripts, which include non-linear constraint groups that standardize the variables by constraining their variances to 1.0. This box should be checked when working with correlation matrices, to obtain correct confidence intervals on the parameters. Correlation matrices should be entered in .dat files with a KMatrix command, not a CMatrix command. The number of degrees of freedom may be changed for certain special models, as described on p. 93.


Restart

The Restart check box changes the scripts generated from diagrams: instead of the starting values of the paths, the current estimates are used. If a model has been fitted before and is only slightly changed, e.g. by fixing one parameter, then re-running from the existing estimates may be much faster than starting from the starting values again.


Optimization Options

Mx uses certain default values of the optimization parameters which have proven to be reliable under a variety of conditions. Occasionally it is necessary to use different settings; these technical options are described on p. 100. For the most part, these options should not be changed.


If optimization ends with the message “Possibly Failed”, you can set Random Start to ‘-2’ so that Mx automatically restarts optimization, making two attempts to solve the problem. To try randomized starting values for a model, set Random Start to a positive value, but be sure to put sensible boundaries on all your free parameters.


Printing


To print diagrams, click the printer icon or use the File menu and select Print. Note that only the part of the diagram visible in the window is printed. Print can also be used to print scripts from the editor window; the script font can be changed with the Preferences|Script fonts menu item.


Printed output can be previewed with the File|Print preview menu item or the preview tool on the toolbar. This feature is a good way to save time and paper. Some features of printing, like printing the object handles on selected objects, may be unexpected, so print preview is recommended.


Improving Print Quality

There are various ways to improve the visual appearance of the diagrams. Generally, these are worth doing for final copy, such as printing for publication or to make slides for a talk.

First, you can move the path labels away from the paths by clicking on them and dragging them to a new location. Occasionally it may be difficult to select the label because another object, such as the path, is selected instead. If so, try clicking slightly to the right of the label. Second, in Preferences you can choose font size and appearance, separately for the paths and the variables. Also in Preferences you can choose line thickness, which currently affects both the paths and the lines around the variables. To add impact for color printing, you can change the color of the background and foreground components (paths, boxes, text etc.) in a diagram. Third, remember that the amount of information displayed about a path - labels, estimates, confidence intervals, boundaries and so on - can be changed for individual paths with the Path Inspector. Revising the appearance of many paths simultaneously can be done by selecting several paths and selecting the ‘Apply to all in this diagram’ box in the Path Inspector.


The variance arrows sometimes become obscured by paths going to and from variables. They may be dragged to one of eight positions around circles or squares.


Aligning Variables and Paths

The grid tool adds a grid to the currently active drawing. The color and size of this grid can be changed via the Preferences|Grid menu item. It is then simple to align circles and squares to the grid by moving them. Much faster is the snap to grid feature, which automatically aligns variables on the grid. Objects move only from one grid position to another, so moving a variable a small distance often has no effect at all; moving it a greater distance lets it snap to a new grid position. The granularity or size of the grid can be changed using Preferences|Grid size.


Path labels are given a default central position based on the length and direction of the path they label. If a path is longer vertically than horizontally, its label will be centered vertically; conversely, if it is longer horizontally, its label will be centered horizontally. By moving objects further apart it is sometimes possible to align related path labels automatically; this is the preferred way to align path labels. If necessary, an individual label can be dragged away from its default position, but this should be used as a last resort. We recommend using print preview (File|Print Preview or the preview tool) to check the visual appearance of a figure.


Exporting Diagrams to other Applications


Mx GUI uses the standard Windows clipboard to export diagrams to other applications. To export a diagram, left-click once on the background of the diagram, then press ctrl-c or click the copy icon. This copies the figure to the clipboard. Open another application, such as WordPerfect, MS Word, Harvard Graphics or Visio, and press ctrl-v (or select the paste menu command or click the paste icon). Partial figures may be copied in the same way, by selecting only part of the diagram before pressing ctrl-c.


Diagrams may also be printed to a postscript file, if you have a postscript printer driver installed. From the printer control menu, select encapsulated postscript as the postscript option, and check the 'Print to file' box.


Files and Filename Extensions


Mx uses and creates many different files, each with a specific filename extension. To save disk space, some of them may be deleted. Table 2.2 lists the filename extensions and their contents, and indicates whether such files may be safely deleted. Typically one does not want to delete data or useful drawing or script files, although malfunctioning scripts might be better deleted. At present, .prj files cannot be read back into the GUI.

 

Table 2.2          Summary of filename extensions used by Mx

 

File extension   Contents                  Delete?
.dat             Mx data                   Probably not
.mx              Input script              Probably not
.mxd             Mx path diagram           Probably not
.mxo             Text output               If no longer needed
.mxh             Header file               Probably not
.mxt             Template file             Probably not
.htm             Hypertext output          If no longer needed
.mxl             Frontend output           Yes
.prj             Mx project                Probably not
.exe             Executable Mx program     No
.dll             Dynamic link library      No



2.6 Running Jobs


Running Scripts


Many previous users of Mx, and those working with non-standard models (such as those involving constraints or special fit functions), will want to run models from scripts. The Mx GUI has been designed to make working with scripts efficient. It lets you open script files, edit them, and view output in the Project Manager or as text or hypertext (HTML). In addition, if there are errors in the script, it displays them and, with a click of a button, takes you to the editor window with the problem text highlighted.


Let's take an example script. Start the GUI and click the open icon. Choose twinpar.mx and click Run in the editor window. The Mx statistical engine runs the job in the background and then delivers the output to the Project Manager. We don't need to bother with the details of this particular job; it's just an example to show how several groups appear. You can easily look at the matrices in the different groups by selecting the group in the middle panel and the matrix in the right-hand panel.


As we run more jobs, perhaps editing the script or selecting other scripts, the Project Manager fills up with the new jobs. The fit statistics from all jobs become visible in the bottom panel when the Statistics button is pressed.


Errors in Scripts

To speed up debugging of Mx scripts, the line and column of the input file at which an error occurred are automatically sent to the GUI. Let's see how this works with an example.


Create a dummy script by clicking the new icon, and type in the following:

Title
 Data Ngroups=1
 Oops a mistake
 Begin Matrices;


Click Run and see what happens. Click the left mouse button on the error, and note how the editor window shows the ‘Oops’ text highlighted. You are now in a good position to fix the problem, if you are familiar with the script language. A full description of the language is given in chapters 3-5, and examples are given in chapter 6. Courses on Mx are run quite regularly; consult http://www.vcu.edu/mx.


Sometimes it is helpful to look at the text or HTML output file to see full details of the error. Click the right mouse button on the error to bring up the output file. With HTML output the error is presented automatically; with text output it is necessary to scroll to the end of the file.


Editing Mx Header Files


Mx provides a system with which advanced users can make it easier for beginning users to start using the program. This approach to script writing can also make it easier for all users to adapt a script to other data sets, or to change the number of variables in the analysis, which variables are analyzed, the number of factors to be used, or even the type of model to be fitted.


In the examples subdirectory, the files factor.mxt (template), factor.mxh (header) and factor.dat (data file) illustrate how this can be used. After opening the header file from the MxProject|Header Edit menu, the user can change the number of variables being analyzed, or the number of factors being fitted, by clicking on the relevant lines of the header file in the header edit box. For a more detailed description of this example, see page 149. An example of header and template files for fitting alternative genetic models to twin data is described on page 151.


We expect this new feature to lead to a collection of header and template files that will be added to the website http://www.vcu.edu/mx in the future.


Using Networked Unix Workstations


Performance and Multi-Platform Environments

The difference in performance between high-end MS Windows computers and Unix workstations is narrowing all the time. Indeed, the same hardware can run either Unix or MS Windows, so it might be argued that the difference has disappeared. However, it is not very cost-effective to supply every student and faculty member with the latest and fastest PC. Many institutions still use a mixed-platform computing facility in which powerful Unix servers are available for general use, along with PCs that have networked access to these servers. The Unix machines often have large amounts of memory and high-speed disk access, and may offer much faster CPUs than are available for PCs. To facilitate the use of these remote machines, Mx GUI has a networking component that allows the user to select a remote Unix host to run Mx scripts.


The Host Options Panel



Figure 2.8         The Host Options Panel for local PC use (left) and remote Unix use (right).


Figure 2.8 shows the Host Options Panel set for local (on the PC on which Mx GUI is running; left panel) and remote processing (right panel). By unchecking the local host checkbox, the user can enter the IP address of the Unix machine and their username and password. Mx is not (yet) a standard part of the Unix operating system, so it must be installed on the host in question before remote access to it will work. The files and instructions for installation are available at http://www.vipbg/vcu.edu/mxgui/unix.html. As a user, you should make sure that your path on the Unix host includes the directory in which Mx-Unix has been installed, which is usually /usr/local/bin.


Running a Job Remotely

The following steps are required to run a job remotely:

1. Make sure you have an account on a Unix host which has the Mx server installed.
2. Go to the Host Options panel (Preferences|Host Options menu) and enter the machine name, username and password.
3. Click Run in your diagram or script window.
4. Enter any commands to change directory on the remote host and click Execute.
5. Click Run Mx.
6. Click Run Mx again if it says Possible Incompatible Remote Engine, Install New Remote Engine (this error sometimes occurs spuriously).
7. Wait for the job to run and to be transferred back to the GUI.


Transferring Files to Unix Hosts

Running Mx jobs on a remote host raises a few additional considerations. Foremost is the use of files, especially the File= subcommand used in Mx scripts. Any file mentioned in a File= subcommand must be transferred to the remote Unix host (using e.g., ftp) in order for the Unix host to access it. For the same reason, it is best not to put pathnames in File= subcommands, because of inconsistencies between the Unix and Windows filenaming systems. It would become messy if all Mx files had to be kept in a single directory on the Unix host, so there are facilities for changing directory on the remote host before running scripts there. In the Host Command window, the user can enter a Unix command such as cd mymxfiles to change directory, before clicking Execute.


One exception to the need to transfer files to the remote host is the .dat file specified with a diagram's DataMap command. This file is included in the script and automatically transferred to the Unix host. For this reason, it can be best to keep all the data in the .dat file itself and not to use the File= subcommand at all. In some circumstances this may be inefficient, especially if the network connection is slow, because all the data are transferred with the job; this applies especially to large raw data files or large asymptotic weight matrices. If several jobs are to be run using the same dataset, it may be more efficient to ftp these files to the Unix host once and return to using File= in the script.


Increasing Backend Memory

The default amount of memory available for the Mx engine to store data, perform matrix algebra and optimization is 100,000 words for the PC version. This can be increased when necessary by changing the value in the Run Options panel (Figure 2.8). The Unix versions have a default of one million words of memory and at present this cannot be altered. If a larger Unix version is required, please email neale@hsc.vcu.edu for a special build. Sometimes more efficient re-specification of a problem can free up workspace.


2.7 Advanced Features


In this section we consider some of the more advanced features of Mx GUI, including adding non-linear constraints to diagrams, and the use of continuous moderator variables.


Adding Non-linear Constraints to Diagrams


In earlier sections we saw that it is straightforward to make one path equal another by giving both the same label. It is also simple to force the estimate of a path to lie within certain limits by double-clicking the path and entering boundary constraints in the Path Inspector box. Adding non-linear constraints is much less simple: at present it can be done only by editing the script directly.


Figure 2.9         Higher order factor model with nonlinear constraints imposed such that the variances of F1 and F2 are constrained to equal 1.0 (.24 + .87² ≈ 1.0)


Figure 2.9 shows a diagram with a higher-order latent factor (H) and two first-order factors, F1 and F2. Suppose that we wish to constrain the variances of the first-order factors F1 and F2 to equal unity. One simple way to do this might be to eliminate H, allow F1 and F2 to correlate, and give them error components fixed to unity. However, suppose that the paths from H were of substantive interest themselves, perhaps because of reports from other investigations. This example is for illustration, so we'll do it the hard way, with non-linear constraints. The data come from Horn & McArdle (1992) and concern the sub-scales of the WAIS intelligence test, taken by subjects aged between 16 and 28 years. The tests may be broadly categorized as verbal (IN: Information; CO: Comprehension; SI: Similarities; and VO: Vocabulary) or spatial (PC: Picture Completion; BD: Block Design; PA: Picture Arrangement; and OA: Object Arrangement).


The following steps are necessary:

1. Draw the diagram
2. Build a script from the diagram (click To Script)
3. Edit the script file:
   a. Increase NGroups by one to allow for the new constraint group
   b. Edit in the constraint group using the Mx script language
4. Run the job from the script
5. View the parameter estimates in the diagram


The most difficult part of the sequence is of course 3(b), where knowledge of the Mx script language and the way that the Mx GUI creates scripts is required. We now give a brief description of the approach used to implement the constraints for this example.


Because the matrix expression for the covariances of all the variables (both latent and observed) is $(I-A)^{-1} S \left((I-A)^{-1}\right)'$, written (I-A)~&S in Mx script, we can compute it by equating matrices to those of the first group and entering this matrix formula in an Algebra section. More tricky is extracting the matrix elements corresponding to the variances of F1 and F2. This can be achieved using the \part(A,B) function, which partitions matrix A according to the rows and columns specified in B. Matrix B must have four elements, which identify two corners of the sub-matrix, so setting the elements of B to 9,9,10,10 will extract the 2×2 matrix from element 9,9 to element 10,10. We know that this is the sub-matrix we need by looking at the variable labels for matrix S in group 1: F1 and F2 appear as the ninth and tenth elements of the list of labels. A second matrix algebra statement can be used to create the sub-matrix and place it in matrix T.
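The two algebra steps (covariance of all variables, then partitioning) can be mirrored outside Mx to check the numbers in Figure 2.9. The sketch uses a hypothetical three-variable slice of that model, ordered [H, F1, F2], with paths of .87 from H and residual variances of .24. The `part` helper only imitates the description of \part above. It assumes the four corner elements are given as (row, column, row, column), as in 9,9,10,10.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (matrices as lists of rows)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def part(r, corners):
    """Sub-matrix between two corners, assumed given 1-based as (row1, col1, row2, col2)."""
    r1, c1, r2, c2 = corners
    return [row[c1 - 1:c2] for row in r[r1 - 1:r2]]


# Hypothetical 3-variable slice of Figure 2.9, ordered [H, F1, F2]; A[i][j] is path j -> i.
A = [[0.0, 0.0, 0.0],
     [0.87, 0.0, 0.0],   # H -> F1
     [0.87, 0.0, 0.0]]   # H -> F2
S = [[1.0, 0.0, 0.0],    # variance of H
     [0.0, 0.24, 0.0],   # residual variance of F1
     [0.0, 0.0, 0.24]]   # residual variance of F2

B = inverse([[float(i == j) - A[i][j] for j in range(3)] for i in range(3)])
R = matmul(matmul(B, S), transpose(B))   # covariance of all variables
T = part(R, (2, 2, 3, 3))                # F1/F2 block (rows/cols 9-10 in the full model)
print([round(T[i][i], 4) for i in range(2)])  # diagonal, as \d2v would extract
```

The diagonal comes out at .87² + .24 = .9969 for both factors, which is the "≈ 1.0" shown in the figure caption.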


It remains to equate the diagonal elements of T to unity. This we can do using the \d2v matrix function which extracts the diagonal of a matrix to a vector. It is then simple to request a constraint between this vector and a vector in which every element is 1.0, as shown in the following lines of Mx script:


Title Add constraint to variances of F1 and F2
 Constraint NInput=2
 Begin Matrices = Group 1
  P Full 1 4 ! for the partitioning part
  U Unit 1 2 ! two 1.0 elements to equate to variances
 End Matrices;
  ! deduce from labels for S above that F1 and F2 are variables 9 and 10
  Matrix P 9 9 10 10 ! to be used for partitioning
 Begin Algebra;
  R= (I-A)~&S; ! computes covariance of all variables, latent and observed
  T= \part(R,P); ! computes the sub-matrix of R from element 9,9 to 10,10
 End Algebra;
 Constraint \d2v(T) = U ; ! constrains the diagonal elements to equal U
 Option DF=-1 ! I add this df adjustment because really and truly all
! we have done is put the same constraint in twice, because the paths
! from H to F1 and from H to F2 are equal. A more efficient way would be to
! only constrain one of the variances (F1 or F2) but this is an illustration.
End


The constraint syntax above involves the = operator because we want an equality constraint. For nonlinear boundary constraints one could use the < or > symbols instead.


Once the script has been modified, take care not to overwrite it with a new script from the diagram. If the diagram is modified, steps 2-4 must be repeated to run it; otherwise the constraint group will be lost. However, these steps are much easier the second time, because the constraint group can be cut and pasted from the earlier script.


A final remark concerns the use of Option DF=-1. By default, Mx adds one observed statistic for each non-linear constraint imposed. This is analogous to the loss of a free parameter when two parameters are linearly constrained (equated): Mx assumes that any non-linear constraint effectively reduces the number of parameters (or, equivalently, increases the number of observed statistics) in the same way. In this example we did a silly thing, because both constraints were identical, so we gained no information by adding the second one. The DF=-1 option corrects this silliness.
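The degrees-of-freedom bookkeeping described here can be written out explicitly. In the sketch below, s is the number of observed statistics, c the number of non-linear constraints, p the number of free parameters, and Δ the value given on Option DF. These symbols are introduced for illustration.

```latex
\mathrm{df} = (s + c) - p + \Delta
```

In this example c = 2 and Δ = −1, so the constraint group adds 2 − 1 = 1 degree of freedom, the same as a single constraint on one of the two variances. If only the covariance matrix of the eight WAIS subtests is analyzed, s = 8 × 9 / 2 = 36.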


Moderator Variables: Observed Variables as Paths


An interesting feature of Mx is that it allows the specification of models that can differ for every subject in the sample. In some sense, this is the extreme case of multiple groups, and it has some interesting statistical possibilities. For one, this type of modeling is equivalent to Hierarchical Linear Modeling (HLM) as specified by Bryk and Raudenbush (1992) and others. This aspect of Mx has not received much attention, but perhaps that will change now that the graphical interface facilitates the specification of some of these models.


We will illustrate the method with an uninspiring example of interaction terms in linear regression. This example has the advantage that we know the answer and can compare it with results from standard methods. The standard model of linear regression with interaction that we shall use is

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + e$$

where b3 is the interaction parameter of interest. In a path diagram, these data can be modeled by pre-computing x1×x2 and fitting a model like the one shown in Figure 2.10. An alternative approach is to allow two pathways from x1 to y: one with the parameter b1, and the other going through two paths, one with the parameter b3 and the other carrying the individual's data value for x2. By path tracing, the second pathway contributes b3 x2 x1, so the model for y is equivalent to the equation above. The question is, how do we get individual-specific data onto the paths in an Mx model?


Figure 2.10       Linear regression with interaction model with two independent variables, X1 and X2 and their product X1*X2 and one dependent variable Y.
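The claim that the two-pathway diagram reproduces the regression equation can be checked with a short calculation. In the sketch, the coefficients b0 to b3 are hypothetical. The (x1, x2) pairs are borrowed from the example data file in this section. The second function traces the diagram: a direct path b1 from X1 to Y, plus a route through a dummy latent variable (called M in the steps that follow) whose path to Y carries each individual's own x2.

```python
# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3 = 0.5, 0.3, -0.2, 0.1


def y_regression(x1, x2):
    """Regression equation with interaction: b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def y_path_tracing(x1, x2):
    """Same prediction obtained by tracing the diagram's pathways from X1 to Y.

    Direct path X1 -> Y carries b1; the route X1 -> M -> Y multiplies b3 by the
    individual's own x2, which sits on the M -> Y path as a data-specific value.
    """
    effect_of_x1 = b1 + b3 * x2
    return b0 + effect_of_x1 * x1 + b2 * x2


pairs = [(1.234, 2.345), (4.321, 3.210)]
for x1, x2 in pairs:
    print(round(y_regression(x1, x2), 6), round(y_path_tracing(x1, x2), 6))
```

The two columns agree for every individual. The effect of x1 on y is b1 + b3·x2, which varies from person to person; this is what putting x2 on a path achieves.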


Raw data are essential for fitting these ‘data-specific’ models. As described in the Mx manual, two basic forms of raw data may be read by Mx: variable length (VLength) and rectangular (Rect). Rectangular data are generally much easier to generate and, except in special cases such as many siblings in a family or very extensive missing data, easier to use. A .dat file with rectangular data might look like this:


! Rectangular data file created by Jane Datapro on Sept 30 1997

! using program /home/janedata/mxstuff/makemx.sas

!

Data NInput=4 NObservations=0

Labels X1 X2 X2d Y

Rectangular

1.234 2.345 2.345 3.456

4.321 3.210 3.210 2.109

...

End Rectangular


The ... indicate the remaining records of the dataset. Note the valuable comments at the start of the file, which are very useful for later retracing one's steps. The special feature of this data file is that the second variable (X2) has been included twice (X2d is identical to X2 for all cases). We are going to make use of this variable twice: once as an independent variable, and once as a moderator variable. In a linear regression we normally remove the main effects of a variable before testing for the presence of interaction, hence the duplication. Again, we should remember that this simple example is for illustration, and that the same thing could be achieved more easily with standard software. The more complex models that this approach encompasses, however, could not be specified easily with standard software.


Close any diagrams that you have open, start a new diagram, click DataMap and open the nonlin.dat file from the examples directory. Highlight the X1, X2 and Y variables and click New. Then draw a covariance path between X1 and X2 and causal paths from these variables to Y. Next, add a dummy latent variable M by drawing a circle, and draw paths from X1 to M and from M to Y. Select the path from M to Y and click DataMap again. Highlight the remaining unmapped variable, X2d, and click Map. This variable has now been mapped to the path from M to Y. The path should be the mapped-variable color (blue by default), and there should be a diamond surrounding the path label to indicate that it is mapped to a variable. The total effect from X1 to Y now contains both the linear and the interaction terms. Finally, add means to the model; for raw data we must always have a model for the means. In the end your figure should look something like (that is, be topologically equivalent to) Figure 2.11.


Figure 2.11       Linear moderated regression with interaction model. Variables X2 and X2d are identical in the dataset. Each individual has a different model because individuals have different values of X2d.


Run the model and be patient; fitting models of this type is computationally intensive. Note that, in both the printed output and the results on the diagram, the value shown on the X2d path is that of the last case in the file. The results should closely approximate the values used for simulation, namely b1=.5, b2=.4, b3=.3 and e=.36. More interesting models would involve moderation of the effects of latent variables, and they may be specified in exactly the same way.
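Because the true values are known, the estimates can also be checked against a standard method: ordinary least squares with a product term. The sketch below rests on assumptions that the manual does not state: x1 and x2 are drawn as independent standard normals (in nonlin.dat they may covary), e = .36 is treated as the residual variance, and the names simulate and ols are hypothetical helpers written for this illustration.

```python
import random

def simulate(n, b1=0.5, b2=0.4, b3=0.3, resid_var=0.36, seed=1):
    """Draw (x1, x2, y) from y = b1*x1 + b2*x2 + b3*x1*x2 + e (assumed design)."""
    rng = random.Random(seed)
    sd = resid_var ** 0.5
    data = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = b1 * x1 + b2 * x2 + b3 * x1 * x2 + rng.gauss(0.0, sd)
        data.append((x1, x2, y))
    return data

def ols(X, y):
    """Least squares via the normal equations X'X b = X'y (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

data = simulate(5000)
X = [[1.0, x1, x2, x1 * x2] for x1, x2, _ in data]  # intercept, x1, x2, product
y = [row[2] for row in data]
print([round(b, 3) for b in ols(X, y)])  # should lie near [0, .5, .4, .3]
```

The product column plays the role that the X2d-mapped path plays in the Mx diagram; Mx reaches the same model without pre-computing x1×x2.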




3 Outline of Mx Scripts and Data Input


What you will find in this chapter

 

           General rules for job structure and syntax

           Details on how to read data and select variables for analysis


3.1 Preparing Input Scripts


Comments, Commands and Numeric Input


Input files should be prepared with the text editor of your choice. If you use a word processor (such as WordPerfect or MS Word), the input file should be saved in DOS text (ASCII) format.


You may put comments anywhere in your input file using the character '!'.

The Mx command processor ignores:

     All characters following ! on any line

     Blank lines

     Anything after column 1200


Lines in Mx scripts may be up to 1200 characters long on most systems.


The processor is also entirely insensitive to case, except for filenames under Unix. Essentially, Mx reads two things: keywords and numbers. Unless explicitly stated otherwise, the first two letters of a keyword are sufficient to identify it. Keywords are separated by one or more blank spaces. Once the program has identified a keyword you can extend it to anything you like as long as it doesn't have a blank character in it, so Data and Data_silly_words_ have the same effect. However, we strongly recommend use of full keywords to facilitate comprehension of the script by human beings.


Quite often, a keyword has the format KEY=123, where 123 is a numeric value to be input. This is called a parameter. Mx ignores all non-numeric characters (including blanks) found between recognition of a parameter and reading a number, so that NI=100 and NInput_vars a lot of words 100 have the same effect.

Note: The exception to this rule is when it encounters a #define’d variable, which it will accept instead of a number.
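The reading rules above (two-letter keyword recognition, case insensitivity, skipping non-numeric text, and accepting a #define'd name in place of a number) can be modeled with a short toy parser. This is a sketch for illustration only, not Mx's actual parser; the keyword list is a hypothetical subset and read_parameter is an invented name.

```python
import re

# Hypothetical subset of parameter keywords, for illustration only.
PARAMETERS = ["NInput_vars", "NObservations"]
# A token is either a word (possibly a #define'd name) or a number.
TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|[-+]?(?:\d+\.?\d*|\.\d+)")

def read_parameter(text, defines=None):
    """Toy model of the rules above; not Mx's actual parser."""
    defines = {k.lower(): v for k, v in (defines or {}).items()}
    text = text.strip()
    word = re.match(r"[A-Za-z_]+", text)
    if word is None:
        raise ValueError("expected a keyword")
    # The first two letters identify the keyword, ignoring case.
    prefix = word.group(0)[:2].lower()
    keyword = next((p for p in PARAMETERS if p[:2].lower() == prefix), None)
    if keyword is None:
        raise ValueError(f"unrecognized keyword {word.group(0)!r}")
    # Skip non-numeric text until a number (or a #define'd word) appears.
    for token in TOKEN.findall(text[word.end():]):
        if token[0].isdigit() or token[0] in "+-.":
            return keyword, float(token)
        if token.lower() in defines:
            return keyword, defines[token.lower()]
        # Mx would print a warning about an unknown word and keep searching.
    raise ValueError(f"no value found for {keyword}")

print(read_parameter("NI=100"))
print(read_parameter("NInput_vars a lot of words 100"))
print(read_parameter("NObservations=N", {"n": 150}))
```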


Syntax Conventions


The syntax described for commands follows these conventions:

     alternatives are represented by /

     optional parameters or keywords are enclosed by { and }

     items to be substituted according to the specific application are enclosed by < and >


Job Structure


Mx has been written for multiple groups, since genetically informative data generally comprise information on different types of relatives, which form distinct groups. At the beginning of an Mx script, you have to say how many groups there are with a #NGroups statement. You can also #define variables here (see Section 3.7). A group begins with a title line that contains from 1 to 1200 characters for reference. The second line is the group-type line, and the group ends with an End line. What happens in between varies according to the type of group. Currently there are 3 types:

 

     DATA - containing data to be analyzed

     CALCULATION - allowing matrix operations for output or to simplify structure

     CONSTRAINT - for non-linear equality and inequality constraints between parameters


Any number of each type of group can be specified, in any order. Unless one of the keywords Constraint or Calculation appears on the data line, Mx expects to read a Data group. Effectively, there are 3 things to do:

 

     Supply the data

     Describe the model

     Request options


To do this, the input script will consist of groups, each having the following structure:

 

1.   Title

2.   Indicate group type: data/calculation/constraint

3.   Read and select any observed data, supply labels

4.   Matrices Declaration: declare at least one matrix

5.   Specify numbers and parameters, starting values, equality and boundary constraints

6.   Matrix Algebra or Model Statement: use matrix formulae for algebra/compute or covariance/means/thresholds/weights/frequencies

7.   Request fit functions, statistical output and optimization options, multiple fit mode, save matrices and job specification

8.   End command

where steps 5 and 7 are optional.

Steps 1-3 supply data and are described in Sections 3.1-3.5, steps 4-6 define the model (Sections 4.1-4.6), and steps 7-8 request output (Sections 5.1-5.4). Constraint and calculation groups do not read any data, so they omit step 3.


Single Group Example


For example, an input file may look like this:


#NGroups 1

Simple MX example file

 Data NObservations=150 NInput_variables=2

 CMatrix 1.2 .8 1.3

 Begin Matrices;

  A Full 2 1

  D Diag 2 2

 End Matrices;

 Specification A

  1 2

 Specification D

  0 3

 Start .5 all

 Covariance_model A*A' + D ;

 Options RSiduals

End


This would fit, by maximum likelihood (the default), a factor model to a covariance matrix calculated from 150 observations of two variables. The model is shown as a path diagram in Figure 3.1. Details of this example will be found in the following sections.
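As a hand check of this example, the model-implied covariance matrix A*A' + D can be written out with A = (x, y)' (parameters 1 and 2) and D = diag(0, z) (the first diagonal element fixed at zero, the second free as parameter 3). With three free parameters and three distinct observed statistics the model is just identified, so the moment equations can be solved directly and the fit should be perfect. The sketch below uses the textbook ML discrepancy function; it is an illustration, not Mx's own output, and the helper names are hypothetical.

```python
import math

# Observed covariance matrix from "CMatrix 1.2 .8 1.3" (lower triangle, row-wise).
S = [[1.2, 0.8],
     [0.8, 1.3]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def implied_cov(x, y, z):
    """Sigma = A*A' + D with A = (x, y)' and D = diag(0, z)."""
    return [[x * x, x * y],
            [y * x, y * y + z]]

def f_ml(S, sigma):
    """Textbook ML discrepancy: ln|Sigma| + tr(S Sigma^-1) - ln|S| - p, p = 2."""
    d = det2(sigma)
    inv = [[sigma[1][1] / d, -sigma[0][1] / d],
           [-sigma[1][0] / d, sigma[0][0] / d]]
    trace = sum(S[i][k] * inv[k][i] for i in range(2) for k in range(2))
    return math.log(d) + trace - math.log(det2(S)) - 2

start = f_ml(S, implied_cov(0.5, 0.5, 0.5))  # "Start .5 all"

# Just identified: solve x^2 = 1.2, x*y = 0.8, y^2 + z = 1.3 directly.
x = math.sqrt(1.2)
y = 0.8 / x
z = 1.3 - y * y
print(round(x, 4), round(y, 4), round(z, 4))  # 1.0954 0.7303 0.7667
print(start, f_ml(S, implied_cov(x, y, z)))   # second value is ~0
```

The sign of the loadings is not identified: (-x, -y, z) reproduces S equally well.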



Figure 3.1         Factor model for two variables. Free parameters are indicated by x, y and z. Causal paths are shown as single headed arrows and correlational paths are shown as double-headed arrows.


3.2 Group Types


Every group has to begin with a Title line and a Group-type command. In a data group, these statements may be followed by reading of data. These commands are described in this section. Note that a new job requires a line indicating the number of groups.


#NGroups


Syntax:

#Ngroups n

where n defines the number of groups


Title Line


The title line is purely for the user's reference; it is printed when Mx prints the parameter specifications and the parameter estimates for a group. It is most useful when there are multiple groups. The title line is recognized by its location (the beginning of a group) rather than by a keyword at the start of a line.


Group-type Line


Syntax:

Data/Calculation/Constraint {NInput_vars=n Nobservations=n}

where Calculation defines a calculation group and Constraint a constraint group, the default being a Data group


Every group must have a data line. It has a number of parameters to indicate

 

i.    what kind of group is being input,

ii.   various characteristics (the number of input variables NInput_vars and the number of observations NObservations) of the data to be analyzed, if any


The parameters may be specified in any order, and are summarized in Table 3.2. Note that Data groups must have NInput_vars and NObservations keywords. Constraint groups only require NInput_vars, and Calculation groups need no parameters.


Table 3.2          Parameters of the group-type line in Mx input files.


| Parameter | Function | Required for group(s) |
|---|---|---|
| Data | Specifies a data group | Data |
| Calculation | Specifies a calculation group | Calculation |
| Constraint | Specifies a constraint group | Constraint |
| NInput_vars | Number of input variables | Data, Constraint |
| NObservations | Number of observations | Data |
| NModel | Number of models | Weighted likelihood* |

* Required for fitting mixture models only; see Section 4.3 on page 74.


3.3 Commands for Reading Data


Covariance and Correlation Matrices


Syntax:

CMatrix/KMatrix/PMatrix {Full} {File=filename}


In a data group, a covariance matrix may be read using the keyword CMatrix. By default, CMatrix expects to read the lower triangle of an NInput_vars x NInput_vars matrix from the input file. If the keyword Full appears, then a full matrix will be read. The matrix is read in free format, that is, the numbers are expected to be separated by one or more blank spaces or carriage returns. If the keyword File appears, then Mx will read the data from a file. This latter method is generally to be preferred, since it keeps the data in one place. If the data are changed, it is not necessary to change every script that uses these data.


A FORTRAN format [in parentheses, e.g., (6F10.5)] for reading data must be the first line of a data file. If the first line just has * or (*) on it, the data are read in free format, i.e. numbers are separated by one or more spaces or new line characters.
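The default lower-triangle layout is read row by row, so the line CMatrix 1.2 .8 1.3 in the single-group example means elements (1,1), (2,1) and (2,2). A minimal sketch of that expansion, assuming free format only (a leading * or (*) line is skipped; FORTRAN formats are deliberately not handled). The function name is hypothetical.

```python
def read_lower_triangle(text, n):
    """Expand a row-wise lower triangle into a full symmetric n x n matrix."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if lines and lines[0].strip() in ("*", "(*)"):
        lines = lines[1:]  # free-format marker line at the top of a data file
    elif lines and lines[0].lstrip().startswith("("):
        raise NotImplementedError("FORTRAN formats are not handled in this sketch")
    values = [float(tok) for line in lines for tok in line.split()]
    needed = n * (n + 1) // 2
    if len(values) < needed:
        raise ValueError(f"need {needed} numbers, found {len(values)}")
    m = [[0.0] * n for _ in range(n)]
    pos = 0
    for i in range(n):
        for j in range(i + 1):  # row i holds elements (i, 0) .. (i, i)
            m[i][j] = m[j][i] = values[pos]
            pos += 1
    return m

print(read_lower_triangle("1.2 .8 1.3", 2))  # [[1.2, 0.8], [0.8, 1.3]]
```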


Correlation matrices (KMatrix) and matrices of polychoric or polyserial correlations (PMatrix) are read in the same way as covariance matrices (CMatrix). Although the diagonal elements of these matrices are all 1.0, and could in principle be omitted, they are needed for Mx to read the file correctly. See page 126 for an example of special methods required for maximum likelihood analysis of correlation matrices.


Asymptotic Variances and Covariances


Syntax:

ACov/AVar/AInv {File=filename}


In order to use asymptotic weighted least squares or diagonally weighted least squares (see p. 83), it is necessary to read a weight matrix. For compatibility with PRELIS (Jöreskog & Sörbom, 1986; 1993), Mx expects to receive a weight matrix multiplied by the number of observations. If the File= option is used, a PRELIS output file (created with the SA=filename or the SV=filename PRELIS commands) may be read. By default, Mx expects to receive an asymptotic weight matrix (ACov) whose size depends on (i) NInput_vars and (ii) whether a correlation matrix or covariance matrix has been input. If NInput_vars=k, then if CMatrix has been input, the number of rows in ACov is

$$ \frac{k(k+1)}{2} $$

or if PMatrix or KMatrix have been input, the number of rows in ACov is

$$ \frac{k(k-1)}{2} $$

The weight matrices can thus be very large: a square matrix with $[k(k+1)/2]^2$ elements, that is, of order $k^4$.


If you use PRELIS, please be sure to use PRELIS 2 or LISREL 8.5 instead of PRELIS 1. Later versions of PRELIS output the file in binary format, which must be changed with the bintoasc.exe or bintoggl.exe utility supplied with PRELIS.
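The size rules for ACov follow from counting the distinct elements of the input matrix: a covariance matrix has k variances plus k(k-1)/2 covariances, while a correlation matrix has a fixed unit diagonal. A short sketch (hypothetical function name) makes the growth concrete; k = 25 gives the 325 × 325 matrix used in the inversion example that follows.

```python
def acov_rows(k, correlations=False):
    """Rows (and columns) of the ACov weight matrix for k input variables."""
    if correlations:          # KMatrix/PMatrix: the unit diagonal is not modeled
        return k * (k - 1) // 2
    return k * (k + 1) // 2   # CMatrix: variances plus covariances

for k in (5, 10, 25):
    rows = acov_rows(k)
    print(k, rows, rows * rows)  # variables, ACov rows, elements in the matrix
```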


An ACov line makes AWLS the default method of estimation for that group. If AWLS is requested on the Options line in a group without an ACov, an error will result. Similarly, DWLS is the default if AVar is read.

Note that inverting the asymptotic covariance matrix can take an appreciable amount of time for large problems. Two facilities are available to combat this problem. First, the inverse of the matrix can be read instead. A simple Mx job could be used to invert and save the inverse, for example:


#NGroups 1

Commands to invert a 325x325 asymptotic weight matrix

 Calculation

 Begin Matrices;

  P Symm 325 325

 Compute P~ ;

 Matrix P File=weight.asy

 Option MX%E=weight.inv

End


The inverse of the asymptotic matrix (AInv), saved in the file weight.inv, could be used in place of the matrix itself, with a command of the form: AInv Full File=weight.inv . The Full keyword is essential here because Mx is agnostic about the symmetry of square matrices created in calculation groups. It is safer to assume that such a matrix is not symmetric, to maintain consistency across applications. The second, alternative approach is to use the binary save feature described on page 104, which saves the whole job specification.


A common error in reading data with CMatrix or ACov commands is to read them as full matrices when they are stored as symmetric, or vice versa. Mx attempts to be a bit smarter about this process. If a user forgets to put the Full keyword on the CMatrix line, but Mx detects an Mx-style data file that was saved in full format, it will read it as full instead.


Variable Length, Rectangular and Ordinal Files


Syntax:

VLength/Rectangular/Ordinal {File=filename} {Highest <numlist>}


Mx will read two types of raw data for multivariate normal maximum likelihood analysis.

Rectangular reads regular data, i.e. where every observation has the same number of input variables (NInput_vars on the Data line). Missing values may be specified with a . (dot) or another code (see Missing command on page 48). This is appropriate if there are relatively few missing data, or if missing data have been imputed.


VLength is a variable length record reader, which allows reading of raw data where there may be many missing values. The default (and mandatory) format for these data is free. A line with comments or * can be placed at the start of the file, but it will be ignored by Mx except for printing a warning and the line itself in the output file. The structure of a VLength file is:

     number of input variables (k)

     identification codes for the k variables

     observed data for the k variables.


For every case, the number of input variables must be on a line by itself. The identification codes must be integers that correspond to codes read by the ICodes command (see page 81). For example, a file might contain the following:

3

1 2 3 .33 .62 .95

2

2 3 1.4 -2.2

1

2 .37


This example reads 3 variables for the first observation, with identification codes 1 2 3, and data values .33 .62 and .95. The second observation has no data for variable 1, but supplies data for 2 and 3, while the third supplies data for variable 2 alone. By default, data of this type are fitted using the raw maximum likelihood fit function (see page 86).
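The VLength layout (a count k, then k identification codes, then k data values) is simple to parse. The sketch below is an illustration, not Mx's reader: it treats the file as a stream of free-format tokens, skips an optional leading comment or * line, and uses a hypothetical function name.

```python
def read_vlength(text):
    """Parse VLength records: k, then k integer codes, then k data values."""
    lines = text.strip().splitlines()
    # An optional leading comment or '*' line is skipped (Mx prints a warning).
    if lines and lines[0].lstrip().startswith(("*", "!")):
        lines = lines[1:]
    tokens = " ".join(lines).split()
    records, pos = [], 0
    while pos < len(tokens):
        k = int(tokens[pos])  # number of variables observed for this case
        codes = [int(t) for t in tokens[pos + 1 : pos + 1 + k]]
        values = [float(t) for t in tokens[pos + 1 + k : pos + 1 + 2 * k]]
        if len(values) != k:
            raise ValueError(f"record starting at token {pos} is truncated")
        records.append((codes, values))
        pos += 1 + 2 * k
    return records

sample = """3
1 2 3 .33 .62 .95
2
2 3 1.4 -2.2
1
2 .37"""
for codes, values in read_vlength(sample):
    print(codes, values)
```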


It is quite simple to prepare VLength files with SAS or SPSS. However, caution should be exercised with SAS, which uses a . for a missing value. Depending on the operating system under which you are running Mx, this dot may produce a file read error or be read as a zero. Here are a few lines of SAS code to output a VLength file from an array of two variables V{2}, either or both of which may be missing. The lines marked !!Change!! need to be modified to declare the number of variables, to place the required variables in the array, and to name the output file. Certain applications may also need to change the format of the PUT statement that writes the data values.


DATA ONE; SET ZERO;

count=0; nvar=2; /* Number of variables in total !!Change!! */

array V{2} AT1 AT2; /* Set up array for variables !!Change!! */

do I=1 to nvar; /* Count the non-missing observations */

if V{I} ne . then do; count+1; end; end;

FILE MXVLFILE; /* Filename for future Mx input !!Change!! */

if count ne 0 then do; /* Write observations if there are any */

put count;

do I=1 to nvar;

if V{I} ne . then put I @@; end; put; /* Write the identifiers */

do I=1 to nvar;

if V{I} ne . then put V{I} 13.6 +1 @@; end; put; /* Write the data values */

end;

run;

Note: format statements are not valid for either rectangular or VL files.
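For readers who do not use SAS, the same logic as the SAS code above can be sketched in Python: count the non-missing values, then write the count, the identification codes and the data values, skipping cases where everything is missing. Missing values are represented here by None, which is an assumption of this sketch; write_vlength is a hypothetical name, and the six-decimal output loosely mirrors the 13.6 format in the PUT statement.

```python
def write_vlength(rows):
    """Write rows (None = missing) in VLength layout, mirroring the SAS code."""
    out = []
    for row in rows:
        present = [(i, v) for i, v in enumerate(row, start=1) if v is not None]
        if not present:
            continue  # no record at all when every value is missing
        out.append(str(len(present)))                          # count
        out.append(" ".join(str(i) for i, _ in present))       # identification codes
        out.append(" ".join(f"{v:.6f}" for _, v in present))  # data values
    return "\n".join(out) + "\n"

print(write_vlength([[1.0, None], [None, None], [0.5, 2.0]]), end="")
```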


Similar to the rectangular command to read raw continuous data, the Ordinal file statement reads in ordinal data from a rectangular file. By default, a . (dot) character separated by spaces is recognized as a missing value, and this default may be changed by inserting a Missing command before the Ordinal statement. Ordinal data must be specified by integer categories, with the lowest category zero. The highest category in the ordinal data is automatically detected by Mx.


Contingency Tables


Syntax:

CTable <r> <c> {File=filename}


Mx will read contingency tables of order r by c. NInput_vars must be 2 for a group reading a contingency table. Both r and c must be greater than 1 but they do not have to be equal. A contingency table contains frequency data (or counts) such that each cell Cij indicates the number of observations falling in row category i and column category j. Normally, the frequencies supplied should be greater than or equal to zero.

If frequency data are read directly into the script, they need to start on a new line, following the CTable <r> <c> line.


Mx automatically handles incomplete ascertainment which the user can flag by supplying a negative number for cells that have not been ascertained (see example on p. 90). Instead of modeling means, the placement of thresholds on the underlying liability distribution is specified with the threshold statement, as shown on page 73.


The ordering of the categories should follow the natural numbering of the rows and columns, so that a table with a strong positive correlation between the variables would have large frequencies on the leading diagonal. Supplying a CTable changes the default fit function to the likelihood of observing the frequencies assuming a bivariate normal distribution of liability underlies the observed presence in a cell. See page 89 for details on fitting structural equation models to contingency table data.


Means


Syntax:

Means {File=filename}


A vector of means, of length NInput_vars, may be read. When fitting models by maximum likelihood, a matrix formula for the predicted means may be provided. The joint likelihood of the means and the covariances is maximized, enabling tests of hypotheses about equality of means across variables or across groups.


Higher Moment Matrices


Syntax:

Skewness/Kurtosis {File=filename}


Matrices of skewness and kurtosis may be read with these commands. These are provided for future developments in Mx that will allow model fitting to these types of data in addition to means and covariances. Currently there is no facility to use matrices read in this way. However, model fitting with higher moments could be done with user-defined fit functions (see page 90).


3.4 Label and Select Variables


Labeling Input Variables


Syntax:

Labels <list of labels>


Labels may be given for the observed data by issuing a Label command, before the Begin Matrices; command. These labels may be used to select variables, for example:


#NGroups 1

Data NInput_vars=3 NObservations=171

CMatrix File=Cov.mat

Labels ALC1 ALC2 AGE

Select ALC2 ALC1 ;


would read the lower triangle of a 3×3 covariance matrix from the file Cov.mat, and label the variables ALC1 ALC2 and AGE. The variables ALC2 and ALC1 are then selected for analysis, changing their original order. See also page 80 for details on labeling specified matrices.


Select Variables


Syntax:

Select <numlist or varlist> ;


Variables may be selected for analysis using the Select command. The command may be used to reorder data or to pick a reduced number of variables for analysis. In either case, a ; or / must end the command. Select accepts integers which correspond to the order of the input variable. More conveniently, Select will operate on variable labels (see page 45). The command will work with raw data as supplied by the Rawdata or VLength commands (see pages 86 and 42).
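For a covariance matrix, Select amounts to taking the rows and columns of the chosen variables in the order listed. The sketch below accepts labels or 1-based positions and rejects a variable chosen twice, as required below; the function name and the covariance values are hypothetical and serve only as an illustration.

```python
def select(S, labels, chosen):
    """Subset and reorder covariance matrix S by labels or 1-based positions."""
    lookup = [label.lower() for label in labels]
    order = []
    for item in chosen:
        i = item - 1 if isinstance(item, int) else lookup.index(item.lower())
        if i in order:
            raise ValueError(f"variable {item!r} selected twice")
        order.append(i)
    return [[S[i][j] for j in order] for i in order]

labels = ["ALC1", "ALC2", "AGE"]
S = [[1.0, 0.6, 0.2],   # illustrative values only
     [0.6, 1.1, 0.3],
     [0.2, 0.3, 4.0]]
print(select(S, labels, ["ALC2", "ALC1"]))  # [[1.1, 0.6], [0.6, 1.0]]
print(select(S, labels, [2, 1]))            # same result by position
```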


Select If


Syntax:

Select If <label><space>{< = > ^< ^= ^>} value ;

where ^ denotes not.


Select If may be used in conjunction with raw data (VL or Rectangular) to select a subset of the data for analysis. This feature is useful to eliminate outliers from a raw dataset, if a case number or id variable has been included. Note that a space is necessary between the label and the operator. For example,


Rectangular File=mydata.rec

Labels casenum BMI skinfold1 skinfold2;

Select If casenum ^= 253;

Select BMI skinfold1;

might be used to eliminate all cases where casenum is 253.
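The operators of Select If combine a comparison (<, = or >) with an optional ^ that negates it, so ^< means "not less than" (greater than or equal to). A minimal sketch of that filtering logic on rectangular rows; the function name and data rows are hypothetical.

```python
import operator

OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}

def select_if(rows, labels, condition):
    """Keep rows satisfying '<label> <op> <value>'; ^ before op negates it."""
    label, op, value = condition.rstrip(" ;").split()  # spaces are required
    col = [name.lower() for name in labels].index(label.lower())
    negate = op.startswith("^")
    test = OPS[op.lstrip("^")]
    return [row for row in rows if test(row[col], float(value)) != negate]

labels = ["casenum", "BMI", "skinfold1", "skinfold2"]
rows = [(252, 22.1, 10.0, 11.2), (253, 61.0, 3.0, 90.0), (254, 24.3, 12.5, 12.9)]
kept = select_if(rows, labels, "casenum ^= 253;")
print([r[0] for r in kept])  # [252, 254]
```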


Select with Variable Length Data

In combination with the VL or rectangular data, select changes the identification codes to consecutive integers starting at 1. For example, if the following Select line was read:

Select 3 4 2 ;

a VLength record of the form:

4

1 2 3 4 .1 .2 .3 .4

would be changed to:

3

1 2 3 .3 .4 .2

thus the variable originally numbered 3 has become variable 1, the variable numbered 4 has become variable 2, and the variable numbered 2 has become variable 3. Select will automatically reduce the number of data vectors if there are no matches between a particular data vector and the codes in the Select line. The final number of vectors and observations used in the analysis is given in the output file.
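The renumbering can be reproduced with a few lines: the position of each code in the Select list becomes its new identification code, and a record with no matching codes is dropped. This is a hypothetical helper written for illustration, not Mx code.

```python
def select_vlength(record, selection):
    """Renumber a VLength record's codes according to a Select list."""
    codes, values = record
    observed = dict(zip(codes, values))
    new_codes, new_values = [], []
    for new_code, old_code in enumerate(selection, start=1):
        if old_code in observed:
            new_codes.append(new_code)
            new_values.append(observed[old_code])
    return (new_codes, new_values) if new_codes else None  # None: record dropped

print(select_vlength(([1, 2, 3, 4], [.1, .2, .3, .4]), [3, 4, 2]))
# ([1, 2, 3], [0.3, 0.4, 0.2])
```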


Select cannot contain more numbers than the NInput_vars specified on the Data line. To do so would necessarily result in a singular correlation or covariance matrix. Likewise, the same variable cannot be selected twice.


3.5 Calculation and Constraint Groups


The use of calculation and constraint groups is very similar to the use of groups that read data. All three types of group are fully command-compatible, with the exception of commands for reading data, which can be used by data groups alone.


Calculation Groups


The keyword Calculation on the Group-type line indicates that the group is used for calculation. The calculated matrix formula from such a group is printed if the RSiduals keyword appears on the Options line. There are no restrictions on the type and dimensions of a matrix that can be produced with this command (other than memory limits). The result of the calculation may be used in later groups by using the =%En syntax when specifying a matrix, where n is the number of the calculation group. Note that there is a strict ordering within the input file; results cannot be taken from a calculation that has not yet occurred.

The calculation group provides a facility for printing out results of matrix operations. Any calculation group that is not followed by a constraint or data group is not calculated until the end of optimization, thus avoiding unnecessary waste of computer time.


Constraint Groups


Constraint groups may be used to impose nonlinear equality or inequality constraints among the parameters. Three special operators (=, < and >) may be used to impose constraints between matrices. For example, suppose we wish to impose the constraint $x^2 + y^2 = 1$, where x has parameter specification 1 and y has parameter specification 2. A constraint group to accomplish this might be:


Constrain parameters to ensure that x*x+y*y=1

 Constraint

 Begin Matrices;

  A Full 2 1

  I Iden 1 1

 End Matrices;

  Specify A 1 2 ! put parameters 1 and 2 into A

 Constraint A'*A=I; ! inner product works out x*x+y*y

End Group;


If we wanted to impose the inequality constraint $x^2 + y^2 > 1$ instead, then we would use the > symbol in the Constraint statement. Likewise, we could use < to specify a less-than inequality. Only one <, > or = symbol may be used in a constraint statement. To specify range constraints such as $.5 < x^2 + y^2 < 1$, both constraints can be specified within the same constraint statement by concatenating them as two inequality constraints:


Constrain parameters to ensure that .5 < x*x+y*y <1

 Constraint

 Begin Matrices;

  A Full 2 1

  I Iden 1 1

  H Full 1 1

 End Matrices;

  Matrix H .5

  Specify A 1 2 ! put parameters 1 and 2 into A

 Constraint (A'*A_

                H) < (I_

                      A'*A); ! inner product works out x*x+y*y

End Group;


Note that the constraints are made element by element. Using Option RSiduals we can see the results of imposing equality or inequality constraints.
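Because the _ operator stacks matrices vertically, the statement above compares two 2×1 matrices element by element: A'A < I (the upper bound) and H < A'A (the lower bound). The small check below, assuming A = (x, y)' as in the script and using a hypothetical helper name, shows which element fails for points inside and outside the range.

```python
def range_constraint(x, y, lower=0.5):
    """Evaluate (A'A _ H) < (I _ A'A) element by element for A = (x, y)'."""
    inner = x * x + y * y     # A'*A, a 1x1 matrix
    left = [inner, lower]     # A'*A stacked over H
    right = [1.0, inner]      # I stacked over A'*A
    return [l < r for l, r in zip(left, right)]

print(range_constraint(0.6, 0.6))  # [True, True]: .5 < .72 < 1
print(range_constraint(1.0, 1.0))  # [False, True]: upper bound violated
print(range_constraint(0.1, 0.1))  # [True, False]: lower bound violated
```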


Whenever Mx encounters a constraint group, it increases the number of degrees of freedom by the number of nonlinear constraints. This increase in the number of statistics is based on the assumption that each constraint identifies a parameter, which may not always be correct. The DF parameter on the Options line (see page 93) may be used to correct for failures of this assumption.


NPSOL, the optimization routine, treats constraints in an intelligent fashion; if it finds the derivatives of the constraint functions with respect to certain parameters to be zero, it does not calculate them during optimization. This means that if some of the specified constraint functions are always zero, little additional computational cost is incurred.

Care is needed to make sure that the constraints can be satisfied. If there is no feasible point for the constraints - for example, one of them always takes the value .5 - an IFAIL=3 error message is returned. One way to avoid such errors is to start optimization at a place where the constraints are satisfied.


3.6 Commands for Declaring Variable Options


Missing Command


Syntax:

Missing=<code>


The missing command may be used to supply a character string other than . (dot) to be used for missing values, e.g. Missing=N/A. Note that Mx responds to the exact character string, and not the numerical value of that string. For example, if Missing=-1.0 has been specified, then neither -1 nor -1.00 would be recognized as missing.
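The exact-string rule can be illustrated with a sketch of splitting one free-format data line: a token counts as missing only if its text is identical to the missing code. The function name is hypothetical and this is not Mx's reader.

```python
def parse_line(line, missing="."):
    """Split a free-format data line; a token equal to the missing code is None."""
    # The comparison is on the raw text, so "-1" and "-1.00" differ from "-1.0".
    return [None if token == missing else float(token) for token in line.split()]

print(parse_line("1.5 . 2.0"))                        # [1.5, None, 2.0]
print(parse_line("-1.0 -1 -1.00 3", missing="-1.0"))  # [None, -1.0, -1.0, 3.0]
```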


Highest Command


Syntax:

Highest=<number list>


Although the highest category in the ordinal data is automatically detected by Mx, in some cases, especially multigroup analyses, it is necessary to override this default with a user-specified value. The largest value in the data file must not exceed the corresponding value in the Highest statement. This command expects a number for every variable in the analysis and thus should follow the Label and Select statements, if any.


Definition Variables


Syntax:

Label {element list}

Definition_variable <label>

...

Specification <matrix name> {element list, possibly including labels}


This feature allows ‘multilevel’ statistical analyses with VL or rectangular data files. Essentially, some variables may be assigned as definition variables which can then be used in constructing the model. Definition variables are automatically #define’d so that their names can be used in Specify statements. A matrix containing a definition variable changes for every case in the raw data file. See page 139 for an example that allows continuous moderators - effectively as many groups as there are cases in the data file. Labels should be provided for all variables before using the definition statement.


3.7 Advanced Commands for Script Writing


#Define Command


Syntax:

#define <name> <number>

#define <$name> <string>


Number Substitution


Various commands and keywords used in Mx scripts search for a number. During this search, if Mx encounters a letter it will read the word and check the dictionary for matching #define’d words. If the word is found, the appropriate number is substituted. If it is not, a warning will be printed and the search for a number or a #define’d variable will continue. Care is needed with spelling!


In multivariate modeling it is quite common that the same matrix dimensions are used in many different parts of a script. For example, in an oblique factor analysis with 10 observed variables and 2 factors, the dimensions of the matrices needed to define the model are dictated by these numbers. If matrix A contains the loadings, P the correlations between the factors, and matrix E the residuals, we would require A to be of order 10×2, P to be 2×2 and E to be of order 10×10. We might specify this in Mx with a script of the form


#NGroups 1

Title - factor analysis

 Data NInput=10 NObservations=100

 CMatrix File=mydata.cov

 Begin Matrices;

  A Full 10 2 Free

  P Stan 2 2 Free

  E Diag 10 10 Free

 Covariance A*P*A' + E ;

 Start .5 all

End


However, this script could be made more general with a couple of #define statements:


#NGroups 1

#define factors 2

#define vars 10

Title - factor analysis

 Data NInput=vars NObservations=100

 CMatrix File=mydata.cov

 Begin Matrices;

  A Full vars factors Free

  P Stan factors factors Free

  E Diag vars vars Free

 Covariance A*P*A' + E ;

 Start .5 all

End


The gain is small in this simple model: we change two numbers, instead of seven, to change the number of factors and the number of observed variables. With more complex models, the use of #define can make scripts much simpler and more versatile.


String Substitution


If the word following the #define command begins with a $, the rest of the line (up to any comment character !) is taken to be the value of the #define’d variable. This type of substitution is especially useful because it literally changes the input line. For example, if the command


#Define $var BMI


is followed by the command


Select $var -T1 $var -T2 ;


then the line will become


Select BMI-T1 BMI-T2 ;


Note how the substitution has omitted the space character following $var in the input line. If a space character is required following a string variable, two spaces should be used in the input. To append the contents of a string variable to a command, it is simply a matter of entering the string variable name at the relevant position, for example, if $var is #define’d as 4 the command:


Rectangular file=myfile$var .rec


will become


Rectangular file=myfile4.rec
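The substitution rule, including the quirk that exactly one space after the variable name is consumed, can be modeled with a regular expression. This is a toy sketch with hypothetical function names, not Mx's implementation; names are matched case-insensitively and unknown names are left untouched.

```python
import re

def parse_string_define(line):
    """'#define $name value ! comment' -> ('$name', 'value')."""
    body = line.split("!", 1)[0]
    _, name, value = body.split(maxsplit=2)
    return name.lower(), value.strip()

def substitute(line, defines):
    """Replace each $name and swallow ONE following space, if present."""
    def repl(match):
        value = defines.get(match.group(1).lower())
        return match.group(0) if value is None else value
    return re.sub(r"(\$\w+) ?", repl, line)

defines = dict([parse_string_define("#Define $var BMI ! body mass index")])
print(substitute("Select $var -T1 $var -T2 ;", defines))  # Select BMI-T1 BMI-T2 ;
print(substitute("Rectangular file=myfile$var .rec", {"$var": "4"}))
```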


Automatic #define


Two commands automatically #define variables. First, if the #repeat command (see p. 145) is used, two variables are automatically defined as the number of the current repeat. Repeat_number is #defined as a numeric value, and $Repeat_number is a character string of the repeat number in question. These features facilitate the use of the repeat number in scripts, for example to read in different input files or to change the number of factors in a model.


Second, if the Definition command (see p. 139) is used in raw data analysis, any definition variables are automatically #define’d as ‘-1’, ‘-2’ etc. (corresponding to their position in the Definition command line) to simplify specification of matrices with definition variables. Therefore, syntax of the form:


Definition age sex ;


followed later in the script by a matrix specification command:


Specify C age sex


would appropriately specify C as having ‘parameters’ -1 and -2 which correspond to the definition variables Age and Sex.
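The automatic numbering and its effect can be sketched as follows: each definition variable receives a negative code from its position on the Definition line, and for every case those codes are replaced by that case's data, so matrix C takes different values for each case. The helper names and case values are hypothetical and serve only as an illustration.

```python
def auto_define(names):
    """Definition variables are #define'd as -1, -2, ... in command-line order."""
    return {name.lower(): -(k + 1) for k, name in enumerate(names)}

def values_for_case(spec, names, case):
    """Replace each negative code in a specification with this case's value."""
    return [case[names[-code - 1]] for code in spec]

names = ["age", "sex"]                                # Definition age sex ;
codes = auto_define(names)
spec_c = [codes[word] for word in "age sex".split()]  # Specify C age sex
print(spec_c)                                         # [-1, -2]
for case in ({"age": 34, "sex": 1}, {"age": 51, "sex": 0}):
    print(values_for_case(spec_c, names, case))       # C differs for every case
```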


#If, #Elseif, #Else and #Endif Commands


Syntax:

#if <condition>

#elseif <condition>

#else

#endif


Conditional compilation of parts of Mx scripts is enabled through the #if, #elseif, #else and #endif commands. The <condition> part of the command uses variables that have been #define’d either as strings (e.g. #define $model Onefac) or as numeric values (such as #define nvar 3). Conditions may use =, > or <, optionally preceded by ^ to indicate not equal, not greater than (equivalent to less than or equal to) or not less than (equivalent to greater than or equal to). For example, the following code might be used to declare matrices differently according to the type of model required:


#if $model = orthogonal

S identity nfac nfac

#elseif $model = oblique

S symmetric nfac nfac

#else

Oops! Error: $model must be #defined as either orthogonal or oblique

#endif


Note that each #if command must be accompanied by an #endif command, and that the condition operators must have a space before and after them. Commands of this type make it possible to write Mx script ‘templates’. A template contains code that is normally written by the more advanced user and does not change from one use to the next. It is paired with a ‘header’ file, which the less advanced user can readily edit using the Mx GUI MxProject|Header Edit menu system. An example script pair of this sort is described on page 149.


#Repeat Command


Syntax:

#repeat <number>

#endrepeat


The #repeat command is normally used to read and execute the same script segment several times. Although this might seem futile, the script may contain elements that change each time it is run. One example is where a System command is executed in a script, perhaps to simulate data with an external program that changes the input data for the Mx script. A second example is where the automatically #define’d variables $repeat_number and repeat_number are used to make the script read different data files on successive runs. Third, the repeat_number variable might be used in combination with conditional statements (see above), e.g.,


#if repeat_number = 1

! Lines of Mx script to be used the first iteration go here

#elseif repeat_number = 2

! Lines of Mx script to be used the second iteration go here

#else

! Lines of Mx script to be used for iterations 3 onwards go here

#endif


Note that the #repeat command needs to be accompanied by an #endrepeat command in the same file and not in a #include file (see below). Also note that the #define, #if and #repeat commands can be used anywhere in a script. An example script using the #if and #repeat commands is described on page 145.


#Include Command


Syntax:

#include <filename>


The #include command reads lines from an external file directly into an Mx script. This feature can be useful when the same code or data is used in several scripts, and in combination with the #repeat command.


System Command


Syntax:

System <commands to be executed>


The System command allows the Mx script to execute external programs by calling the system. Under Unix, the external programs will be run with the user's default shell. This command can be useful to manipulate data between stacked problems, e.g., reformatting data output by the first job in a file so that it can be read by the second job in that file. Another use would be to have an external program that simulates data, and to call the system to simulate data prior to running an Mx script that uses these data. In conjunction with the #repeat command, multiple simulations could be run. For example,


#repeat 200

System runsim

Title Mx script to fit model to simulated data

! rest of job goes here

End

#endrepeat


would run an external program called runsim (under Windows this could be a batch file, or under Unix it might be a shell program) and then run the Mx script, and repeat this exercise 200 times.


Matrices Declaration


Syntax:

Begin Matrices; or Matrices= {Group <n>}

<matrix name> <type> <r> <c> {Free/ Unique}

....

End Matrices;


Matrices must be declared after reading any data for the group, and before assigning values or parameters to matrix elements. All declared matrices initially have zero for each ‘modifiable’ element. By default, all matrix elements are fixed. If the keyword Free appears, each modifiable element has a free parameter specified, starting at the highest parameter number yet specified below 10,000. If the keyword Unique is present parameters are numbered from 10,000 onwards. Unique helps to keep parameters from accidentally being constrained with subsequent specify statements. See page 55 for more details on declaring matrices.


Matrix Algebra


Syntax:

Begin Algebra;

<matrix name> = {funct} <matrix name> {operator <matrix name> };

...

End Algebra;


Algebra sections provide a simple way to evaluate matrix algebra expressions, as shown in Appendix C.


In many cases, breaking up a complicated matrix algebra expression into smaller parts can improve readability, efficiency or both. For example, the matrix formula (I-A)~*S*(I-A)~' computes the inverse of (I-A) twice. When matrix A is small, the loss of efficiency will be negligible: the extra time taken to re-program will be greater than any time gained in execution. For large A, the inverse (I-A)~ can be computed as an intermediate step, as in the example below. The CPU-intensive matrix inversion is then carried out only once, and the script stays compact and readable. Algebra may be thought of as a special form of matrix declaration. Each matrix that appears on the left-hand side of the = sign is newly defined in this group (it must not have been previously defined). Note that matrix B, defined in the first line of the algebra section, may be used in subsequent lines. Matrices computed in an algebra section can be referred to in a later group using the computed keyword (see p. 57 below), instead of specifying their type, rows and columns.


 Begin Matrices;

  A Full 10 10

  S Symm 10 10

  I Iden 10 10

 End Matrices;

 Begin Algebra;

  B = (I-A)~ ;

  C = B*S*B' ;

 End Algebra;




4 Building Models with Matrices


What you will find in this chapter

 

     How to declare matrices and label them

     The structure of the different types of matrix

     What the matrix operators and functions do

     When and where to use matrix formulae

     The role of different types of group


All groups, be they constraint, calculation, or data, require at least one matrix in order to do anything. The next few sections describe the types of matrix that may be used, the operators that act on and between them, and ways of putting parameters and numbers into them.



4.1 Commands for Declaring Matrices


Matrices Command


Syntax:

Begin Matrices {= Group <n>};

<matrix name> <type> <rows> <columns> {= <name> <group> / Free, Unique}

....

<matrix name> <type> <rows> <columns> {= <name> <group> / Free, Unique}

End Matrices;

where n is a previous group number


Every group must contain the Matrices command (which may be abbreviated to its first three letters, MAT), followed by at least one matrix definition. As throughout this manual, we recommend using the non-abbreviated form of commands, such as Begin Matrices;.


Matrix names are restricted to one letter, from A to Z. The same letter may be used for different matrices in different groups. If a matrix is declared twice, a warning is printed and only the second declaration is kept.


Note that matrix definitions are group specific; for example, matrix A in group 1 does not have to be the same type or size as matrix A in group 2.


If the keyword = Group n follows the Begin Matrices command, all matrices in that earlier group n are automatically declared in the present group.





Matrix Types


The type of a matrix may be one of the 12 forms described in Table 4.1, and its row and column dimensions are specified with integers. Once the type and size of a matrix have been defined, they cannot be changed.


Table 4.1          Matrix types that may be specified in Mx.


Type       Structure                                     Shape    Number of Free Elements
Zero       Every element is zero (null matrix)           Any      0
Unit       Every element is one (unit matrix)            Any      0
Iden       Identity matrix                               Square   0
IZero      Identity|Zero partitioned matrix              Any      0
ZIden      Zero|Identity partitioned matrix              Any      0
Diag       Diagonal matrix                               Square   r
SDiag      Subdiagonal (zeros on & above diagonal)       Square   r(r-1)/2
Stand      Standardized (symmetric, ones on diagonal)    Square   r(r-1)/2
Symm       Symmetric                                     Square   r(r+1)/2
Lower      Lower triangular                              Square   r(r+1)/2
Full       Full                                          Any      r×c
Computed   Equated to formula in previous group          Any      0

Note: number of free elements indicates the number of elements that can be altered by the user, where r is the number of rows and c the number of columns of the matrix.
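The counts in the last column can be written as a small Python function. This is an illustrative sketch with a hypothetical name (free_elements). It encodes Table 4.1 and, as the Shape column states, assumes that the square types are declared with equal row and column counts.

```python
def free_elements(mtype, r, c):
    """Number of modifiable elements of an r x c matrix, per Table 4.1."""
    t = mtype.lower()
    square_types = {"iden", "diag", "sdiag", "stand", "symm", "lower"}
    if t in square_types and r != c:
        raise ValueError(f"{mtype} matrices must be square")
    if t in {"zero", "unit", "iden", "izero", "ziden", "computed"}:
        return 0                      # nothing can be altered by the user
    if t == "diag":
        return r                      # diagonal only
    if t in {"sdiag", "stand"}:
        return r * (r - 1) // 2       # strictly below the diagonal
    if t in {"symm", "lower"}:
        return r * (r + 1) // 2       # diagonal and below
    if t == "full":
        return r * c
    raise ValueError(f"unknown matrix type {mtype!r}")


print(free_elements("Symm", 3, 3))
```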


Equating Matrices across Groups


Syntax:

<matrix name> <type> <r> <c> = <matrix name> <group number>

or

<matrix name> <type> <r> <c> = <special quantity> <group number>


Optionally, a matrix may be constrained to equal a matrix previously specified. For example, we could use the command

A Symm 3 3 = Y2

to equate matrix A in this group to matrix Y in group 2. In this example the current group must be number 3 or greater.


Several additional options allow constraints to other quantities found in previous groups, such as the observed or expected covariance matrix. For example, the command

B Full 2 2 = %E1

equates matrix B in this group to the expected matrix of group 1.


The special codes for constraining a matrix to equal those defined or computed in previous groups are shown in Table 4.2. These add to the flexibility of Mx.


Table 4.2          Syntax for constraining matrices to special quantities in previous groups.


Symbol   Matrix Quantity                                Dimensions
%On      Observed covariance (data) matrix              NIn×NIn
%En      Expected covariance matrix                     NIn×NIn
%Mn      Expected mean vector                           1×NIn
%Pn      Expected proportions under bivariate normal    NRn×NCn
%Fn      Function value                                 1×1

Note: NIn is the number of input variables in group n following any selection; NR and NC are respectively the number of rows and columns in a contingency table, and may be requested only if group n has such a table.


It is especially important to note that none of the %E, %O, %M, %F and %P equalities may refer to groups that appear after the current group. When matrices are constrained to be equal in this fashion, the type and row × column dimensions of the earlier matrix are retained. If the two specifications do not agree, a warning is printed. Both the number of rows and the number of columns must be supplied for square matrices, but only the first is used to define the size of the matrix.


Equating Matrices to Computed Matrices


Syntax:

<matrix name> computed {<r> <c>} = <matrix name> <group number>


When matrices are declared with the Begin Matrices; command, a special type, computed, may be used to equate to a matrix which was defined within the algebra section of a previous group. Row and column dimensions are set to those of the previously calculated matrix, and may be omitted when declaring a matrix as computed.


Equating All Matrices across Groups


Syntax:

Begin Matrices = Group <number>;


The usual equating of matrices across groups is supplemented by a global facility. All the matrices defined in an earlier group are made available to the current group. This includes both matrices that are explicitly declared and those that are created in a Begin Algebra; ...End Algebra; section.


Free Keyword


All changeable elements of matrices are initialized at zero and are fixed parameters, unless the Free keyword is used, in which case each changeable element is specified as a different free parameter. Examples of the results of using the keyword Free are shown in Table 4.3.



Table 4.3          Examples of use of the Matrices command to specify the dimensions of different matrix types. The keyword Free following each command makes each modifiable element in the matrix a separate free parameter, numbered in order as shown in the second column. In the third column, values of elements are shown, with ? representing a free parameter.


Example command      Specification Matrix     Values

A Zero 2 3 Free      0 0 0                    0 0 0
                     0 0 0                    0 0 0

B Unit 2 3 Free      0 0 0                    1 1 1
                     0 0 0                    1 1 1

C Iden 3 3 Free      0 0 0                    1 0 0
                     0 0 0                    0 1 0
                     0 0 0                    0 0 1

D Izero 2 5 Free     0 0 0 0 0                1 0 0 0 0
                     0 0 0 0 0                0 1 0 0 0

E Ziden 2 5 Free     0 0 0 0 0                0 0 0 1 0
                     0 0 0 0 0                0 0 0 0 1

F Diag 3 3 Free      1 0 0                    ? 0 0
                     0 2 0                    0 ? 0
                     0 0 3                    0 0 ?

G Sdiag 3 3 Free     0 0 0                    0 0 0
                     1 0 0                    ? 0 0
                     2 3 0                    ? ? 0

H Stand 3 3 Free     0 1 2                    1 ? ?
                     1 0 3                    ? 1 ?
                     2 3 0                    ? ? 1

I Symm 3 3 Free      1 2 4                    ? ? ?
                     2 3 5                    ? ? ?
                     4 5 6                    ? ? ?

J Lower 3 3 Free     1 0 0                    ? 0 0
                     2 3 0                    ? ? 0
                     4 5 6                    ? ? ?

K Full 2 4 Free      1 2 3 4                  ? ? ? ?
                     5 6 7 8                  ? ? ? ?


More detail on specifying parameters in matrices is given in Sections 4.4 to 4.5.
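The numbering pattern in Table 4.3 follows a simple rule. Modifiable elements are numbered row by row, and for the symmetric types (Symm, Stand) each number is mirrored into the upper triangle. The Python sketch below reproduces the Specification Matrix column. It uses a hypothetical helper, free_spec, with matrices represented as lists of rows. The start argument stands in for the first free parameter number and is assumed to be 1 for a fresh script.

```python
def free_spec(mtype, r, c, start=1):
    """Parameter numbers created by '<name> <type> r c Free' (Table 4.3)."""
    t = mtype.lower()
    spec = [[0] * c for _ in range(r)]
    k = start
    for i in range(r):
        for j in range(c):
            if t == "full":
                take = True
            elif t == "diag":
                take = i == j
            elif t in ("symm", "lower"):
                take = j <= i          # diagonal and below
            elif t in ("sdiag", "stand"):
                take = j < i           # strictly below the diagonal
            else:                      # Zero, Unit, Iden, IZero, ZIden
                take = False
            if take:
                spec[i][j] = k
                if t in ("symm", "stand"):
                    spec[j][i] = k     # mirror into the upper triangle
                k += 1
    return spec


for row in free_spec("Symm", 3, 3):
    print(*row)
```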




4.2 Building Matrix Formulae


Readers unfamiliar with matrix algebra may benefit from reading Appendix C, where examples and exercises are given. Readers familiar with matrix algebra may wish to examine Tables 4.4 and 4.5 for the variety of available operators and functions, and use this section for reference.


Matrix Operations


In ordinary algebra, operators such as +, -, × and ÷ have an order of evaluation established by convention. Multiplication and division are done before addition and subtraction. Consecutive multiplications and divisions are done in left-to-right order, as are consecutive additions and subtractions. We could say, then, that × and ÷ have priority 1, and + and - have priority 2. Default priorities can be changed with the use of brackets ( ), which specify that operations inside the brackets are done first. For example, a+b×c = a+bc, whereas (a+b)×c = ac+bc.


A similar hierarchy has been established for the matrix operators in Mx, and it too may be overridden by the use of brackets. Table 4.4 shows the matrix operators and their (default) order of evaluation. Matrix algebra is subject to certain rules of conformability, which are requirements about the size and shape of the matrices being multiplied, added, etc. These rules are listed in the right-hand column of Table 4.4. There, rA denotes the number of rows in matrix A and cB the number of columns in matrix B. The number of rows (r) and the number of columns (c) of a matrix are known as its dimensions. Two matrices A and B where rA=rB and cA=cB are said to have the same dimensions.


Table 4.4          Matrix operators available in Mx, together with their priority for evaluation.

See also Table 4.5 for matrix functions.


Symbol  Name       Function             Example  Priority  Conformability
~       Inverse    Inversion            A~       1         r=c
ʹ       Transpose  Transposition        Aʹ       1         none
------------------------------------------------------------------------
^       Power      Element powering     A^B      2         none
*       Star       Multiplication       A*B      3         cA=rB
.       Dot        Dot product          A.B      3         rA=rB and cA=cB
@       Kron       Kronecker product    A@B      3         none
&       Quadratic  Quadratic product    A&B      3         cA=rB=cB
%       Eldiv      Element division     A%B      3         rA=rB and cA=cB
+       Plus       Addition             A+B      4         rA=rB and cA=cB
-       Minus      Subtraction          A-B      4         rA=rB and cA=cB
|       Bar        Horizontal adhesion  A|B      4         rA=rB
_       Under      Vertical adhesion    A_B      4         cA=cB

A line has been drawn between the first two operators (Inverse & Transpose) and the rest because inverse and transpose are unary operators, that is, they operate on one matrix. The rest form a single new matrix from two matrices, and are thus binary operators. These operators are now described in detail.


Inverse ~

Only square matrices may be inverted, but they may be either symmetric or non-symmetric. The inverse of matrix A is usually written $A^{-1}$ and implies that $AA^{-1} = A^{-1}A = I$, where I is the identity matrix. To request an inverse with Mx, we use the symbol ~ (e.g., A~). If the inverse does not exist (possibly due to rounding errors), Mx will terminate with an error message. Some precautions can be taken to avoid this. For example, supply starting values that allow inversion, or put boundary constraints on parameters to prevent their taking values that would lead to a singular matrix.
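To make the singularity problem concrete, here is a minimal Gauss-Jordan inverse in Python, using lists of rows and a hypothetical function name. It is not Mx's own routine. It raises an error when a pivot becomes numerically zero, which is the situation in which Mx stops with an error message.

```python
def inverse(a, tol=1e-12):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination.

    Raises ValueError for a non-square or (numerically) singular matrix.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("only square matrices may be inverted")
    # Work on the augmented matrix [A | I]
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("matrix is singular; inverse does not exist")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]   # right half now holds A^-1


print(inverse([[4, 7], [2, 6]]))
```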


Transpose ʹ

Any matrix may be transposed. The transpose of A is written Aʹ (entered as A' in Mx scripts). The order of the matrix changes from r×c to c×r, as the rows become the columns and vice versa.


Power ^

All the elements of a matrix may be raised to a power using the ^ symbol. Essentially, this operator works the same way as the Kronecker product (see below), but elements of the first matrix are raised to the power of those in the second matrix instead of being multiplied by them. It is possible to use negative powers and non-integer exponents to obtain reciprocals and roots of elements, but it is not possible to raise a negative number to a non-integer power. For example, the cube of every element of a matrix would be obtained by A^B if B were a 1×1 matrix with 3 as its only element.


For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and the 1×1 matrix $B = (3)$, the matrix power A^B is $\begin{pmatrix}1&8\\27&64\end{pmatrix}$.
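Here is a Python sketch of element powering under the Kronecker-style layout described above, with a hypothetical function name, elpow. It also enforces the rule that a negative number cannot be raised to a non-integer power. Plain Python would otherwise return a complex number.

```python
def elpow(a, b):
    """Element powering A^B, laid out like the Kronecker product:
    each element of A is raised to each element of B."""
    out = []
    for a_row in a:            # row of A
        for b_row in b:        # row of B
            row = []
            for x in a_row:    # column of A
                for y in b_row:  # column of B
                    if x < 0 and y != int(y):
                        raise ValueError("cannot raise a negative number "
                                         "to a non-integer power")
                    row.append(x ** y)
            out.append(row)
    return out


print(elpow([[1, 2], [3, 4]], [[3]]))
```

With a 1×1 B this reduces to ordinary element-by-element powering, as in the cube example.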


Multiplication *

* or ‘Star’ is the ordinary form of matrix multiplication. The elements of A (m×n) and B (n×p) are combined to form the elements of matrix C (m×p) using the formula

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}.$$

Matrices multiplied in this way must be conformable for multiplication: the number of columns in the first matrix must equal the number of rows in the second matrix.


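For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $B = \begin{pmatrix}5&6\\7&8\end{pmatrix}$, the matrix product A*B is $\begin{pmatrix}19&22\\43&50\end{pmatrix}$.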


Dot product .

Dot is another type of matrix multiplication, which is done element by element. For two matrices to be multiplied in this way, they must have the same dimensions. Elements of the dot product are described by the formula Cij = Aij×Dij.


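For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $D = \begin{pmatrix}5&6\\7&8\end{pmatrix}$, the dot product A.D is $\begin{pmatrix}5&12\\21&32\end{pmatrix}$.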



Kronecker product

The right Kronecker product of two matrices A ⊗ B is formed by multiplying each element of A by the matrix B. If A is of order (m×n) and B is of order (p×q), then the result will be of order mp×nq. There are no conformability criteria for this type of product. In Mx input files the symbol ⊗ is denoted with the symbol @.


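For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $B = \begin{pmatrix}0&1\\1&0\end{pmatrix}$, the Kronecker product is $A \otimes B = \begin{pmatrix}0&1&0&2\\1&0&2&0\\0&3&0&4\\3&0&4&0\end{pmatrix}$.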
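The definition translates directly into a nested comprehension. In this sketch (hypothetical name kron, matrices as lists of rows), rows are ordered by the row of A and then the row of B, and columns likewise. This places the block a_ij×B at block position (i, j).

```python
def kron(a, b):
    """Right Kronecker product A (x) B, written A@B in Mx scripts.
    An m x n A and a p x q B give an mp x nq result."""
    return [[x * y for x in a_row for y in b_row]
            for a_row in a for b_row in b]


for row in kron([[1, 2], [3, 4]], [[0, 1], [1, 0]]):
    print(row)
```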



Quadratic product &

Many structural equation and other statistical models use quadratic products of the form EBEʹ. The quadratic operator & is both a simple and an efficient way to compute them: E&B evaluates E*B*Eʹ. Note that E can be any shape, but to be conformable for the quadratic product, matrix B must be square, with as many rows and columns as E has columns.


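For example, with $E = \begin{pmatrix}1&2\\0&1\end{pmatrix}$ and $B = \begin{pmatrix}1&0\\0&2\end{pmatrix}$, the quadratic product E&B is $EBE' = \begin{pmatrix}9&4\\4&2\end{pmatrix}$.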
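Here is a minimal Python sketch of the quadratic product, built from ordinary multiplication and transposition, using hypothetical helper names and lists of rows. Mx evaluates E&B as a single operation, which is the efficiency point made above. The sketch only shows what is computed and applies the conformability check from Table 4.4.

```python
def matmul(a, b):
    """Ordinary product A*B; requires columns of A == rows of B."""
    if len(a[0]) != len(b):
        raise ValueError("not conformable for multiplication")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def quadratic(e, b):
    """Quadratic product E&B = E*B*E' (B square, order = columns of E)."""
    if len(b) != len(b[0]) or len(b) != len(e[0]):
        raise ValueError("B must be square with as many rows as E has columns")
    return matmul(matmul(e, b), transpose(e))


print(quadratic([[1, 2], [0, 1]], [[1, 0], [0, 2]]))
```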





Element division %

% does element-by-element division. For two matrices to be divided in this way, they must have the same dimensions. Elements of the result C are described by the formula Cij = Aij ÷ Dij. If any element of D is zero, the corresponding cell in the result matrix is set to $10^{35}$.


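For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $D = \begin{pmatrix}1&4\\0&8\end{pmatrix}$, the division A%D is $\begin{pmatrix}1&0.5\\10^{35}&0.5\end{pmatrix}$.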



Addition +

Addition of matrices is performed element by element. For two matrices to be added, they must have the same dimensions. Elements of the sum, C are described by the formula

Cij = Aij + Dij.


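For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $D = \begin{pmatrix}5&6\\7&8\end{pmatrix}$, the sum A+D is $\begin{pmatrix}6&8\\10&12\end{pmatrix}$.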



Subtraction -

Subtraction of matrices is performed element by element. For one matrix to be subtracted from another, they must have the same dimensions. Elements of the difference, C are described by the formula Cij = Aij - Dij.


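For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $D = \begin{pmatrix}5&6\\7&8\end{pmatrix}$, the difference A-D is $\begin{pmatrix}-4&-4\\-4&-4\end{pmatrix}$.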


Note that in Mx there is also a unary minus operator, so that an expression such as -A is legal. This operation changes the sign of each element of A.
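The four element-by-element operators share one pattern: check that the dimensions agree, then combine corresponding elements. The Python sketch below uses hypothetical function names. The only special case comes from the description of %: a zero divisor yields 10^35 instead of an error.

```python
BIG = 1e35  # value placed where a divisor element is zero (per the text)


def elementwise(a, d, op):
    """Combine A and D element by element; they must have the same dimensions."""
    if len(a) != len(d) or any(len(x) != len(y) for x, y in zip(a, d)):
        raise ValueError("matrices must have the same dimensions")
    return [[op(x, y) for x, y in zip(ra, rd)] for ra, rd in zip(a, d)]


def dot(a, d):      # A.D
    return elementwise(a, d, lambda x, y: x * y)


def eldiv(a, d):    # A%D, zero divisors give 10^35
    return elementwise(a, d, lambda x, y: BIG if y == 0 else x / y)


def plus(a, d):     # A+D
    return elementwise(a, d, lambda x, y: x + y)


def minus(a, d):    # A-D
    return elementwise(a, d, lambda x, y: x - y)


def negate(a):      # unary minus, -A
    return [[-x for x in row] for row in a]


A = [[1, 2], [3, 4]]
D = [[5, 6], [7, 8]]
print(dot(A, D), plus(A, D), minus(A, D))
```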




Horizontal Adhesion |

Bar allows partitioning of matrices. Its operation is called horizontal adhesion because A|D is formed by sticking D onto the right hand side of A. For two matrices to be adhered in this way, they have to have the same number of rows. If A (m×n) and D (m×p) are adhered, the result C is of order (m×(n+p)).


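For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $D = \begin{pmatrix}5&6\\7&8\end{pmatrix}$, the operation A|D gives $\begin{pmatrix}1&2&5&6\\3&4&7&8\end{pmatrix}$.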



Vertical Adhesion _

Underscore allows partitioning of matrices. Its operation is called vertical adhesion because A_D is formed by sticking D underneath A. For two matrices to be adhered in this way, they must have the same number of columns. If A (m×n) and D (p×n) are adhered, the result C is of order ((m+p)×n).


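For example, with $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $D = \begin{pmatrix}5&6\\7&8\end{pmatrix}$, the operation A_D gives $\begin{pmatrix}1&2\\3&4\\5&6\\7&8\end{pmatrix}$.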
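Once the dimensions have been checked, both adhesion operators reduce to list concatenation. The helper names below are hypothetical, and matrices are represented as lists of rows.

```python
def hadhere(a, d):
    """Horizontal adhesion A|D: D is stuck onto the right of A (same rows)."""
    if len(a) != len(d):
        raise ValueError("A|D needs the same number of rows")
    return [list(ra) + list(rd) for ra, rd in zip(a, d)]


def vadhere(a, d):
    """Vertical adhesion A_D: D is stuck underneath A (same columns)."""
    if len(a[0]) != len(d[0]):
        raise ValueError("A_D needs the same number of columns")
    return [list(r) for r in a] + [list(r) for r in d]


A = [[1, 2], [3, 4]]
D = [[5, 6], [7, 8]]
print(hadhere(A, D))
print(vadhere(A, D))
```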




Matrix Functions


A number of matrix functions, shown in Table 4.5, may be used in Mx. These are useful in specialized applications involving user-defined fitting-functions (see p. 90).




Table 4.5          Matrix functions available in Mx.

Restrictions are on rows r and columns c of input argument.


Keyword      Function                          Restrictions    Result Dimensions
\tr( )       Trace                             r=c             1×1
\det( )      Determinant                       r=c             1×1
\sum( )      Sum                               None            1×1
\prod( )     Product                           None            1×1
\max( )      Maximum                           None            1×1
\min( )      Minimum                           None            1×1
\abs( )      Absolute value                    None            r×c
\cos( )      Cosine                            None            r×c
\cosh( )     Hyperbolic cosine                 None            r×c
\sin( )      Sine                              None            r×c
\sinh( )     Hyperbolic sine                   None            r×c
\tan( )      Tangent                           None            r×c
\tanh( )     Hyperbolic tangent                None            r×c
\exp( )      Exponent (e^A)                    None            r×c
\ln( )       Natural logarithm                 None            r×c
\sqrt( )     Square root                       None            r×c
\d2v( )      Diagonal to Vector                None            1×min(r,c)
\v2d( )      Vector to Diagonal                r=1 or c=1      max(r,c)×max(r,c)
\m2v( )      Matrix to Vector                  None            rc×1
\vec( )      Matrix to Vector*                 None            rc×1
\vech( )     Lower triangle to Vector          None            rc×1
\stnd( )     Standardize matrix                r=c             r×c
\eval( )     Real eigenvalues                  r=c             r×1
\evec( )     Real eigenvectors                 r=c             r×r
\ival( )     Imaginary eigenvalues             r=c             r×1
\ivec( )     Imaginary eigenvectors            r=c             r×r
\mean( )     Mean of columns                   None            1×c
\cov( )      Covariance of columns             None            c×c
\pchi( )     Probability of chi-squared        r=1 and c=2     1×1
\pdfnor( )   Multivariate normal density       r=c+2           1×1
\mnor( )     Multivariate normal integral      r=c+4           1×1
\momnor( )   Moments of multivariate normal    r×1             r×1
\allint( )   All integrals of multinormal      see text        see text
\aorder( )   Ascending sort order              r×1             r×1
\dorder( )   Descending sort order             r×1             r×1
\sortr( )    Row sort                          None            r×max(1,c-1)
\sortc( )    Column sort                       None            max(1,r-1)×c
\rprod( )    Row product                       None            r×1
\cprod( )    Column product                    None            1×c
\incrow( )   Increment row                     None            r×c
\part( )     Extract part of matrix            None            Variable

*vec: vectorizes by columns, in contrast to m2v, which vectorizes by rows.


\part (A,B) takes two arguments. The elements of the 1×4 matrix B are used to define a rectangle within matrix A to be extracted.


Functions, called with syntax of the form \func(argument) differ from operators because they take an argument enclosed by parentheses (). This argument may be a single matrix name, or a complex matrix formula. The argument is evaluated before the function is applied, consistent with the rules for using brackets. Functions form a second set of unary operators (see page 59). Descriptions of these functions follow.


Trace \tr( )

The trace of a matrix is the sum of the elements on the leading diagonal, i.e., $\mathrm{tr}(A) = \sum_{i=1}^{r} A_{ii}$. It is only allowed for square matrices.


Determinant \det( )

Properties of determinants, and ways of calculating them are discussed in Appendix C. This function is calculated for square matrices only.


Sum \sum( )

The sum of a matrix is the sum of all its elements, i.e., $\sum_{i=1}^{r}\sum_{j=1}^{c} A_{ij}$.


Product \prod( )

The product function of a matrix yields the product of all its elements, i.e., $\prod_{i=1}^{r}\prod_{j=1}^{c} A_{ij}$.


Maximum \max( )

The maximum function of a matrix yields a 1×1 matrix containing the maximum of all its elements.


Minimum \min( )

The minimum function of a matrix yields a 1×1 matrix containing the minimum of all its elements.
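The scalar-valued functions above can be sketched as follows, in Python with hypothetical function names. Each returns a 1×1 matrix, matching the Result Dimensions column of Table 4.5. Note that math.prod requires Python 3.8 or later.

```python
import math


def _elements(a):
    return [x for row in a for x in row]


def mx_tr(a):
    """Trace: sum of the leading diagonal; square matrices only."""
    if any(len(row) != len(a) for row in a):
        raise ValueError("trace requires a square matrix")
    return [[sum(a[i][i] for i in range(len(a)))]]


def mx_sum(a):
    return [[sum(_elements(a))]]


def mx_prod(a):
    return [[math.prod(_elements(a))]]


def mx_max(a):
    return [[max(_elements(a))]]


def mx_min(a):
    return [[min(_elements(a))]]


A = [[1, 2], [3, 4]]
print(mx_tr(A), mx_sum(A), mx_prod(A), mx_max(A), mx_min(A))
```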


Absolute value \abs( )

The abs function replaces all matrix elements with their absolute value.


Trigonometric functions \cos( ), \sin( ) etc.

These functions replace every matrix element with the corresponding trigonometric or hyperbolic function of that element. Arguments are in radians.


Exponent \exp( )

Any matrix is a legal argument for this function, which replaces each element $A_{ij}$ by $e^{A_{ij}}$.


Natural Logarithm \ln( )

Any matrix is a legal argument for this function, which replaces each element $A_{ij}$ by $\ln A_{ij}$. If an element is less than $1\times10^{-30}$, the result is $\ln(1\times10^{-31})$. Although an error message would be more usual in such a situation, this behavior can be helpful in optimization.


Square Root \sqrt( )

Any matrix is a legal argument for this function, which replaces each element $A_{ij}$ by $\sqrt{A_{ij}}$. If an element is less than zero, a fatal error occurs.
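Here is a Python sketch of the element-wise \exp, \ln and \sqrt rules, including the special handling of very small arguments to \ln described above. The function names are hypothetical. Following the text, \ln does not fail on zero or negative elements: they fall below the 1×10^-30 cut-off and map to ln(1×10^-31).

```python
import math


def mx_exp(a):
    """Replace each element a_ij by e**a_ij."""
    return [[math.exp(x) for x in row] for row in a]


def mx_ln(a):
    """Natural log of each element; anything below 1e-30 gives ln(1e-31)."""
    return [[math.log(x) if x >= 1e-30 else math.log(1e-31) for x in row]
            for row in a]


def mx_sqrt(a):
    """Square root of each element; a negative element is a fatal error."""
    if any(x < 0 for row in a for x in row):
        raise ValueError("cannot take the square root of a negative element")
    return [[math.sqrt(x) for x in row] for row in a]


print(mx_ln([[1.0, 0.0]]))
```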


Diagonal to Vector \d2v( )

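The leading diagonal of any matrix is placed into a row vector with min(r,c) columns, i.e., r or c, whichever is less. For example, \d2v applied to $\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}$ gives $\begin{pmatrix}1&5\end{pmatrix}$.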



Vector to Diagonal Matrix \v2d( )

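A row or column vector is placed in the leading diagonal of a square matrix, and all other elements are zero. For example, \v2d applied to $\begin{pmatrix}1&2&3\end{pmatrix}$ gives $\begin{pmatrix}1&0&0\\0&2&0\\0&0&3\end{pmatrix}$.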
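\d2v and \v2d are near inverses of each other. This Python sketch (hypothetical names, matrices as lists of rows) shows the shapes involved: \d2v returns a 1×min(r,c) row vector, and \v2d accepts either a row or a column vector.

```python
def d2v(a):
    """Leading diagonal as a 1 x min(r, c) row vector."""
    n = min(len(a), len(a[0]))
    return [[a[i][i] for i in range(n)]]


def v2d(v):
    """Place a row or column vector on the diagonal of a square matrix."""
    if len(v) == 1:                          # row vector
        vals = list(v[0])
    elif all(len(row) == 1 for row in v):    # column vector
        vals = [row[0] for row in v]
    else:
        raise ValueError("argument must be a row or column vector")
    n = len(vals)
    return [[vals[i] if i == j else 0 for j in range(n)] for i in range(n)]


print(d2v([[1, 2, 3], [4, 5, 6]]))
print(v2d([[1, 2, 3]]))
```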



Matrix to Vector \m2v( )

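A matrix is placed in a column vector, by rows. Thus \m2v applied to $\begin{pmatrix}1&2\\3&4\end{pmatrix}$ gives $\begin{pmatrix}1&2&3&4\end{pmatrix}'$.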


This is similar to the function \vec, which places the matrix into a vector by columns instead of by rows.

Matrix to Vector \vec( )

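A matrix is placed in a column vector, by columns. Thus \vec applied to $\begin{pmatrix}1&2\\3&4\end{pmatrix}$ gives $\begin{pmatrix}1&3&2&4\end{pmatrix}'$.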


Note that it is more efficient to use \m2v(A) than \vec(A') and more efficient to use \vec(A) than \m2v(A'). Both functions work for matrices of any shape.


Lower Triangle to Vector \vech( )

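All the elements on the diagonal and below are placed into a vector, by columns. Thus \vech applied to $\begin{pmatrix}1&2&4\\2&3&5\\4&5&6\end{pmatrix}$ gives $\begin{pmatrix}1&2&4&3&5&6\end{pmatrix}'$.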


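Like its counterparts \vec and \m2v, this function will operate on matrices of any shape, terminating at the last row or column, whichever comes first. Thus \vech applied to $\begin{pmatrix}1&2\\3&4\\5&6\end{pmatrix}$ gives $\begin{pmatrix}1&3&5&4&6\end{pmatrix}'$.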
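The three vectorizing functions differ only in the order in which elements are visited. In this Python sketch (hypothetical names; column vectors represented as lists of one-element rows), \vech stops after min(r,c) columns, as described above. The last line illustrates the equivalence behind the efficiency note: \vec of a transpose equals \m2v of the original.

```python
def m2v(a):
    """Elements placed in a column vector by rows."""
    return [[x] for row in a for x in row]


def vec(a):
    """Elements placed in a column vector by columns."""
    return [[a[i][j]] for j in range(len(a[0])) for i in range(len(a))]


def vech(a):
    """Diagonal and below-diagonal elements, by columns, stopping at the
    last row or column, whichever comes first."""
    r, c = len(a), len(a[0])
    return [[a[i][j]] for j in range(min(r, c)) for i in range(j, r)]


A = [[1, 2], [3, 4]]
At = [list(col) for col in zip(*A)]   # transpose of A
print(vec(At) == m2v(A))
```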



Standardize \stnd( )

This operation converts a covariance matrix into a correlation matrix. Replacement of elements is made according to the formula $A^{*}_{ij} = A_{ij} / \sqrt{A_{ii} A_{jj}}$.


The diagonal elements of A have to be greater than zero, and A has to be square.
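Below is a direct Python transcription of the \stnd formula, using a hypothetical function name. The two preconditions from the text are checked first.

```python
import math


def stnd(a):
    """Convert a covariance matrix to a correlation matrix:
    a*_ij = a_ij / sqrt(a_ii * a_jj)."""
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("argument must be square")
    if any(a[i][i] <= 0 for i in range(n)):
        raise ValueError("diagonal elements must be greater than zero")
    return [[a[i][j] / math.sqrt(a[i][i] * a[j][j]) for j in range(n)]
            for i in range(n)]


print(stnd([[4, 2], [2, 9]]))
```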




Real Eigenvalues \eval( )

The real parts of the eigenvalues of a square matrix are placed in a column vector, in ascending order of size, smallest first.


Real Eigenvectors \evec( )

The real parts of the eigenvectors of a square matrix are placed in a square matrix, where column j contains the eigenvector corresponding to eigenvalue j, with eigenvalues sorted in ascending order of size, smallest first (j=1).


Imaginary Eigenvalues \ival( )

The imaginary parts of the eigenvalues of a square matrix are placed in a column vector, in ascending order of size, smallest first.


Imaginary Eigenvectors \ivec( )

The imaginary parts of the eigenvectors of a square matrix are placed in a square matrix, where column j contains the eigenvector corresponding to eigenvalue j, with eigenvalues sorted in ascending order of size, smallest first (j=1).


Column Means \mean( )

This function computes the means of the columns of a matrix.


Column Covariances \cov( )

This function computes the covariance matrix of the columns of a matrix. Thus, if the data are arranged with one row per subject (r subjects) and one column per variable (c variables), the result is of order c×c.


Probability of Chi-square \pchi(χ², ν)

Function \pchi computes the probability of a chi-squared value with ν degrees of freedom. Its argument must be a 1×2 vector containing the chi-squared value and the degrees of freedom, and it returns a 1×1 matrix. This can be useful when writing parameter estimates and fit statistics to a file.


Multivariate Normal Density \pdfnor(A)

The function \pdfnor computes the multivariate normal probability density function (pdf). In the univariate case, this is the height of the normal curve. Matrix A, the argument of the function, is an (nvar+2)×nvar matrix containing: (first row) a vector of observed scores xi; (second row) a vector of population means μi; and (rows 3 to nvar+2) the population covariance matrix Σ. The pdf is

$$\phi(x) = (2\pi)^{-\mathit{nvar}/2}\,|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(x-\mu)\,\Sigma^{-1}(x-\mu)'\right]$$
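Here is a Python sketch of the density, taking its argument in the same (nvar+2)×nvar layout as \pdfnor. It computes the quadratic form and the determinant from a Cholesky factor of Σ. That is one standard approach, not necessarily how Mx evaluates it. Σ is assumed to be positive definite.

```python
import math


def pdfnor(a):
    """Multivariate normal density. Row 1 = scores x, row 2 = means mu,
    rows 3..nvar+2 = covariance matrix Sigma."""
    k = len(a[0])
    if len(a) != k + 2:
        raise ValueError("argument must have nvar+2 rows")
    x, mu, sigma = a[0], a[1], a[2:]
    # Cholesky factor L with Sigma = L L'
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("covariance matrix must be positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # Forward-solve L z = (x - mu); then z'z = (x-mu) Sigma^-1 (x-mu)'
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = []
    for i in range(k):
        z.append((d[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    quad = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(k))  # ln|Sigma|
    return math.exp(-0.5 * (k * math.log(2 * math.pi) + log_det + quad))


print(pdfnor([[0.0], [0.0], [1.0]]))   # height of the standard normal at 0
```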


Multivariate Normal Integration \mnor( )

The matrix function \mnor computes multiple integrals of the multivariate normal, up to dimension 10. For n-dimensional integration, its argument has n columns and n+4 rows. The first n rows define the covariance matrix and row n+1 the mean vector. Rows n+2 and n+3 give the upper and lower limits of integration, and row n+4 the type of truncation for each variable. This is best described with an example. The script:


#NGroups 1

Test multivariate normal integral function

 Calculation

 Begin Matrices;

  A full 1 2 ! Upper limits

  B full 1 2 ! Lower limits

  T Full 1 2 ! Type of integral

  R Stan 2 2 ! Covariance matrix

  M Full 1 2 ! Means

 End Matrices;

  Matrix R .3

  Matrix A 1 1 ! By default, Matrix B 0 0

  Matrix T 2 2

 Compute \mnor((R_M_A_B_T)) ;

 Option RSiduals

End


computes the integral of the bivariate normal distribution with correlation .3 from 0 to 1 in both dimensions. The type parameters (matrix T) are flags that indicate the type of truncation required:

     0 integral from -∞ to aj

     1 integral from bj to ∞

     2 integral from bj to aj

     3 integral from -∞ to ∞ (this dimension is ignored)

where aj and bj are the elements of column j of matrices A and B.


Accuracy is set to six decimal places by default. Lower precision may be requested with Option Eps=<value>. Note that this option applies globally, i.e., to all such integrals in a particular run.


Moments of the Truncated Multinormal \momnor( )

The matrix function \momnor will compute moments of the truncated multinormal distribution. Currently, it will work only with 'tails' of the distribution, though selection may be absent for some variables. Here is a bivariate example:


#NGroups 1

Test moments of truncated normal function

 Calculation

 Begin Matrices;

  R Symm 2 2 !covariance matrix

  M Full 1 2 !means

  T Full 1 2 !thresholds

  S Full 1 2 !selection vector

  N Full 1 2 !# of abscissae

 End Matrices;

  Matrix R 1 .5 1

  Matrix T 1.282 1.282

  Matrix S 1 1

  Matrix N 16 16

 Compute \momnor((R_M_T_S_N)) ;

 Option RSiduals

End


This script requests the covariance matrix and means of individuals selected above the threshold 1.282 on both variables of a bivariate normal distribution with zero means, unit variances and correlation .5. The result contains the covariance matrix in the first n rows and the means in row n+1.

Note: this function can give incorrect results when the number of abscissae is small, or when the thresholds are extreme (more than 3 standard deviations from the mean). CPU time increases with the number of abscissae. This number is partly user-configurable and will be one of 1, 2, 3, 4, 5, 6, 8, 12, 14, 16, 20, 24, 32, 48 or 64, along with some smaller jumps below that. Mx automatically sets the number of abscissae to: i) 16 if you enter 0 or less, ii) 64 if you enter 64 or more, and iii) the next lower value if you choose an intermediate value (e.g., it will pick 24 if you enter 30).


All Intervals of the Multivariate Normal Distribution \allint()

It is often necessary to compute the probabilities of all the cells of a multivariate normal that has been sliced by a varying number of thresholds in each dimension. These thresholds are more formally called hyperplanes. While it is possible to use the \mnor function to achieve this goal, it can be more efficient and more convenient to use the \allint function. The argument to the \allint function must be a matrix with as many columns as there are variables, and with as many rows as the number of columns plus 2 plus the maximum number of thresholds to be evaluated. The general form is \allint(R_M_N_T) where R is the m × m covariance matrix of m variables, M is the mean vector, N is a row vector whose elements ti specify the number of thresholds in dimension i, and T contains the thresholds and is of order (max(ti) × m).


The \allint function returns the proportions in all the cells, cycling from lowest to highest with the last variable in R changing most rapidly. For example, the following script:


#NGroups 1

#define nvar 2 ! number of variables

#define maxthresh 3 ! maximum number of thresholds

Test of allint function

 Calculation

 Begin Matrices;

  R symm nvar nvar

  N full 1 nvar

  M full 1 nvar

  T full maxthresh nvar

 End Matrices;

  Matrix R 1 0 1 ! identity matrix here

  Matrix M 0 0 ! zero means

  Matrix N 2 3 ! first dimension has 2 thresholds, second has 3

  Matrix T

  -1.282 -2.323 ! thresholds are -1.282 and 0 for first dimension,

       0 0 ! and are -2.323, 0 and 1.282 for second dimension

      10 1.282 ! the number 10 is irrelevant here

  Compute \allint(R_M_N_T) ;

End Group


will return:


   MATRIX C

 This is a computed FULL matrix of order 1 by 12

  [=\ALLINT(R_M_N_T)]

          1 2 3 4 5 6 7 8 9

 1 0.0010 0.0490 0.0400 0.0100 0.0040 0.1960 0.1601 0.0400 0.0050

         10 11 12

 1 0.2450 0.2000 0.0500

containing the desired probabilities.
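Because the example uses an identity covariance matrix, each cell probability is simply a product of univariate normal interval probabilities. The output above can therefore be checked independently. The Python sketch below (hypothetical helper names) performs that check, ordering cells with the last variable changing fastest as \allint does. It does not handle correlated variables, which need a genuine multivariate normal integral such as \mnor.

```python
import math
from itertools import product


def phi(z):
    """Standard normal CDF (math.erf maps -inf/inf to -1/1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cell_probs_independent(thresholds):
    """Cell probabilities for UNCORRELATED standard normal variables,
    sliced by the given thresholds in each dimension; the last
    variable changes fastest."""
    per_dim = []
    for t in thresholds:
        cuts = [-math.inf] + sorted(t) + [math.inf]
        per_dim.append([phi(hi) - phi(lo) for lo, hi in zip(cuts, cuts[1:])])
    return [math.prod(cell) for cell in product(*per_dim)]


probs = cell_probs_independent([[-1.282, 0.0], [-2.323, 0.0, 1.282]])
print([round(p, 4) for p in probs])   # compare with the Mx output above
```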


Ascending Order \aorder( )

This function gets the ascending sort order of a column vector, returning an r×1 vector that can be supplied as the sort order to \sortr() (see below).



Descending order \dorder( )

This function gets the descending sort order of a column vector, returning an r×1 vector in the same way as \aorder().


Sort Rows \sortr( )

Used to sort a column vector or matrix by rows. If a vector, the vector elements themselves are sorted. If a matrix, the first column is taken to be the sort order - and must contain a permutation of the integers 1 to the number of rows, as might be extracted using, e.g., \aorder() above.


Sort Columns \sortc( )

This function works the same way as \sortr() but by columns.


Row Product \rprod( )

This function computes the product of the elements in a matrix, row-wise.


Column Product \cprod( )

This function works the same way as \rprod() but column-wise.
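The text does not state the shape of the result. The Python sketch below assumes that \rprod returns a column vector with one product per row, and that \cprod returns a row vector with one product per column. The function names simply mirror the Mx ones, and representing matrices as lists of lists is our own choice.

```python
from math import prod

def rprod(a):
    # Assumed behaviour: one product per row, returned as a column vector
    return [[prod(row)] for row in a]

def cprod(a):
    # Assumed behaviour: one product per column, returned as a row vector
    return [[prod(col) for col in zip(*a)]]

A = [[1, 2, 3],
     [4, 5, 6]]
print(rprod(A))  # [[6], [120]]
print(cprod(A))  # [[4, 10, 18]]
```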


Increment Row \incrow( )

This function forces element (i+1, j) to be greater than element (i, j) by a constant amount, which can be set with Option Rinc= (default .01). This rather unusual matrix function is useful for certain threshold problems with ordinal data.


Extract Part \part(A,B)

This function extracts a rectangular submatrix of matrix A. Formerly this was possible only by pre- and post-multiplying by elementary matrices. Matrix B must be initialized before the \part() statement is given, because the dimensions of the result are needed to check the syntax. To initialize B beforehand, use the following job structure:


#NGroups 1
Title
 Calculation
 Begin Matrices;
  A Symm 3 3
  B Full 4 1
 End Matrices;
 Matrix A
  1
  2 3
  4 5 6
 Matrix B 2 1 3 3
 Compute \part(A,B) ; ! <- Compute statement *after* matrix statement
 Option RSiduals
End


The format for matrix B is row, column, row, column. Because A is symmetric, its full form is

1 2 4
2 3 5
4 5 6

so in this example the rectangle from row 2, column 1 to row 3, column 3 is extracted, giving

2 3 5
4 5 6

Note that the elements of B may define any two opposite corners of a submatrix of A. To some extent, the \part() function is binary, but we prefer to list it with the other matrix functions.
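The extraction rule can be sketched in Python, including the point that B may name either pair of opposite corners. The helper part() is hypothetical and uses 1-based indices, as Mx does. A is the full form of the symmetric matrix in the example.

```python
def part(a, b):
    """Extract the rectangle of `a` whose opposite corners are given by
    b = (row1, col1, row2, col2), using 1-based indices as in Mx."""
    r1, c1, r2, c2 = b
    # Either pair of opposite corners is accepted, so normalize with min/max
    rows = range(min(r1, r2) - 1, max(r1, r2))
    cols = range(min(c1, c2) - 1, max(c1, c2))
    return [[a[i][j] for j in cols] for i in rows]

# The symmetric matrix A from the example, written out in full
A = [[1, 2, 4],
     [2, 3, 5],
     [4, 5, 6]]
print(part(A, (2, 1, 3, 3)))  # [[2, 3, 5], [4, 5, 6]]
print(part(A, (3, 3, 2, 1)))  # same rectangle, corners given the other way
```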


4.3 Using Matrix Formulae


A matrix formula is a sequence of matrix names, matrix operators and functions, terminated by a semicolon. For example:

A*B + \m2v(C);


Covariances, Compute Command


Syntax:

Covariances/Compute <formula>;

where formula is a legal matrix algebra formula


The Covariance command combines the matrices declared after the Matrices command by means of operators and functions. A Covariance statement may consist of a single matrix with no operations, or it may be very complex. The command may extend over several lines and must end with ; or /. In calculation groups, Compute is the recommended keyword because it makes scripts easier to read.


The primary method of carrying out matrix algebra is within an algebra section (see page 53). Matrices that appear on the left-hand side of an algebra statement should not already exist in that group.


Means Command


Syntax:

Means <formula>;

where formula is a legal matrix algebra formula


The Means command works in the same way as the Covariance command and exists to facilitate the modeling of means. All the matrix operators and functions (Section 4.2) may be used, just as when specifying a model for covariances. The command must end with ; or /. Currently, Mx ignores models for means when the LS, GLS, AWLS or DWLS fit functions are used. Only the ML, US and RM fit functions make use of them.


Threshold Command


Syntax:

Threshold <formula>;

where formula is a legal matrix algebra formula that evaluates to a matrix with 2 rows: the first row holds the row thresholds and the second row holds the column thresholds


The Threshold command works in the same way as the Means command, but specifies thresholds. It enables thresholds to be modeled when fitting to contingency table data. All the matrix operators and functions (Section 4.2) may be used, just as when specifying a model for covariances. The command must end with ; or /. Threshold can be used only with the contingency table ML fit function, which is used when CTable data have been supplied (see Chapter 5).


Special restrictions apply to the dimensions of the matrix calculated by the Threshold command. The result must have 2 rows and at least d columns, where d = max(r-1, c-1). In other words, it needs at least one less than the number of rows or the number of columns in the contingency table, whichever is greater.

- The first (r-1) elements of the first row contain the thresholds that separate the rows.
- The first (c-1) elements of the second row contain the thresholds that separate the columns.
- If r is not equal to c, the row with fewer thresholds is padded with zeros.

These elements are unstandardized threshold estimates. To standardize a row threshold, divide it by the square root of the first diagonal element of the expected covariance matrix (the variance of the row variable). To standardize a column threshold, divide it by the square root of the second diagonal element. The expected covariance matrix is the one calculated by the Covariance or Constraint statement. Using unstandardized thresholds allows the testing of models in which the groups differ in variance but have equal thresholds.

The user should take care to supply starting values for thresholds that increase from left to right in both rows of the matrix calculated by the Threshold command. Ideal starting values are those that, when standardized, equal the z-scores of the standard normal distribution for the cumulative proportions of the row totals (first row of the calculated matrix) or of the column totals (second row). For example, suppose the following contingency table was supplied as data:


CTable 3 2
20 180
40 360
20 180


The row totals are 200, 400 and 200 out of 800, giving cumulative proportions of .25 and .75. Appropriate starting values for the 2 row thresholds are therefore -.67 and +.67, the z-scores that cut off the lowest 25% and 75% of the normal distribution. The column totals are 80 and 720, a cumulative proportion of .10, so -1.28 (the z-score that cuts off the lowest 10%) is an appropriate starting value for the column threshold. Therefore, if the threshold model were simply T, we would declare

T Full 2 2

and use

Matrix T -.67 .67 -1.28 0

to initialize it.
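The recipe for starting values can be sketched in Python in three steps:

1. Take the cumulative proportions of the row and column totals.
2. Convert them to standard normal z-scores with statistics.NormalDist.inv_cdf.
3. Pad the shorter row with zeros.

The function threshold_starts is a hypothetical helper and not part of Mx.

```python
from itertools import accumulate
from statistics import NormalDist

def threshold_starts(table):
    """Starting thresholds from the marginal totals of a contingency table:
    z-scores of the cumulative proportions, with the shorter row padded
    with zeros so both rows have max(r-1, c-1) entries."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    z = NormalDist().inv_cdf  # standard normal quantile function

    def cuts(totals):
        # The last cumulative proportion is 1.0 and has no threshold
        return [z(c / n) for c in list(accumulate(totals))[:-1]]

    rows, cols = cuts(row_tot), cuts(col_tot)
    d = max(len(rows), len(cols))
    return [rows + [0.0] * (d - len(rows)), cols + [0.0] * (d - len(cols))]

T = threshold_starts([[20, 180], [40, 360], [20, 180]])
print([[round(v, 2) for v in row] for row in T])  # [[-0.67, 0.67], [-1.28, 0.0]]
```

Read row by row, the rounded result gives the four values passed to Matrix T.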


Weight command


Syntax:

Weight <formula>;

where formula is a legal matrix algebra formula


A basic assumption when fitting a model to a population is that a single model holds. However, the population may consist of a mixture of groups that differ in the parameters, or in the entire structure, of the model. In Mx, the Weight command, coupled with the NModel parameter, allows such mixtures to be analyzed when the raw data are available. NModel sets the number of models supposed to exist in the population.

The predicted means and covariances of the different models are simply stacked vertically in the usual matrix expressions for the means and covariances. For example, if three variables were studied with one model, the predicted mean vector would be of order (1×3) and the predicted covariance matrix (3×3). If two models are used, the predicted mean matrix should be (2×3) and the predicted covariance matrix (6×3). Mx checks that the dimensions of the predicted covariance and mean matrices agree with NModel and NInput, including any changes made with Select/Definition statements.

Weight allows modeling of the likelihood that a particular observed vector is a member of a particular model class. The weight matrix expression should evaluate to a vector of order (NModel×1). The log-likelihood for a particular vector then becomes:

$$\ln L = \ln\left(\sum_{i=1}^{\mathrm{NModel}} w_i L_i\right)$$

where w_i is the weight and L_i is the likelihood under the ith model.


Often the weights reflect simple proportions, and usually Σw_i = 1 (see page 141 for an example). Sometimes covariates may be used to compute the weight applied to a particular model. One example is quantitative trait locus analysis. There, the probabilities that a pair of siblings share 0, 1 or 2 alleles at a particular place on the genome can be used to weight their likelihood under three models (Eaves et al., 1996).
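Here is a minimal Python sketch of the mixture log-likelihood ln(Σ w_i L_i) for a single observation. To keep it short, it assumes univariate normal models, whereas Mx works with the stacked mean and covariance matrices described above. The models, the weights and the function name mixture_loglik are made up for illustration.

```python
from math import log
from statistics import NormalDist

def mixture_loglik(x, models, weights):
    """ln(sum_i w_i * L_i) for one observation x, where L_i is the
    likelihood (density) of x under model i, here a univariate normal."""
    total = sum(w * NormalDist(mu, sd).pdf(x)
                for w, (mu, sd) in zip(weights, models))
    return log(total)

# Two hypothetical model classes (mean, sd) with weights .3 and .7
models = [(0.0, 1.0), (2.0, 1.5)]
weights = [0.3, 0.7]
print(mixture_loglik(1.0, models, weights))
```

With weights (1, 0), the result reduces to the ordinary log-likelihood under the first model.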


Frequency Command


Syntax:

Freq <formula> ;

where formula is a legal matrix algebra formula


For maximum likelihood analysis of raw continuous data, a formula may be given for the frequency of the individual observations. If the frequency is constant across cases, this formula could be a scalar (1×1) matrix containing the frequency. More commonly the frequency varies across observations, in which case a definition variable may be used (see page 139).
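The text does not give the formula. However, a frequency normally acts as a multiplier on each case's log-likelihood, so a case with frequency f counts as if it appeared f times. The Python sketch below assumes exactly that and checks the result against an expanded copy of the data. The univariate normal model, the data and the name weighted_loglik are hypothetical.

```python
from math import log
from statistics import NormalDist

def weighted_loglik(data, freqs, mu, sd):
    """Sum over cases of freq_k * ln L_k (assumed frequency semantics):
    a case with frequency f counts as if it appeared f times."""
    dist = NormalDist(mu, sd)
    return sum(f * log(dist.pdf(x)) for x, f in zip(data, freqs))

data = [0.5, -1.2, 2.0]
freqs = [1, 3, 2]  # one frequency per case, as a definition variable would give
expanded = [x for x, f in zip(data, freqs) for _ in range(f)]

print(weighted_loglik(data, freqs, 0.0, 1.0))
print(weighted_loglik(expanded, [1] * len(expanded), 0.0, 1.0))  # same value
```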


4.4 Putting Numbers in Matrices


This section describes three methods of entering numbers into matrices (see Section 4.5 for how to specify elements of matrices as free, fixed or constrained parameters). In Section 3.1, we saw how matrices could be declared as one of 12 types, such as identity, symmetric, diagonal or full (see Table 4.1), and how their dimensions (r rows and c columns) were specified. Inspection of the table shows that the types Zero, Identity, Identity|Zero (IZ) and Zero|Identity (ZI) have no free elements at all. For example, there is nothing more to know about an IZ matrix with 2 rows and 4 columns. It looks like this:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

and it cannot be changed at all. If it were altered, it would no longer be an IZ matrix.


All six remaining matrix types have modifiable elements which may be altered with the commands Matrix, Start or Value. The number of modifiable elements varies according to:

     The number of rows and columns in the matrix

     The type of the matrix


All modifiable elements of a matrix are initialized to zero. Elements are ordered from left to right, row by row. For example, the elements of a symmetric (3×3) matrix would be read in the order:

1
2 3
4 5 6

See Table 4.3 for more examples of the patterning of matrices.
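The reading order can be sketched in Python for the symmetric case. Values fill the lower triangle from left to right, row by row, and are mirrored above the diagonal. The helper symm_from_list is hypothetical.

```python
def symm_from_list(values, n):
    """Fill an n x n symmetric matrix from its modifiable elements,
    read left to right, row by row, over the lower triangle."""
    if len(values) != n * (n + 1) // 2:
        raise ValueError("a symmetric n x n matrix needs n(n+1)/2 values")
    a = [[0] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        for j in range(i + 1):
            a[i][j] = a[j][i] = next(it)  # mirror above the diagonal
    return a

print(symm_from_list([1, 2, 3, 4, 5, 6], 3))
# [[1, 2, 4], [2, 3, 5], [4, 5, 6]]
```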


Matrix Command


Syntax:

Matrix <matrix name> {File=filename} <numlist>

where <numlist> is a free format list of numbers.

Note that different syntax is required in multiple fit mode:

Matrix <group number> <matrix name> {File=filename} <numlist>


The Matrix command supplies a list of values for the modifiable elements of a matrix. The required length of the list depends on the type and size of the matrix, as described at the start of this section (page 75). For example, suppose we specify a diagonal matrix A with 3 rows and 3 columns. The fourth column of Table 4.1 shows that a diagonal matrix has r modifiable elements, so we supply 3 values. The command lines

Matrix A .3 5 9

or, equivalently

Matrix-I-would-like-to-change-is A

0.3D+00 5 9.00000000

would both result in the matrix A:

$$A = \begin{pmatrix} 0.3 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 9 \end{pmatrix}$$
The Matrix command operates regardless of whether elements have been specified as fixed or free parameters.


When File= is given, Matrix reads the elements from that file, which must have a FORTRAN format on its first line. Such files may have been produced by an earlier run of Mx or by another program. LISREL matrix output files (produced by commands such as gamma=filename on the LISREL OU line) are fully compatible. The file must contain at least as many numbers as are required to fill the modifiable elements of the specified matrix (see page 75).

Because the Matrix command always expects the first line of the file to be a format, a file in free format (numbers separated by blanks and carriage returns) should have * as its first line.
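The Python sketch below shows how another program might prepare such a file. It writes a free-format file whose first line is *, followed by blank-separated numbers. The helper name and the file name a.dat are hypothetical. Such a file would then be read with a statement like Matrix A File=a.dat. For a diagonal A, only the three diagonal values are needed.

```python
import tempfile
from pathlib import Path

def write_free_format(path, rows):
    """Write numbers for the Matrix ... File= command in free format:
    a first line containing only '*', then blank-separated values."""
    lines = ["*"] + [" ".join(str(v) for v in row) for row in rows]
    Path(path).write_text("\n".join(lines) + "\n")

# Write the three diagonal values from the example to a temporary file
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "a.dat"
    write_free_format(p, [[0.3, 5, 9]])
    content = p.read_text()
print(content)
```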


Start and Value Commands


Syntax:

Start/Value <value> <element list>/ All

where <element list> consists of matrix elements and may include the TO keyword


In a large matrix, it is inconvenient to supply values for all the elements when only a few need to be modified. In that case it is easier to change elements explicitly by name. Elements may be referred to by up to three subscripts (e.g. A 1 2 3), according to the syntax

A {<group>} <row> <col>


If the matrix is in the current group, the group number may be omitted. The numbers <group> <row> <col> may be separated by any number of blank or non-numeric characters. For example, to put .5 in row 2, column 3 of group 1's A matrix, you could enter

Value .5 A 1 2 3

which works the same as

Value .5 A(1,2,3)

N.B. It is only possible to modify matrices declared in the current or previous groups.
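The separator rule for element references can be illustrated with a toy parser. It is a hypothetical Python helper, not Mx's own parser. It assumes a single-letter matrix name, as in the examples here, and it does not handle #define substitution.

```python
import re

def element_subscripts(ref):
    """Split an element reference such as 'A 1 2 3' or 'A(1,2,3)' into
    (matrix name, subscripts). Hypothetical helper: assumes a one-letter
    matrix name followed by 2 or 3 integers ({group} row col) separated
    by any blank or non-numeric characters."""
    ref = ref.strip()
    name, rest = ref[0], ref[1:]
    nums = tuple(int(s) for s in re.findall(r"\d+", rest))
    if len(nums) not in (2, 3):
        raise ValueError("expected {group} row col")
    return name, nums

for ref in ["A 1 2 3", "A(1,2,3)", "A 2 3"]:
    print(ref, "->", element_subscripts(ref))
```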

Value and Start recognize #define’d variables (see page 49). For example, we could have the statements

#define first 1

#define rowsinA 6

#define colsinA 10

at the top of the script, and then

Value 1.5 A first 1 1 to A first rowsinA colsinA

would assign 1.5 to all the fixed (non-free) elements of A, from A 1 1 to A 6 10.

The difference between Start and Value lies in their treatment of elements when the keywords ALL or TO are used (- is a synonym for TO).

- With ALL, Start assigns a starting value to every free parameter specified at that point in the input file. Value does the opposite: it assigns its value to every fixed matrix element specified up to that point.
- With TO, Start again applies its value only to free parameters. Value, however, assigns its value to every element in the specified range, whether free or fixed.

The TO keyword should be used only to specify a range of matrix elements within the same matrix.


4.5 Putting Parameters in Matrices


Paralleling the placement of numbers in matrices described in Section 4.4, Mx provides facilities for putting parameters in matrices. Note also that all modifiable elements of a matrix can be specified as different free parameters by adding the keyword Free after the matrix specification (see Section 4.1). Building models with this in mind can be much faster and more flexible (see Chapter 1).


Pattern Command


Syntax:

Pattern <matrix name> {File=filename} <numlist>

where <numlist> is a list of 1's and 0's.

Note that different syntax is required in multiple fit mode:

Pattern <group number> <matrix name> {File=filename} <numlist>


The Pattern command is a simple method that has the same syntax as the LISREL command on which it was based. Following the Pattern command, the user must provide the correct number (see