Wednesday, March 26, 2008

Modeling Causation with Non-Experimental Data (Part I)

by Alan

For more than a decade, I've taught a course on structural equation modeling (SEM). The technique, nearly always used with survey data (although in theory also applicable to experimental data), involves drawing shapes to represent variables and arrows between the shapes to represent the researcher's proposed flow of causation between the variables. Regression-type path coefficients are then generated to assess the strength of the hypothesized relations. Though the unidirectional arrows in the diagrams and the tone often used in reports of SEM analyses (e.g., "affected," "influenced," "led to") imply causality, such an inference cannot, of course, be supported with non-experimental data. This is especially true of cross-sectional data (i.e., where all variables are measured concurrently). As will be discussed later, causal inferences can be supported to a greater degree -- though not completely -- with longitudinal data.
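To make "regression-type path coefficients" concrete, here is a minimal sketch for a three-variable chain, X -> M -> Y. Everything in it is an assumption for illustration: the data are simulated, the true coefficients (0.5 and 0.4) are made up, and plain least squares stands in for what dedicated SEM software would do with latent variables and fit indices.

```python
import numpy as np

def slopes(outcome, *predictors):
    """OLS slopes of outcome on predictors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1:]

# Simulated survey-style data; the coefficients below are hypothetical.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)   # assumed path X -> M
y = 0.4 * m + rng.normal(size=n)   # assumed path M -> Y, no direct X -> Y

# One regression per "arrowhead" in the diagram.
a = slopes(m, x)[0]                # path coefficient X -> M
b, c_direct = slopes(y, m, x)      # paths M -> Y and X -> Y (direct)

print(f"X -> M: {a:.3f}")
print(f"M -> Y: {b:.3f}")
print(f"X -> Y (direct): {c_direct:.3f}")
print(f"indirect effect (a*b): {a * b:.3f}")
```

The estimates recover the arrows only because the simulation was built with those arrows. Fitted to real survey data, the same numbers would describe associations conditional on the diagram being right, which is exactly the point at issue.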

It is in this context that I read the chapter on SEM in Judea Pearl's (2000) book Causality: Models, Reasoning, and Inference. Pearl, a UCLA professor with expertise in artificial intelligence, logic, and statistics, writes at a level that is, quite frankly, well over my head. I did find his SEM chapter relatively accessible, however, so that is what I will discuss.

Pearl's apparent thesis in this chapter is that contemporary SEM practitioners are too quick to dismiss the possibility of drawing causal inferences from the technique (much as I did in my opening paragraph). Early on, in fact, Pearl writes, "This chapter is written with the ambitious goal of reinstating the causal interpretation of SEM" (p. 133).

Pearl reviews a number of writings on SEM that, he feels, "...bespeak an alarming tendency among economists and social scientists to view a structural equation as an algebraic object that carries functional and statistical assumptions but is void of causal content" (p. 137). He further notes:

The founders of SEM had an entirely different conception of structures and models. Wright (1923, p. 240) declared that "prior knowledge of the causal relations is assumed as prerequisite" in the theory of path coefficients, and Haavelmo (1943) explicitly interpreted each structural equation as a statement about a hypothetical controlled experiment. Likewise, Marschak (1950), Koopmans (1953), and Simon (1953) stated that the purpose of postulating a structure behind the probability distribution is to cope with the hypothetical changes that can be brought about by policy.

An interpretation of the above paragraph that would make the causal interpretation of SEM defensible, in my view, would be as follows: If the directional relations one is modeling with non-experimental (survey) data have previously been demonstrated through experimentation, then SEM can be a useful tool for estimating quantitatively how much of an effect a policy change could have on some outcome criterion. It's when we start talking about "assumed" or "hypothetical" experimental support that things get dicey.

As I noted above, however, contemporary SEM practitioners would probably be more comfortable with suggestions of causation if the data were collected longitudinally (more specifically, with a panel design, in which the same respondents are tracked over time). Of the three major criteria for demonstrating causality, longitudinal studies are clearly capable of demonstrating correlation and time-ordering; provided that the most plausible "third variable" candidates are measured and controlled for, the approximation to causality should be good (these latter two linked documents are from my research methods lecture notes).
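The "third variable" criterion can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two-wave data are simulated, the variable names (x_t1, y_t2, z) are hypothetical, and the data are deliberately built so that z drives both measures while x has no effect on y at all.

```python
import numpy as np

def slopes(outcome, *predictors):
    """OLS slopes of outcome on predictors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1:]

# Simulated two-wave panel; all values are made up for illustration.
rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)               # third variable affecting both measures
x_t1 = z + rng.normal(size=n)        # predictor, measured at wave 1
y_t2 = z + rng.normal(size=n)        # outcome, measured at wave 2 (no x effect)

naive = slopes(y_t2, x_t1)[0]            # time-ordered, but z is ignored
controlled = slopes(y_t2, x_t1, z)[0]    # same path with z controlled

print(f"x_t1 -> y_t2, ignoring z:    {naive:.3f}")
print(f"x_t1 -> y_t2, controlling z: {controlled:.3f}")
```

Correlation and time-ordering are both satisfied in the naive model, yet the path shrinks toward zero once z is in the equation. The catch, of course, is that this only works for third variables one has thought to measure.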

MacCallum and colleagues (1993, Psychological Bulletin, 114, 185-199), while acknowledging some limitations to longitudinal studies, succinctly summarize why they're useful: "When variables are measured at different points in time, equivalent models in which effects move backward in time are often not meaningful" (p. 197).
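The problem of equivalent models is easiest to see in the simplest possible case, two variables measured at the same time. The sketch below uses simulated data and made-up values, and it is not one of the models MacCallum and colleagues analyze; it just shows that "X -> Y" and "Y -> X" explain exactly the same share of variance, so fit alone cannot pick a direction.

```python
import numpy as np

def r_squared(outcome, predictor):
    """Proportion of variance in outcome explained by a one-predictor OLS fit."""
    design = np.column_stack([np.ones(len(outcome)), predictor])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ beta
    return 1.0 - residuals.var() / outcome.var()

# Simulated cross-sectional data; the 0.6 coefficient is hypothetical.
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)

r2_x_to_y = r_squared(y, x)   # model with the arrow drawn X -> Y
r2_y_to_x = r_squared(x, y)   # model with the arrow reversed, Y -> X

print(f"R^2 for X -> Y: {r2_x_to_y:.4f}")
print(f"R^2 for Y -> X: {r2_y_to_x:.4f}")
```

Both values equal the squared correlation between x and y. If x had been measured a year before y, the reversed model would still fit just as well numerically, but it would require an effect moving backward in time, which is the sense in which it is "not meaningful."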

In the upcoming Part II, I will highlight a recent empirical study whose discussion section provides a particularly thoughtful exposition of causal inference with longitudinal survey data.
