The March 2010 issue of the journal Psychological Methods contained a special series on quasi-experiments and causal inference, highlighting the complementary approaches of Donald Rubin and the late Donald Campbell. The heart of the series is a pair of lead articles, one by Will Shadish and one by Stephen West and Felix Thoemmes, with a brief introductory piece by journal editor Scott Maxwell and three commentaries following the two main articles. It's taken me a while, but I'm now ready to comment on the special series.
Being able to demonstrate causality in social science research is a daunting task. Even in a tightly controlled one-hour laboratory experiment, there are potential threats such as experimenter bias or demand characteristics. In long-term field studies, even with random assignment, there is a lot of time for things to go wrong, namely the threats to internal validity enumerated by Campbell (with colleagues Stanley and Cook), such as history effects, attrition, or diffusion and imitation of treatments. Finally, in quasi-experiments, which lack the crucial element of random assignment, one knows from the start that a full causal inference is out of reach.
Where Campbell and Rubin differ is in their main techniques for trying to get as much probative evidence as possible out of a given research design (experimental, quasi-experimental, or correlational/observational). As Shadish notes, with some qualification, "[Rubin] has focused more on analysis, whereas [Campbell] has focused more on design" (p. 11). West and Thoemmes phrase the contrast as follows: "Campbell's perspective emphasizes the prevention of threats to internal validity rather than their correction" (p. 22).
Campbell's approach, in essence, is to examine ahead of time where the flaws in a research design might lie, and then to shore up the design as well as possible. The latter might be done via steps such as pre-testing, extra control groups, and extra dependent variables (any of which might be affected by an artifactual phenomenon, but only one of which should be affected by the intervention). West and Thoemmes describe a previous quasi-experimental study by Reynolds and West (1987), who sought to test the effectiveness of an intervention to help convenience stores sell lottery tickets. This example, which involved matched control groups because store managers would not accept random assignment, was very clear and came with accompanying graphics to illustrate interpretation of the findings.
Rubin's approach relies on quantitative operations to assemble comparable experimental and control groups, when original group membership was not determined randomly. Propensity scores appear to be the main weapon in Rubin's arsenal. West and Thoemmes also walk the reader through an example of propensity scores, using 2008 research by Wu, West, and Hughes on retention in first grade and later academic achievement.
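To make the propensity-score idea concrete, here is a minimal sketch in Python. It is not the analysis from Wu, West, and Hughes (2008); it uses simulated data in which a single covariate drives both treatment selection and the outcome, estimates each unit's propensity score with a hand-rolled logistic regression, and performs 1:1 nearest-neighbor matching on those scores. All variable names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated observational data: covariate x (think baseline achievement)
# influences both treatment assignment and the outcome, so the naive
# treated-vs-control comparison is confounded. True treatment effect = 1.0.
n = 500
x = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-(x - 0.5)))                  # selection depends on x
t = rng.random(n) < p_treat
y = 2.0 * x + 1.0 * t + rng.normal(scale=0.5, size=n)

# Step 1: estimate propensity scores e(x) = P(T = 1 | x) with a simple
# logistic regression fit by gradient ascent (a stand-in for a packaged GLM).
X = np.column_stack([np.ones(n), x])
w = np.zeros(2)
for _ in range(2000):
    e = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (t - e) / n
scores = 1 / (1 + np.exp(-X @ w))

# Step 2: for each treated unit, find the control unit with the nearest
# propensity score (1:1 nearest-neighbor matching, with replacement).
treated = np.where(t)[0]
controls = np.where(~t)[0]
gaps = np.abs(scores[treated][:, None] - scores[controls][None, :])
matches = controls[np.argmin(gaps, axis=1)]

# Step 3: compare the naive group difference with the matched-pair difference.
naive_diff = y[t].mean() - y[~t].mean()
matched_diff = (y[treated] - y[matches]).mean()
print(f"naive difference:   {naive_diff:.2f}")    # inflated by confounding
print(f"matched difference: {matched_diff:.2f}")  # should sit nearer 1.0
```

The point of the exercise is the contrast in Step 3: because treated units have systematically higher x, the raw mean difference overstates the effect, while comparing each treated unit to its propensity-matched control recovers something much closer to the true value. Real applications would add balance diagnostics and caliper rules, which this sketch omits.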
Even with my background teaching research methods for 13 years and working on projects using quasi-experimentation and propensity scores, I still found the articles highly educational. For example, the idea that Campbell's internal-validity-threats framework could be taken beyond the simple presence or absence of threats to possible quantification of the magnitude of the threats' effects was new to me. I also found the discussion of the continuing controversy over matching highly informative. (As part of the extended commentary, Rubin writes in his piece about being "flabbergasted" upon hearing Campbell's denunciation of matching in a face-to-face meeting between the two in the early 1970s [p. 39].)
I enthusiastically recommend the Psych Methods special series as an aid to non-experimental research and as assigned reading for graduate-level methodology courses.