The September/October 2009 issue of Child Development includes an article entitled "Correlates and Consequences of Spanking and Verbal Punishment for Low-Income White, African American, and Mexican American Toddlers." The article was authored by Duke University's Lisa Berlin and co-authors from multiple institutions; the eight authors represented an even larger set of investigators who formed the Early Head Start Research Consortium. The abstract of the article is available here. I would like to thank Jonathan Mueller for bringing the article to my attention and suggesting I comment upon it.
The Berlin et al. article is an example of a longitudinal/panel study, a design that can support fairly strong, though never conclusive, causal inference (see earlier posting on this topic). Given the emotional reaction many people have to spanking and related issues, however, any article of this type is bound to receive great scrutiny.
The study drew a sample of 2,573 children and their primary caregivers (99% mothers) from 17 sites nationally. Families were assessed when the children were 1, 2, and 3 years of age, with key study variables including parental spanking and verbal punishment, and child fussiness (EASI II temperamental emotionality; age 1 only), aggression (CBCL), and mental development (Bayley scores). The latter two child outcomes were measured only at ages 2 and 3. The authors addressed causality issues in the Introduction, on pp. 1405-1406:
In keeping with transactional theories of child development... another question requiring further study concerns the direction of effects. In particular, to what extent do parental discipline strategies drive child outcomes, to what extent are these parenting strategies elicited by particular child behaviors, and to what extent are both causal mechanisms operative? ... As recommended by Gershoff and Bitensky, cross-lagged path models that simultaneously estimate effects from parental discipline strategies to child behaviors and vice versa are critical to disentangling such issues.
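For readers unfamiliar with the term, a generic two-wave cross-lagged model can be written as a pair of equations. This is a textbook-style sketch in my own notation (X for a parental discipline measure, Y for a child behavior measure, i for family, t for assessment wave), not the exact specification Berlin et al. estimated:

```latex
\begin{aligned}
Y_{i,t+1} &= \alpha_0 + \alpha_1 Y_{i,t} + \alpha_2 X_{i,t} + \varepsilon_{i,t+1} \\
X_{i,t+1} &= \beta_0 + \beta_1 X_{i,t} + \beta_2 Y_{i,t} + \nu_{i,t+1}
\end{aligned}
```

In this notation, \alpha_2 is the cross-lagged path from parental discipline to later child behavior, \beta_2 is the path from child behavior to later parental discipline, and \alpha_1 and \beta_1 are the stability paths for each variable. Estimating both equations together is what allows the two directions of effect to be compared.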
Some of the key statistically significant findings from the regression analyses were as follows:
Greater age-1 child fussiness was associated with greater age-1 and -2 spanking and verbal punishment;
Greater age-1 spanking was associated with greater age-2 child aggression; and
Greater age-1 spanking was associated with lower age-3 child mental development.
Two of the criteria for causality -- presence of statistical association and time-ordering -- are clearly met for the above relationships. The third and final criterion is that all possible "third variables" (e.g., something that might cause both spanking and child aggression) are ruled out. No study can ever rule out all possible third variables, so a more realistic question is whether the investigators ruled out the most plausible contenders. Quoting from the Table 5 caption, Berlin et al. report controlling for "Early Head Start program participation, maternal race/ethnicity, age, and education, maternal depression at age 1, family income and structure, and child sex." It also appears from Table 5 and Figure 1 that the significant path from age-1 spanking to age-2 child aggression was obtained while controlling for age-1 child fussiness (i.e., an age-1 fussiness to age-2 aggression path was also included in the model, and was significantly positive).
The above set of control variables seems fairly comprehensive, but observers can usually suggest more (some more plausible than others). A skeptic might note that the study design was not "genetically informed"; in other words, it could not rigorously examine whether, for example, genetic tendencies toward irritability shared between mother and child contributed to the obtained relationships (a phenomenon known as passive gene-environment correlation). Controlling for child fussiness probably goes some way toward addressing this objection, although it would have been good to control for symptoms of other forms of maternal psychopathology besides depression.
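To make the third-variable concern concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are simulated rather than taken from the study, the variable names (z, spank1, aggr2) and the 0.7 effect sizes are hypothetical, and the small ols helper is my own, not the authors' method. By construction, spanking has no effect on later aggression; a shared unmeasured tendency drives both.

```python
import random


def ols(y, columns):
    """Least-squares fit via the normal equations; returns [intercept, slope1, ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def simulate(n=5000, seed=0):
    """Simulated families in which spanking has NO direct effect on aggression."""
    rng = random.Random(seed)
    # Hypothetical shared tendency (e.g., irritability) affecting both variables
    z = [rng.gauss(0, 1) for _ in range(n)]
    spank1 = [0.7 * zi + rng.gauss(0, 1) for zi in z]  # age-1 spanking
    aggr2 = [0.7 * zi + rng.gauss(0, 1) for zi in z]   # age-2 aggression
    return z, spank1, aggr2


z, spank1, aggr2 = simulate()

# Slope on spanking when the third variable is omitted
naive = ols(aggr2, [spank1])[1]
# Slope on spanking when the third variable is included as a control
adjusted = ols(aggr2, [spank1, z])[1]

print(f"spanking slope, third variable omitted:    {naive:.3f}")
print(f"spanking slope, third variable controlled: {adjusted:.3f}")
```

In this setup the unadjusted slope comes out clearly positive even though there is no causal path from spanking to aggression, and it shrinks toward zero once the shared tendency enters the model. The catch, of course, is that a control variable only helps if it has actually been measured, which is the crux of the skeptic's objection.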
It is important to add, however, that the authors appear to have taken great pains to examine the possible strengths and weaknesses of their study, in order to present an honest appraisal of the findings. Specifically, they undertook a number of supplementary analyses to (potentially) qualify the scope of their conclusions, including tests for whether the basic results held up equally well across different racial/ethnic groups (i.e., moderation) and whether removing the most "severe" spankers affected the results. They also discussed what they perceived as limitations to their own study, such as the self-report measure of spanking not containing a specific definition of the act, and acknowledged that the results suggesting an effect of spanking were of "small," though statistically significant, magnitude.
Berlin et al. summarize their study as follows:
[The findings] support the conclusion that spanking during toddlerhood can have negative consequences for toddlers' cognitive as well as socioemotional functioning (p. 1417).
I find this conclusion appropriate. As noted, the authors' research design approaches causality about as well as is possible with a non-experimental study, and the use of "can" as a qualifier in the preceding quote conveys the necessary caution.
UPDATE (May 17, 2010): Robert Larzelere, a faculty member at Oklahoma State University whose interests include causal inference from longitudinal surveys, e-mailed me a few weeks ago that he and his colleagues had published some articles on spanking and alternative sources of discipline. Here's a link to his list of publications, from which interested readers can see the kinds of issues Dr. Larzelere has raised regarding causal inference and parental discipline.