Ecological inference

Martin Elff

From methodological textbooks we learn about "ecological fallacies": erroneous conclusions drawn from relations between variables at an aggregate level about relations between variables at the individual level. Often enough, however, such aggregate data are all the information we have about individual behaviour. One famous example is the question of who voted for the Nazi party in 1930s Germany. Here only aggregate data on the vote shares of the different parties are available at the district level, together with some demographic or social-structural data on these districts. Thus, on the one hand, ecological fallacies are mistakes that should be avoided; on the other hand, to ignore the information contained in aggregate data would be to throw the baby out with the bath water.

An ecological inference is a scientific inference that starts from aggregate data and draws conclusions about patterns of individual behaviour. Formally, such inferences are plagued by an identification problem: the aggregate data do not determine a unique answer to the question about those patterns. Without certain restrictive assumptions, attempting an ecological inference is equivalent to trying to solve a system of equations with more unknowns than equations. Unfortunately, the assumptions that would help to identify a solution cannot be tested.
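
The identification problem can be made concrete with a small sketch. It assumes the simplest possible setting, which is an illustration and not taken from any particular study: a district with two groups of voters, where only the share x of group 1 in the electorate and the overall vote share t of a party are observed. The function names and the numbers are hypothetical.

```python
def aggregate_share(x, beta1, beta2):
    """Vote share implied by the accounting identity t = beta1*x + beta2*(1-x).

    x      -- share of the district's electorate in group 1 (observed)
    beta1  -- share of group 1 voting for the party (unobserved)
    beta2  -- share of group 2 voting for the party (unobserved)
    """
    return beta1 * x + beta2 * (1.0 - x)


def feasible_range(x, t):
    """Range of beta1 values compatible with the observed x and t.

    Follows from the identity above and the requirement that both
    beta1 and beta2 lie between 0 and 1. Assumes 0 < x < 1.
    """
    lower = max(0.0, (t - (1.0 - x)) / x)
    upper = min(1.0, t / x)
    return lower, upper


if __name__ == "__main__":
    x, t = 0.4, 0.5
    # Three very different patterns of individual behaviour that all
    # reproduce the same observed aggregate vote share.
    for beta1, beta2 in [(0.5, 0.5), (0.2, 0.7), (0.8, 0.3)]:
        print(beta1, beta2, round(aggregate_share(x, beta1, beta2), 6))
    # Without further assumptions, the data only restrict beta1 to a range.
    print(feasible_range(x, t))
    print(feasible_range(0.8, 0.9))
```

With one equation and two unknowns per district, the data alone restrict the group-specific rates only to a range. How wide that range is depends on the observed margins, and in the first example it spans the whole interval from 0 to 1.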

Based on these considerations, I used to be very sceptical about the possibility of ecological inference. On the other hand, I conducted some simulation studies with Goodman's ecological regression model and was surprised by its good performance, provided that certain conditions are met (I may report on these results somewhere here). My conclusion was that ecological inference may not be impossible, but is still plagued by the fundamental problem of untestable assumptions. The work I published with Thomas Gschwend and Ron Johnston tries to address the dilemma that ecological inferences may be desirable but are beset by a level of uncertainty not captured by standard methods of inferential statistics. Using a heuristic method based on the principle of maximum entropy, we are able to construct something similar to confidence intervals, which allow us to delimit the uncertainty of estimates obtained from a specific method of ecological inference.
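
To give an impression of what a simulation with Goodman's ecological regression can look like, here is a minimal sketch. It is not a reproduction of the simulation studies mentioned above: the two-group setup, the function names, and all parameter values are assumptions made for illustration. The data are generated under the constancy assumption, meaning that the group-specific rates are the same in every district apart from random noise.

```python
import random


def simulate_districts(n, beta1, beta2, noise_sd, rng):
    """Simulate n districts in which the constancy assumption holds.

    Each district gets a random group-1 share x and a vote share
    t = beta1*x + beta2*(1-x) plus Gaussian noise. The values are
    hypothetical; t is not clamped to [0, 1], which is harmless for
    small noise_sd.
    """
    data = []
    for _ in range(n):
        x = rng.uniform(0.05, 0.95)
        t = beta1 * x + beta2 * (1.0 - x) + rng.gauss(0.0, noise_sd)
        data.append((x, t))
    return data


def goodman_regression(data):
    """Least-squares regression of t on x across districts.

    Since t = beta2 + (beta1 - beta2)*x, the intercept estimates beta2
    and intercept + slope estimates beta1.
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_t = sum(t for _, t in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in data)
    slope = sxt / sxx
    intercept = mean_t - slope * mean_x
    return intercept + slope, intercept  # (beta1_hat, beta2_hat)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    districts = simulate_districts(2000, beta1=0.7, beta2=0.2,
                                   noise_sd=0.02, rng=rng)
    beta1_hat, beta2_hat = goodman_regression(districts)
    print(round(beta1_hat, 3), round(beta2_hat, 3))
```

In this setup the regression should recover the group-specific rates closely, because the data were generated to satisfy the assumption the model relies on. With real aggregate data, whether that assumption holds is exactly what cannot be checked.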

Articles

Elff, Martin, Thomas Gschwend, and Ron J. Johnston. 2008. "Ignoramus, Ignorabimus? On Uncertainty in Ecological Inference." Political Analysis 16(1): 70-92. (Web appendix, replication material.)