Multilevel analysis when the number of clusters is small

A small number of top-level units in multilevel analysis is a problem that has worried comparativists for quite some time. In a paper that is forthcoming in the British Journal of Political Science (Elff, Heisig, Schaeffer, and Shikano 2020), my co-authors and I show that this problem can be satisfactorily addressed by only moderate modifications of common techniques of statistical inference: using restricted maximum likelihood (REML) instead of maximum likelihood (ML) estimators (Patterson and Thompson 1971), and assuming a t-distribution rather than asymptotic normality for the sampling distribution of coefficient estimates (Satterthwaite 1941; Kenward and Roger 1997).
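To make the two modifications concrete, here is a minimal simulation sketch in Python. It is not the code used in the paper, and it does not implement the Satterthwaite or Kenward-Roger procedures. It relies on a simplifying assumption: in a balanced random-intercept model with a single country-level covariate, and as long as the variance estimates do not hit the boundary, the coefficient estimate coincides with least squares on the cluster means, and the ML and REML variance estimates differ only in their divisor (m versus m - 2, with m the number of clusters). The function name, the parameter values, and the use of m - 2 degrees of freedom are all illustrative choices for this special case.

```python
import math
import random


def simulate_coverage(m=10, n=20, beta=0.5, tau=1.0, sigma=1.0,
                      reps=4000, seed=12345):
    """Monte Carlo coverage of two nominal 95% intervals for an upper-level effect.

    Returns (coverage of ML + normal interval, coverage of REML + t interval).
    """
    if m != 10:
        # The t quantile below is hard-coded for m - 2 = 8 degrees of freedom.
        raise ValueError("this sketch only supports m = 10 clusters")
    rng = random.Random(seed)
    z = [j - (m - 1) / 2 for j in range(m)]  # centred cluster-level covariate
    szz = sum(v * v for v in z)
    z_crit = 1.959964  # 97.5% quantile of the standard normal distribution
    t_crit = 2.306004  # 97.5% quantile of the t-distribution with 8 df
    hits_ml_normal = 0
    hits_reml_t = 0
    for _ in range(reps):
        # Simulate n observations in each of m clusters, keep the cluster means.
        ybar = []
        for j in range(m):
            u = rng.gauss(0.0, tau)  # random intercept of cluster j
            y = [beta * z[j] + u + rng.gauss(0.0, sigma) for _ in range(n)]
            ybar.append(sum(y) / n)
        grand = sum(ybar) / m
        # Least-squares slope on the cluster means (z is centred).
        b = sum(zj * (yj - grand) for zj, yj in zip(z, ybar)) / szz
        rss = sum((yj - grand - b * zj) ** 2 for zj, yj in zip(z, ybar))
        se_ml = math.sqrt(rss / m / szz)          # ML-style divisor: m
        se_reml = math.sqrt(rss / (m - 2) / szz)  # REML-style divisor: m - 2
        hits_ml_normal += abs(b - beta) <= z_crit * se_ml
        hits_reml_t += abs(b - beta) <= t_crit * se_reml
    return hits_ml_normal / reps, hits_reml_t / reps


if __name__ == "__main__":
    cov_ml, cov_t = simulate_coverage()
    print(f"ML + normal:  {cov_ml:.3f}")
    print(f"REML + t(8):  {cov_t:.3f}")
```

With ten clusters, the first interval should fall noticeably short of the nominal 95% coverage, while the second should stay close to it. Applied work with unbalanced data would use a mixed-model routine that offers REML estimation and a degrees-of-freedom correction rather than this shortcut.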


Figure: Performance of Likelihood-based Confidence Intervals of Upper-Level Covariate Effect in Multilevel Linear and Probit Models

Scholars who work with cross-national comparative surveys, such as the Eurobarometer, the European Social Survey, or the Comparative Study of Electoral Systems, often worry whether multilevel analysis of these data is possible at all. The countries covered by such surveys form the top-level units in such an analysis, but their number is usually much smaller than the number of units in a typical survey sample, and apparently too small to conduct statistical analysis with confidence. Indeed, there is methodological literature suggesting that inferences drawn from multilevel analysis with few top-level units may suffer from serious problems: reported standard errors of model coefficients tend to be too small, confidence intervals too short, and statistical hypothesis tests may lead to false discoveries. In a widely cited article in the American Journal of Political Science, Daniel Stegmüller argues that while frequentist inference suffers from such shortcomings, Bayesian inference does not, so that valid results can be obtained even if the number of top-level units is 10 or fewer (Stegmüller 2013). His simulation studies suggest that estimates of the coefficients of contextual variables may even be severely biased. In our article we demonstrate that improvements in inference can also be achieved using non-Bayesian or “frequentist” methods. Further, we reproduce an already known proof that coefficient estimates are unbiased in the linear-normal case if they exist (Kackar and Harville 1981). We attribute the bias in coefficient estimates that Stegmüller finds in his simulation study to a flaw in its design.


Elff, Martin, Jan Paul Heisig, Merlin Schaeffer, and Susumu Shikano. 2020. "Multilevel Analysis with Few Clusters: Improving Likelihood-based Methods to Provide Unbiased Estimates and Accurate Inference". British Journal of Political Science Online first.

Kackar, Raghu N. and David A. Harville. 1981. "Unbiasedness of Two-stage Estimation and Prediction Procedures for Mixed Linear Models". Communications in Statistics-Theory and Methods 10(13): 1249--1261.

Kenward, Michael G. and James H. Roger. 1997. "Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood". Biometrics 53(3): 983--997.

Patterson, H. D. and R. Thompson. 1971. "Recovery of Inter-Block Information When Block Sizes Are Unequal". Biometrika 58(3): 545--554.

Satterthwaite, Franklin E. 1941. "Synthesis of Variance". Psychometrika 6(5): 309--316.

Stegmüller, Daniel. 2013. "How Many Countries for Multilevel Modeling? A Comparison of Frequentist and Bayesian Approaches". American Journal of Political Science 57(3): 748--761.