# Multilevel analysis when the number of clusters is small

01 February 2019

A small number of top-level units in multilevel analysis is a problem that has
worried comparativists for quite some time. In a paper that is forthcoming in
the *British Journal of Political Science*
(Elff, Heisig, Schaeffer, and Shikano 2020) my co-authors and I show that this
problem can be satisfactorily addressed by only moderate modifications of common
techniques of statistical inference: using *restricted* maximum likelihood (REML)
instead of maximum likelihood (ML) estimators (Patterson and Thompson 1971), and
assuming a *t*-distribution rather than asymptotic normality for the sampling
distribution of coefficient estimates (Satterthwaite 1941; Kenward and Roger 1997).
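
To give a feel for why REML rather than ML matters with few clusters, here is a minimal sketch in a deliberately simplified setting. It is not the model or the simulation from the article. It assumes an intercept-only model with one observed mean per cluster, so that the ML estimator of the between-cluster variance divides the sum of squares by the number of clusters *J*, while the REML estimator divides by *J* - 1. The function name, the seed, and all parameter values are illustrative assumptions.

```python
import random

def simulate_variance_estimates(n_clusters=10, true_var=1.0, n_reps=20000, seed=1):
    """Average ML and REML estimates of a between-cluster variance.

    Simplified illustration (an assumption, not the article's design):
    one normally distributed mean per cluster, intercept-only model.
    """
    rng = random.Random(seed)
    sd = true_var ** 0.5
    ml_sum = 0.0
    reml_sum = 0.0
    for _ in range(n_reps):
        y = [rng.gauss(0.0, sd) for _ in range(n_clusters)]
        ybar = sum(y) / n_clusters
        ss = sum((v - ybar) ** 2 for v in y)
        ml_sum += ss / n_clusters          # ML: ignores the estimated intercept
        reml_sum += ss / (n_clusters - 1)  # REML: corrects for it
    return ml_sum / n_reps, reml_sum / n_reps

ml_mean, reml_mean = simulate_variance_estimates()
print(f"Average ML estimate:   {ml_mean:.3f}")
print(f"Average REML estimate: {reml_mean:.3f}")
```

With 10 clusters and a true variance of 1, the ML average should come out near 0.9, that is (*J* - 1)/*J*, and the REML average near 1. An underestimated variance at the top level is one reason why standard errors can end up too small.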

Scholars who work with cross-national comparative surveys, such as the
*Eurobarometer*, *European Social Survey*, or *Comparative Study of Electoral
Systems*, often worry whether multilevel analysis of these data is possible at
all. The countries covered by such surveys form the top-level units in such
analysis, but their number is usually much smaller than the number of units in a
typical survey sample, apparently too small to conduct statistical analysis with
confidence. Indeed, methodological literature exists that suggests that
inferences drawn from multilevel analysis may suffer from serious problems:
Reported standard errors of model coefficients tend to be too small, confidence
intervals too short, and statistical hypothesis tests may lead to false
discoveries. In a widely cited article that appeared in the *American Journal of
Political Science*, Daniel Stegmüller argues that while frequentist inference
suffers from such shortcomings, Bayesian inference does not, making it possible
to obtain valid results even if the number of top-level units is 10 or fewer
(Stegmüller 2013). His simulation studies suggest that estimates of
coefficients of contextual variables may even be severely biased. In our article
we demonstrate that improvements in inference can also be achieved using
non-Bayesian or “frequentist” methods. Further, we reproduce an already known proof
that coefficient estimates are unbiased in the linear-normal case if they exist
(Kackar and Harville 1981). We attribute the bias in coefficient estimates that
Stegmüller finds in his simulation study to a flaw in its design.
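
The complaint that confidence intervals are "too short" can be illustrated with a small sketch. Again, this is a simplified stand-in and not the simulation design from the article or from Stegmüller (2013). It assumes 10 cluster means drawn from a normal distribution and compares how often a 95 percent interval for the overall mean covers the true value when the critical value comes from the normal distribution versus a *t*-distribution with 9 degrees of freedom. The constant `T_CRIT_DF9` is the standard table value, hard-coded because the Python standard library has no *t* quantile function; all names and settings are illustrative assumptions.

```python
import random
from statistics import NormalDist

N_CLUSTERS = 10
T_CRIT_DF9 = 2.262  # 97.5% quantile of Student's t with 9 df (table value)

def coverage(n_reps=20000, seed=2):
    """Share of replications in which each interval covers the true mean (0)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
    hits_z = 0
    hits_t = 0
    for _ in range(n_reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(N_CLUSTERS)]
        ybar = sum(y) / N_CLUSTERS
        s2 = sum((v - ybar) ** 2 for v in y) / (N_CLUSTERS - 1)
        se = (s2 / N_CLUSTERS) ** 0.5
        if abs(ybar) <= z_crit * se:      # normal-based interval covers 0
            hits_z += 1
        if abs(ybar) <= T_CRIT_DF9 * se:  # t-based interval covers 0
            hits_t += 1
    return hits_z / n_reps, hits_t / n_reps

cover_z, cover_t = coverage()
print(f"Coverage with normal critical value: {cover_z:.3f}")
print(f"Coverage with t critical value:      {cover_t:.3f}")
```

In this toy setting the normal-based interval should cover the true value in roughly 92 percent of replications, the *t*-based interval in roughly 95 percent.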

References

Elff, Martin, Jan Paul Heisig, Merlin Schaeffer, and Susumu Shikano. 2020. "Multilevel Analysis with Few Clusters: Improving Likelihood-based Methods to Provide Unbiased Estimates and Accurate Inference". *British Journal of Political Science*, online first.

Kackar, Raghu N. and David A. Harville. 1981. "Unbiasedness of Two-stage Estimation and Prediction Procedures for Mixed Linear Models". *Communications in Statistics-Theory and Methods* 10(13): 1249--1261.

Kenward, Michael G. and James H. Roger. 1997. "Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood". *Biometrics* 53(3): 983--997.

Patterson, H. D. and R. Thompson. 1971. "Recovery of Inter-Block Information When Block Sizes Are Unequal". *Biometrika* 58(3): 545--554.

Satterthwaite, Franklin E. 1941. "Synthesis of Variance". *Psychometrika* 6(5): 309--316.

Stegmüller, Daniel. 2013. "How Many Countries for Multilevel Modeling? A Comparison of Frequentist and Bayesian Approaches". *American Journal of Political Science* 57(3): 748--761.