A method for small-area estimation of population mortality in settings affected by crises

Background: Populations affected by crises (armed conflict, food insecurity, natural disasters) are poorly covered by demographic surveillance. As such, crisis-wide estimation of population mortality is extremely challenging, resulting in a lack of evidence to inform humanitarian response and conflict resolution.
Methods: We describe here a ‘small-area estimation’ method to circumvent these data gaps and quantify both total and excess (i.e. crisis-attributable) death rates and tolls, both overall and for granular geographic (e.g. district) and time (e.g. month) strata. The method is based on analysis of data previously collected by national and humanitarian actors, including ground survey observations of mortality, displacement-adjusted population denominators and datasets of variables that may predict the death rate. We describe the six sequential steps required for the method’s implementation and illustrate its recent application in Somalia, South Sudan and northeast Nigeria, based on a generic set of analysis scripts.
Results: Descriptive analysis of ground survey data reveals informative patterns, e.g. concerning the contribution of injuries to overall mortality, or household net migration. Despite some data sparsity, for each crisis that we have applied the method to thus far, available predictor data allow the specification of reasonably predictive mixed effects models of the crude and under 5 years death rate, validated using cross-validation. Assumptions about values of the predictors in the absence of a crisis provide counterfactual and excess mortality estimates.
Conclusions: The method enables retrospective estimation of crisis-attributable mortality with considerable geographic and period stratification, and can therefore contribute to better understanding and historical memorialisation of the public health effects of crises. We discuss key limitations and areas for further development.
Supplementary Information The online version contains supplementary material available at 10.1186/s12963-022-00283-6.


Mortality estimation in crisis-affected populations
In populations exposed to conditions of crisis (armed conflict, food insecurity, natural disasters, etc.), estimates of population mortality provide a basis on which to predicate an appropriate humanitarian response [1,2], and support advocacy and historical documentation [3,4]. Over the past two decades, estimates of mortality have informed war crime prosecution in the former Yugoslavia [5], illuminated the toll of armed conflict in Darfur [6,7], the Democratic Republic of Congo [8] and Iraq [9,10], documented the impact of famine in Somalia [11] and, most recently, demonstrated the direct and indirect health impacts of the SARS-CoV-2 pandemic [12][13][14].
Crisis-attributable mortality is difficult to estimate, even in high-income countries [15,16]. In low-income and/or insecure settings, additional challenges [4,17] arise, including (i) lack of robust vital events registration; (ii) unfeasibility of representative primary data collection due to insecurity, lack of authorisations, funding constraints or other factors; and (iii) inability to collect robust retrospective data, due to the need to elicit information on demographic events over a long past period (e.g. > 2 years). Response bias as questionnaires probe farther back in time, plus survival and selection biases caused by households disintegrating due to high mortality or migration, challenge survey validity [17]. Establishing a counterfactual (i.e. non-crisis) death rate presents a further challenge, particularly in very protracted crises (e.g. Afghanistan or the eastern Democratic Republic of Congo) where such a baseline has been unobservable for decades.

Scope of this paper
Here, we describe the design and implementation of a method that addresses the above challenges and estimates crisis-attributable death rates and tolls based on previously collected data. Applications of previous iterations of the method in Somalia (2010-2012) [11] and South Sudan (2013-2018) [18] have been published elsewhere. Further applications in Somalia (2014-2018), Nigeria and the Democratic Republic of the Congo will be published separately. Examples from South Sudan, Somalia and Nigeria are, however, used here to illustrate the application and constraints of the method.

General design
Why a small-area estimation approach?
Small-area estimation was developed in the United States to estimate characteristics of interest, e.g. smoking prevalence or poverty levels, for small geographical units (e.g. counties) without having to conduct primary data collection within each such unit [19]. Our method is designed to deliver estimates for small geographical and time strata based solely on existing data.

General framework
Crisis-attributable mortality can be defined conceptually as the difference between the number (or rate) of deaths that has actually occurred during the crisis and the number (rate) that would have occurred in the absence of the crisis.
As illustrated hypothetically in Fig. 1, in a counterfactual (i.e. no-crisis) scenario it is plausible that the pre-crisis secular decline would have continued; the crisis has negated these improvements and effectively returned the population to a 'higher' baseline than pre-crisis; moreover, excess, crisis-attributable mortality may occur even years after crisis conditions (e.g. armed conflict) resolve (e.g. increased tuberculosis mortality due to higher transmission of M. tuberculosis when people lived in displacement camps years earlier, or the multi-generational effects of psychological stress).

Fig. 1 Illustration of actual and counterfactual mortality during and after a hypothetical crisis
We wish to estimate excess mortality for the entire 'person-time' at risk during the crisis, but also for specific sub-periods and geographic units (these could be administrative level 2 entities such as counties or districts; they could also, however, be geographical units whose boundaries may correlate more closely with mortality risk, such as settlements for internally displaced persons (IDPs) or 'livelihood zones', namely areas characterised by a dominant economic activity, e.g. pastoralism or agriculture). Information on where and when mortality is highest may be useful to identify gaps in the humanitarian response or to better understand the dynamics of an armed conflict. More generally, we can write

D_{E,kt} = D_{A,kt} − D_{C,kt} = y_{A,kt} N_{A,kt} − y_{C,kt} N_{C,kt}    (1)

where D is the death toll, y is the mean death rate and N the population at risk; E, A and C denote excess, actual (i.e. what truly happened) and counterfactual (what would have happened in the absence of a crisis) levels; k is any geographic unit (e.g. a district), and t any time unit (e.g. a month) within the crisis period (thus, kt, the smallest analysis stratum, could be a district-month). Note that N_C may differ from N_A, for example because in a no-crisis counterfactual forced displacement would not have occurred. If the quantities on the right-hand side of Eq. (1) are all estimated, we can sum results across kt strata for different aggregations of interest or to compute the overall death toll. Equation (1) also applies to age- or cause-specific mortality (e.g. among children under 5 years old; due to intentional injury), provided these stratifications are available or can also be estimated.
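The published implementation uses R scripts (see 'Computational implementation' below); purely to illustrate the stratum-level arithmetic of Eq. (1), the following is a minimal Python sketch with hypothetical rates and denominators. Death rates are expressed per 10,000 person-days, a common humanitarian convention; the function name and all figures are ours, for illustration only.

```python
# Sketch of Eq. (1): excess deaths per stratum kt, then aggregated.
# y is a death rate per 10,000 person-days; N is the stratum population.

def excess_deaths(y_actual, N_actual, y_cf, N_cf, days):
    """Excess death toll for one stratum kt over `days` days."""
    d_actual = y_actual / 10000 * N_actual * days   # actual toll D_A
    d_cf = y_cf / 10000 * N_cf * days               # counterfactual toll D_C
    return d_actual - d_cf

# Two hypothetical district-months (30 days each):
strata = [
    # (y_A, N_A, y_C, N_C)
    (1.2, 50000, 0.4, 55000),   # district 1, month 1
    (0.9, 42000, 0.4, 43000),   # district 2, month 1
]

# Summing across kt strata gives any aggregation of interest:
total_excess = sum(excess_deaths(*s, days=30) for s in strata)
```

Summing the per-stratum results, rather than applying Eq. (1) to pre-aggregated rates, preserves the granularity that motivates the small-area approach.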

Estimation steps
Our adaptation of small-area estimation consists of using available data to fit and validate a statistical model (specific to each crisis) that predicts the death rate y_kt as a function of several predictor variables; and applying this model to project y_A,kt and y_C,kt under actual (observed) and assumed counterfactual conditions. Separately, N_A,kt and N_C,kt are reconstructed based on growth rates and displacement patterns. Excess deaths are then computed by applying Eq. (1). Table 1 summarises the steps involved in the full application of the method. Data management details are omitted here, but annotated in the R analysis scripts (see Declarations and Additional file 1: pages 11-13).
Step 2, namely reconstructing population denominators, will be detailed in a separate paper.

Defining the analysis person-time and strata
Specifying the population and period for which estimates are sought, and the granularity with which these may be computed, determines most of the subsequent steps. In some scenarios, this will be straightforward (e.g. an entire country or a specific region is affected by armed conflict with a clear start and end date). In other cases, the analysis may be conducted to estimate mortality up to a certain time point in the crisis.
The definition of ‘crisis’ also needs to be made explicit: for example, Somalia has experienced 30 years of armed conflict; against this backdrop, drought and flooding emergencies have repeatedly occurred. Our analyses to date in Somalia have aimed to estimate mortality attributable to exceptional food insecurity events (2010-2012 and 2017-2018) [20] triggered by drought, i.e. above and beyond any excess deaths caused by the protracted conflict alone. Accordingly, we have defined the period of analysis as that over which key food security indicators and other markers of crisis conditions were reported to be unusually poor. In Nigeria, we wished to estimate mortality attributable to the armed conflict between the government and Boko Haram, which affects three states (Borno, Yobe, Adamawa) in the northeast: this is a more straightforward scenario in which a relatively recent baseline of no conflict precedes the crisis. Refugees who leave the crisis-affected region should also be considered within the study population. However, this entails several complexities: for example, refugees will be exposed to different risk factors and may paradoxically experience lower mortality than if they had remained in their country of origin, implying a negative excess mortality: this has been documented for South Sudanese refugees in Uganda [21], and could plausibly apply to the large Syrian refugee population now living in Europe.
In practice, the person-time boundaries of the analysis and the smallest level of stratification ( kt ) may be constrained by data availability. However, if possible a 'buffer' period (e.g. 6-12 months pre-crisis) should be included in the analysis to allow exploration of lagged effects of predictors on mortality and to use 'baseline' observations to set counterfactual values for the predictors (see below, 'Excess mortality estimation' section). Furthermore, stratification should be as granular as possible to maximise observations available for model fitting and the utility of estimates.
As detailed below, sample surveys conducted by various humanitarian actors are the commonest source of mortality ground data with which to fit and validate models. In a Somalia study (2010-2012) [11] we conducted in the aftermath of a severe famine, nearly all such surveys had as their sampling universe the intersection of regional and livelihood zone boundaries: for example, within Gedo region some surveys were designed to represent communities that predominantly relied on pastoralism, while other surveys covered IDPs or riverine agriculturalists. Most of the predictors and demographic estimates were also collected at or could be aggregated to this stratification level, and by month. Our chosen kt was thus regional livelihood zones and months ( Table 2).
In more recent work, available data have supported stratification by level 2 administrative unit (counties and districts, respectively).

Mortality data
Ground mortality observations are required to train and validate a predictive model. The Standardised Monitoring and Assessment of Relief and Transitions (SMART) initiative [22] has developed a globally applicable protocol for rapid surveys that primarily aim to estimate the prevalence of acute malnutrition, but often also include a questionnaire module that elicits information from sampled households on their demographic experience over a retrospective 'recall' period, typically 3-6 months long [23]. SMART surveys are highly standardised and conducted routinely in most humanitarian responses [24], typically at administrative level 2 or similarly small scale. Surveys mostly rely on two-stage cluster sampling, though some, e.g. in IDP camps, are exhaustive or use systematic random selection. Sample sizes of 300-1000 households and 20-30 clusters are typical, i.e. sampled households are only a small fraction of the total. Survey design and analysis are automated by Emergency Nutrition Assessment (ENA) software, reducing the potential for surveyor error [25].
We identified 205 analysis-eligible SMART surveys in Somalia. Despite such substantial numbers, geographic and period data coverage can be sparse, as illustrated in Fig. 2 for South Sudan.
After cleaning datasets to resolve errors (e.g. values out of the allowed range), the crude death rate (CDR), under 5 years death rate (U5DR, i.e. the CDR among the population aged under 5 years), crude birth rate, in-, out- and net migration rates and, for individual questionnaire surveys only, cause- and gender-specific death rates may be computed (Additional file 1: page 2 and Table S1). The CDR and U5DR in particular are widely used by humanitarian actors to benchmark the severity of a crisis in health terms [1]. Inspection of crude patterns in survey indicators may be informative: for example, in South Sudan many surveys indicated high injury-attributable death rates and relative risks of dying among males, compared to females (Fig. 3).
Humanitarian surveys have varying robustness [26,27]. While SMART survey reports do not systematically report quality issues, they should nonetheless be scrutinised to identify potential biases, particularly any restriction of the effective sampling frame to only a fraction of the intended sampling universe, due for example to insecurity or inaccessibility. We attribute to each survey s a weight w_s = w_B,s × w_Q,s, where w_B,s, a representativeness weight, is the approximate fraction of the sampling universe that was actually included in the sample, as per the survey's report (for example, if a report states that the sampling frame excluded 3 out of 5 districts, we set w_B,s = 0.4; where an unspecified number of sampling units are excluded from the sampling frame, we assume w_B,s = 0.5); and w_Q,s is a quality weight derived from the dataset (see Additional file 1: page 3).
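The weighting scheme above can be sketched as follows. The function name is ours, and the quality score is treated as a given input; its actual derivation from the dataset is described in the paper's Additional file.

```python
# Sketch of the survey weighting scheme w_s = w_B,s * w_Q,s described above.

def survey_weight(frame_fraction, quality_score):
    """
    frame_fraction: fraction of the sampling universe actually covered,
        as per the survey report (None if the report does not specify
        how many sampling units were excluded).
    quality_score: dataset-derived quality weight w_Q,s in [0, 1].
    """
    w_B = 0.5 if frame_fraction is None else frame_fraction
    return w_B * quality_score

# A survey whose report states 3 of 5 districts were excluded (w_B = 2/5),
# with a quality weight of 0.9:
w = survey_weight(2 / 5, 0.9)
```

The resulting w_s then enters the regression described under 'Predictive model fitting' as an observation weight, down-weighting less representative or lower-quality surveys.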

Predictor data
If the statistical objective of analysis is merely to predict the death rate, any set of predictor variables that does so accurately, whatever their causal relationship with mortality, may be appropriate. However, choosing predictors that are causally related to mortality, or proxies for mortality risk determinants, is likely to enhance predictive power and help assess the model's internal validity.  To this end, we have defined a generic framework of factors leading to crisis mortality (Additional file 1: Figure S1). At least some of the selected predictors should be related to plausible drivers of excess mortality risk: for example, in a drought-triggered food security crisis these might include rainfall, food purchasing power, burden of malnutrition and the incidence of epidemics (cholera, measles); in an armed conflict, the intensity of violence and disruptions to public health services might be more relevant. Identifying such 'crisis-specific' predictors is critical, as the method defines no-crisis scenarios by specifying counterfactual values for these very predictors.
In armed conflict settings and humanitarian responses, data collection is often unsystematic and disrupted [28]. In our experience to date, data are available for only a few causal factors, and negotiation with agencies and humanitarian coordination mechanisms holding non-public datasets occupies a large share of analyst time. Such datasets generally have poor integrity: they are typically entered in spreadsheet software without standardisation of geographical nomenclature, cell value or formula protections, variable dictionaries or automatic error checking, thus necessitating extensive curation. Missingness is a common problem (Additional file 1: Figures S2 and S3). We retain potential predictor datasets by applying a '70-70-70' rule, namely ≥ 70% complete for ≥ 70% of geographic units k and ≥ 70% of time units t. Remaining missingness is resolved through imputation, either statistical or manual (i.e. based on contextual knowledge). To reduce the influence of outliers (some of which may be data entry errors), where appropriate we apply moderate smoothing or running means to time series. Details of predictors considered are presented in crisis-specific papers; Table 3 shows predictors included in the final models for each of the crises studied thus far.
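The '70-70-70' retention rule can be expressed as a simple completeness check. The sketch below (function and variable names are ours) represents a predictor dataset as a mapping from (k, t) pairs to observed values, with absent keys denoting missing observations:

```python
# Illustrative check of the '70-70-70' rule: a predictor dataset is
# retained if it is >= 70% complete for >= 70% of geographic units k
# and >= 70% of time units t. `data` maps (k, t) -> value (missing
# observations are simply absent from the mapping).

def passes_70_70_70(data, units_k, units_t, thresh=0.7):
    def share_passing(fixed_units, fixed_is_k):
        other = units_t if fixed_is_k else units_k
        ok = 0
        for a in fixed_units:
            keys = [(a, b) if fixed_is_k else (b, a) for b in other]
            completeness = sum(key in data for key in keys) / len(keys)
            if completeness >= thresh:
                ok += 1
        return ok / len(fixed_units)
    return (share_passing(units_k, True) >= thresh
            and share_passing(units_t, False) >= thresh)

# Hypothetical 10 x 10 grid of district-months, fully observed:
UNITS = list(range(10))
FULL = {(k, t): 1.0 for k in UNITS for t in UNITS}
# The same grid with four districts entirely missing (fails the rule):
SPARSE = {(k, t): 1.0 for k in UNITS for t in UNITS if k >= 4}
```

Datasets failing the rule are dropped from consideration; those passing it proceed to imputation of their remaining gaps, as described above.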

Analysis steps

Predictive model fitting
If the raw datasets of mortality surveys are mostly unavailable, only stratum-level regression is feasible (Additional file 1: page 9). If raw data for most mortality surveys are available, household-level regression may be undertaken. SMART surveys do not report the exact date of deaths within the recall period: we therefore merge predictor with survey data by averaging each predictor over the survey's recall period. In Somalia, some surveys were only representative of IDP settlements or urban areas within districts: we assume simplistically that district-wide predictor values also apply to these populations. We use a generalised linear model with weights w_s (see above) and a quasi-Poisson distributional assumption to account for overdispersion in the death count outcome. The model's formula is thus:

log d_{i,j,k,T_r,s} = x_{1,k,T_r,s} β_1 + x_{2,k,T_r,s} β_2 + x_{3,k,T_r,s} β_3 + ... + x_{p,k,T_r,s} β_p + u_j + u_k + log τ_{i,j,k,T_r,s} + ε_{i,j,k}    (2)

where d_{i,j,k,T_r,s} is the number of deaths in household i within survey cluster j and geographic stratum k occurring during the recall period T_r of survey s, where r means recall; x_{1,k,T_r,s}, x_{2,k,T_r,s}, x_{3,k,T_r,s} ... x_{p,k,T_r,s} are the values of predictors x_1, x_2, x_3 ... x_p averaged over the survey's recall period, and for stratum k; β_1, β_2, β_3 ... β_p are the corresponding fixed-effect linear coefficients; u_j and u_k are, respectively, random effects for cluster j and stratum k, assumed to follow a normal distribution with mean 0 (u_j ~ N(0, σ²_uj) and u_k ~ N(0, σ²_uk)) and capturing a plausible hierarchy of data as well as the repeated nature of observations; log τ_{i,j,k,T_r,s} is an offset to account for varying household person-time at risk τ (Additional file 1: Table S1); and ε_{i,j,k} is the residual error not explained by the model. We validate candidate models for out-of-sample prediction through k-fold cross-validation (CV; partition of data into folds is at the (k, T_r,s) level, given that predictors are not specified below this level).

We use the mean Dawid-Sebastiani score (DSS) [29] as a proper scoring rule appropriate for count outcomes to evaluate model fit on the training data and on CV (in the latter case, we take the mean DSS across all folds). After exploratory analysis, where possible we select between maintaining the continuous version of each predictor or categorising it into bins, as well as among alternative lags, based on the lowest DSS on CV, and screen out predictors that are not significantly better-fitting than the null model based on an F-test p-value threshold. We fit each possible combination of the remaining predictors (p predictors yield 2^p possible combinations) and shortlist candidate models whose DSS is within a given bottom quantile. We select the final set of predictors based on DSS on CV, plausibility considerations and whether they are crisis-specific (see above). We test for plausible interactions and, lastly, add random effects, retaining the mixed model if its DSS on CV improves on the fixed-effects alternative. In practice, a mixed model may be of limited utility if most prediction happens for person-time with new levels of the random effect (e.g. in geographic strata not covered by any survey used to train the model).
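The analyses themselves fit Eq. (2) with R's quasi-Poisson GLM machinery. Purely as an illustrative stand-in, the fixed-effects core of such a model — a Poisson regression of household death counts with log link, a log person-time offset and survey weights — can be fitted by iteratively reweighted least squares (IRLS). The data below are simulated and all names are ours; this is a sketch, not the published implementation:

```python
import numpy as np

def fit_poisson_glm(X, y, offset, w=None, n_iter=50):
    """Weighted Poisson regression with log link and offset, via IRLS.
    X: (n, p) design matrix (include a column of ones for the intercept);
    y: (n,) death counts; offset: (n,) log person-time; w: (n,) survey weights."""
    n, p = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu      # working response
        W = w * mu                             # IRLS weights x survey weights
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# Simulated check: recover a known coefficient from synthetic households.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)                  # one standardised stratum-level predictor
persontime = rng.uniform(300, 900, n)    # household person-days at risk (offset)
X = np.column_stack([np.ones(n), x1])
true_beta = np.array([-6.0, 0.5])
mu = np.exp(X @ true_beta + np.log(persontime))
y = rng.poisson(mu)

beta_hat = fit_poisson_glm(X, y, np.log(persontime))
```

The quasi-Poisson extension leaves the coefficient estimates unchanged and only inflates their standard errors by an estimated dispersion factor; the random effects u_j and u_k of Eq. (2) require dedicated mixed-model software (e.g. lme4 in R).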
As an example, we provide in Table 4 model coefficients and performance metrics for South Sudan, all computed based on observations and predictions aggregated at the (k, T_r,s) level; predictive accuracy on cross-validation is shown in Fig. 4. As shown, the DSS, which, like other prediction scores, quantifies the error between observations and predictions, increases only slightly on CV, indicating that the model only marginally overfits data and is valid when used out-of-sample. There is also little evidence of predictive bias. Aside from moderately good performance, model coefficients support model validity: mortality increases with insecurity and where measles epidemics are present, but decreases if people are living in Protection of Civilians camps (in South Sudan, these places afforded relative safety and more intense humanitarian services) and as purchasing power improves.
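For a Gaussian predictive distribution with mean μ and variance σ², the Dawid-Sebastiani score reduces to (y − μ)²/σ² + log σ², rewarding predictions that are both accurate and appropriately sharp (lower is better). A minimal sketch — function names are ours, not from the published scripts:

```python
import math

def dawid_sebastiani(y_obs, mu, sigma2):
    """Dawid-Sebastiani score for one observation: lower is better.
    mu, sigma2: predictive mean and variance (for a quasi-Poisson model,
    sigma2 would be the estimated dispersion times mu)."""
    return (y_obs - mu) ** 2 / sigma2 + math.log(sigma2)

def mean_dss(obs, means, variances):
    """Mean DSS across observations, as used to compare candidate models."""
    scores = [dawid_sebastiani(y, m, v) for y, m, v in zip(obs, means, variances)]
    return sum(scores) / len(scores)
```

Unlike mean squared error, the DSS penalises a model for being overconfident: a sharp (low-variance) prediction that misses badly scores worse than a vaguer prediction with the same mean.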

Excess mortality estimation
In our framework, excess mortality estimation requires projecting the death toll in counterfactual no-crisis scenarios. These scenarios should specify counterfactual values for all crisis-specific predictors included in the final models, and for the population denominators. Several approaches to setting counterfactual values may be used: (i) in the absence of a crisis, it may be assumed that certain predictors or types of displacement would have taken a zero value: for example, epidemics (e.g. cholera, measles) that are known to be associated with extreme food insecurity crises might not have occurred; similarly, no war-related displacement would have happened; (ii) pre-crisis values of the predictors, if available, may be adopted as counterfactuals: for some predictors (e.g. market prices), we use the local average (e.g. the district median prior to the start of the crisis); for others (e.g. rainfall), seasonality should also be considered; (iii) if no pre-crisis data are available, levels from reasonably comparable regions within the country that are not affected by the crisis may instead be considered. Table 5 shows 'most likely' counterfactual assumptions for the South Sudan analysis we previously conducted. To explore uncertainty in these assumptions, we also define reasonable best- and worst-case scenarios.
To propagate error in the model predictions of y_A,kt and y_C,kt into final estimates, we can set up a bootstrap simulation that, for a large number of iterations and each kt stratum, implements Eq. (1) by drawing, for each model prediction, random values from a normal distribution on the log scale, defined by the predicted log rate and its standard error. Outputs of each iteration are then summed across all kt or for specific aggregations of interest (e.g. a single year within the crisis period), and point estimates and 95% confidence intervals are computed as the median, 2.5th and 97.5th percentiles of the resulting distribution of iteration sums. Note that if counterfactual population denominators are considerably different from the actuals (e.g. if large-scale displacement outside the region of interest has occurred), comparing actual and counterfactual mortality is fraught due to the difference in at-risk populations: we therefore scale excess death rates to the actual population denominators.
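The bootstrap just described can be sketched as follows, for two hypothetical strata. All inputs (log rates, standard errors, person-time) are invented for illustration; the published analyses implement this in R:

```python
import numpy as np

# For each stratum kt, draw actual and counterfactual death rates on the
# log scale (using each model prediction's log-scale standard error),
# apply Eq. (1), sum across strata, and read the point estimate and 95% CI
# off the percentiles of the iteration sums.

rng = np.random.default_rng(42)
n_iter = 10000

# Per-stratum model predictions: predicted log rate, log-scale SE,
# and actual person-time at risk (excess rates are scaled to the
# actual denominators, as noted in the text).
log_y_A = np.array([-8.0, -8.3]); se_A = np.array([0.10, 0.12])
log_y_C = np.array([-9.0, -9.0]); se_C = np.array([0.15, 0.15])
persontime = np.array([1.5e6, 1.2e6])   # actual person-days per stratum

y_A = np.exp(rng.normal(log_y_A, se_A, size=(n_iter, 2)))  # actual rates
y_C = np.exp(rng.normal(log_y_C, se_C, size=(n_iter, 2)))  # counterfactual rates
sums = ((y_A - y_C) * persontime).sum(axis=1)              # Eq. (1) per iteration

point_est, ci_lo, ci_hi = np.percentile(sums, [50, 2.5, 97.5])
```

Because outputs are retained per iteration, the same draws can be re-summed over any aggregation of interest (a single district, a single year) without re-running the simulation.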

Sensitivity analyses
While a number of sensitivity analyses may be conducted to explore estimate uncertainty, we focus here on two particularly important issues.

Population denominator uncertainty
Most displacement data in crisis settings do not arise from statistically robust estimation methods. Over-reporting of population figures may occur if population counts are perceived as registration for relief allocation [30]. Conversely, insecurity and lack of connectivity may result in undetected population movements. We thus explore combinations of sensitivity values for both displacement and demographic estimates (as a ratio of true to reported values, where values < 1 indicate over-reporting, and vice versa), and re-run analysis accordingly.
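As a toy illustration of such a sensitivity re-run (all figures and names are hypothetical), one sensitivity 'knob' can simply scale the reported displaced population by the assumed true-to-reported ratio before the stratum death toll is recomputed:

```python
# Illustrative denominator sensitivity analysis: `ratio` is an assumed
# true-to-reported value (< 1 = over-reporting of displacement figures),
# applied to the displaced population before recomputing a stratum's
# death toll. The rate is per 10,000 person-days.

def death_toll(rate_per_10k_day, resident, displaced, ratio, days=30):
    population = resident + displaced * ratio
    return rate_per_10k_day / 10000 * population * days

# Re-run the toll under three assumed ratios:
tolls = {r: death_toll(1.0, 40000, 20000, r) for r in (0.8, 1.0, 1.2)}
```

In the full analysis the same principle applies jointly to displacement and demographic growth assumptions, with every combination of sensitivity values triggering a complete re-run of the estimation.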

Under-estimation of mortality in surveys
In previous South Sudan work, possible under-estimation of deaths among children under 5 years has been noted, as indicated by a low ratio of under 5 years to all-age deaths and a low proportion of infant deaths (Table 6). Similar concerns have been raised in Yemen [31]. Under-reporting of infant and particularly neonatal deaths is plausible, due to stigma and/or emotional trauma associated with losing a young child, or insufficient probing during questionnaire administration. We thus re-run analysis after augmenting the model training data (number of deaths and person-time within surveyed households) based on a varying assumed proportion of all deaths that are unobserved (Additional file 1: page 10).
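The core of this adjustment is a simple scaling: if a proportion p of deaths is assumed unobserved, observed deaths under-count true deaths by a factor (1 − p). A minimal sketch (function name ours; the published procedure also augments the person-time contributed by unobserved decedents, omitted here for brevity):

```python
# Sketch of the under-ascertainment sensitivity adjustment: scale observed
# deaths up by the assumed proportion of deaths that went unobserved,
# before re-fitting the predictive model.

def augment_deaths(observed_deaths, p_unobserved):
    if not 0 <= p_unobserved < 1:
        raise ValueError("p_unobserved must be in [0, 1)")
    return observed_deaths / (1 - p_unobserved)

# e.g. 12 observed under-5 deaths with an assumed 25% unobserved:
adjusted = augment_deaths(12, 0.25)
```

Varying p across a plausible range, and re-running the full estimation at each value, indicates how sensitive the final death tolls are to this source of bias.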

Advantages of the method
The approach we have described can efficiently reconstruct the evolution of mortality across long retrospective periods and large areas, including where ground data collection would be unfeasible due to inaccessibility or the difficulty of asking households to recall events over a long recall period. In South Sudan, a setting with virtually no vital events registration, our application of the method generated evidence supporting a large excess death toll (about 380,000, half attributable to intentional injuries) over 5 years of war, which might otherwise have evaded historical documentation forever. Somalia estimates (2010-2012) documented the impact of one of the worst famines of recent decades. Predictive models underlying the estimates have quantifiable external validity. While predictive power is ultimately their most important attribute, observing the directionality of coefficients can help to appraise internal validity, particularly if dose-response associations are noted. To our knowledge, no other studies have developed statistical models that predict with reasonable accuracy the crude or under 5 years death rate among some of the world's most vulnerable populations. A known challenge of crisis-attributable mortality estimation is defining an appropriate counterfactual: our method achieves this by generating non-crisis death tolls through the same statistical processes that result in the estimate of actual mortality.

Known limitations
The method's main limitations reflect sources of unknown error in input data: (i) error in the predictor data, for example arising from differences in the way predictors are measured over time or in different locations; random error would result in underestimation of associations between predictors and mortality, or 'regression dilution' in predictive terms; bias could cause over- or under-estimation; (ii) bias in mortality data, e.g. due to problems with under-ascertainment of deaths (see above), which survey quality weights may reduce but not eliminate; (iii) nonparametric uncertainty around population and displacement estimates; (iv) demographic projections based on inaccurate assumed growth rates (both (iii) and (iv) will be discussed in a separate paper); (v) inappropriate assumptions on counterfactual conditions; and (vi) omission of excess mortality among people who migrate out of the affected region (e.g. refugees), or due to long-term impacts of the crisis beyond its resolution. These limitations imply that estimates should be interpreted with caution, with reference to confidence intervals and after thorough exploration of uncertainty through alternative counterfactual scenarios and sensitivity analyses. Perhaps the most important limitation among the above concerns how counterfactual conditions are specified. Varying predictor values to represent no-crisis conditions presents analogies with both interrupted time series [32] and growth models [33]. However, our approach quantifies the effects of multi-factorial and dynamic crises rather than a single public health intervention implemented in a fairly stable setting: as such, our estimates rely heavily on a few model predictors faithfully representing a more complex system; moreover, counterfactual values for many predictors (e.g. food security, vaccination coverage) are not simply zero, as in the case of a counterfactually absent intervention, but rather some quantity relative to the actual levels.

Data requirements
The method's applicability is limited by the following data requirements: (i) at least some ground mortality information arising from a population-based method of recognised validity, e.g. a survey or prospective surveillance system. Such data should be granular in nature, i.e. representative of small geographic units and time periods (alternatively, one could use large-area surveys as long as the location of surveyed communities is reported in the dataset). Some documentation (e.g. survey reports) should be available to scrutinise methods; (ii) data covering the entirety or most of the person-time of interest for at least a few variables that may plausibly be expected to predict mortality. The system for measuring these predictors should have remained consistent over time. The pattern of data missingness should be mostly random: missingness clustered in specific areas or periods (particularly at the start or end of the time series, or where mortality data are also least available) makes imputation harder and more bias-prone; (iii) reasonable demographic estimates based on a census or similarly robust data collection exercise, performed no more than a few years prior to the analysis; in addition, data on displacement (including both the geographic unit of origin and that of arrival) covering most or all of the person-time should be available, or composable from existing reports and databases. Minimal data requirements, e.g. how many ground surveys or predictor variables are needed, are difficult to establish a priori: the predictive power of the model is a function not just of the amount of data, but also of the extent to which these data capture population variability and the local strength of correlation between predictors and mortality. As such, an additional limitation of the method is that the precision, and thus interpretability, of estimates arising from it may only become clear a posteriori.

Computational implementation
With the exception of step 2 (population denominator reconstruction), for which only crisis-specific analysis methods appear feasible, we have developed generic R analysis scripts that implement estimation steps for any crisis setting and generate output datasets, tables and graphs (see Additional file 1: pages 10-13 and https://github.com/francescochecchi/mortality_small_area_estimation). The analyst interacts with these scripts through Microsoft Excel spreadsheets containing input datasets and various parameters to control the analysis.

Conclusions
We are currently testing an extension of the method for forecasting mortality over short time horizons of 3-6 months: this could provide an efficient means to do real-time estimation across the crisis-affected region, thereby generating information for decision-makers tasked with allocating humanitarian resources. Key requirements for such an application would be immediate predictor data sharing and standing capacity to implement analysis.
Other improvements to the method are worth exploring. As instances of its use accumulate, a Bayesian estimation framework specifying informative priors for key predictor coefficients (e.g. armed conflict intensity) may be attractive. Improvements to model fitting could include machine learning techniques or Bayesian model averaging; due to limited resources, we have not systematically compared our generalised linear model with any of these alternatives. Indeed, these further developments will require dedicated scientific resources and buy-in from humanitarian stakeholders who hold access to key input data.
Disclaimer Geographical names and boundaries presented in this paper are used solely for the purpose of producing scientific estimates, and do not necessarily represent the views or official positions of the authors, the London School of Hygiene and Tropical Medicine, any of the agencies that have supplied data for this analysis, or the donors. The authors are solely responsible for the analyses presented here, and acknowledgment of data sources does not imply that the agencies or individuals providing data endorse the results of the analysis.