Improving program targeting to combat early-life mortality by identifying high-risk births: an application to India

Background: It is widely recognized that there are multiple risk factors for early-life mortality. In practice, most interventions to curb early-life mortality target births based on a single risk factor, such as poverty. However, most premature deaths do not occur in the targeted group. Thus interventions target many births that are not at high risk and miss many births that are.
Methods: Using data from the second wave of Demographic and Health Surveys from India and a hierarchical Bayesian model, we estimate infant mortality risk for 73,320 infants in India as a function of four risk factors. We show how this information can be used to improve program targeting. We compare our novel approach against common programs that target groups based on a single risk factor.
Results: A conventional approach that targets mothers in the lowest quintile of income correctly identifies only 30% of infant deaths. By contrast, using four risk factors simultaneously, we identify a group of births of the same size that includes 57% of all deaths. Using the 2012 census to translate these percentages into numbers, there were 25,642,200 births in 2012, of which 4.4% died before the age of one. Our approach correctly identifies 643,106 of 1,128,257 infant deaths, whereas targeting on poverty alone identifies only 338,477.
Conclusion: Our approach considerably improves program targeting by identifying more infant deaths than the usual approach that targets births based on a single risk factor. This leads to more efficient program targeting, which is particularly useful in developing countries, where resources are lacking and needs are high.
Electronic supplementary material: The online version of this article (10.1186/s12963-018-0172-6) contains supplementary material, which is available to authorized users.


Introduction
Our approach has three steps. In step one, we estimate each infant's mortality risk using a Bayesian hierarchical model. Next, we cross-classify births into cells based on their risk factor combinations. Finally, we select the cells with the highest estimated mortality risk and target the births in those cells with interventions.
We give details of these steps in this document.

Statistical model
We use a Bayesian hierarchical logistic regression model to predict the probability that an infant dies within one year of birth. The model includes the risk factors listed in the main text: age of the mother at birth, classified into three categories (under 19 years old, from 19 up to 35 years old, and older than 35 years old); the highest level of education achieved by the mother, classified into four categories (no education, primary education, secondary education, and higher education); wealth, categorized into five wealth quintiles; and the 436 districts. We include all main effects as well as 2-way, 3-way, and 4-way interactions. Main effects and interactions are modeled as either fixed or random effects. If a particular effect, either main or interaction, has more than 20 unique levels, it is included as a random effect; otherwise it is treated as a fixed effect. For example, the main effect of district and the 3-way interaction of age by education by wealth are treated as random effects. The intercept was given a Gaussian prior with mean logit(0.1) and unit variance, where logit(p) = log(p/(1 − p)). All other fixed effects were given standard Gaussian (0, 1) priors. All random effects were given mean-zero Gaussian priors with unknown variances. For the variance of each effect, we use an inverse-gamma (ψ, ν) prior with ψ = 10 − k, where k is the order of the interaction, and ν = 10. This specification shrinks the random effects more strongly towards zero for higher-order interaction terms.
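The fixed-versus-random rule above can be illustrated with a short Python sketch. This is not the code used to fit the model; it only enumerates the main effects and interactions of the four risk factors and applies the "more than 20 levels" rule. It assumes that the number of levels of an interaction equals the product of the category counts, which is an upper bound on the number of unique levels actually observed in the data. The names LEVELS, THRESHOLD and classify_effects are hypothetical.

```python
from itertools import combinations
from math import prod

# Number of categories per risk factor, as stated in the text.
LEVELS = {"age": 3, "education": 4, "wealth": 5, "district": 436}
THRESHOLD = 20  # effects with more than 20 levels are treated as random


def classify_effects(levels=LEVELS, threshold=THRESHOLD):
    """Map each main effect or interaction to (kind, n_levels).

    n_levels is the product of the category counts, i.e. the maximum
    possible number of unique levels (an assumption of this sketch);
    the number observed in a data set may be smaller.
    """
    result = {}
    for order in range(1, len(levels) + 1):
        for combo in combinations(levels, order):
            n = prod(levels[name] for name in combo)
            kind = "random" if n > threshold else "fixed"
            result[":".join(combo)] = (kind, n)
    return result


effects = classify_effects()
for name, (kind, n) in effects.items():
    print(f"{name:35s} {n:7d} {kind}")
```

Under this assumption, district (436 levels) and age by education by wealth (60 levels) come out as random effects, in line with the examples in the text, while education by wealth (exactly 20 levels) stays fixed because the rule requires more than 20.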
Models are fit using the MCMCglmm package in the R statistical environment. 1,2 We ran two chains for each model, using 230,000 iterations for each chain, discarding the first 30,000 iterations as burn-in and then thinning every 25, giving 8,000 posterior samples from each chain. We assessed MCMC convergence using standard graphical and statistical procedures, and convergence was deemed satisfactory.
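The number of retained posterior samples follows directly from the chain length, burn-in and thinning interval. A minimal Python check of that arithmetic is given here; the function name retained_samples is hypothetical and the snippet is independent of the R code used for fitting.

```python
def retained_samples(n_iter, burn_in, thin):
    """Number of draws kept per chain after burn-in and thinning."""
    return (n_iter - burn_in) // thin


per_chain = retained_samples(n_iter=230_000, burn_in=30_000, thin=25)
print(per_chain)      # 8000 samples per chain
print(2 * per_chain)  # 16000 samples across the two chains
```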

Selection of the target population
We use the risk factor combinations to cross-classify the infants' mortality risks π_i into cells c = 1, ..., C that are amenable to program targeting. The allowable risk factor combinations depend on the particular allocation scenario.
National: Policymakers need to pick the same risk factor combinations nationally.
In this scenario, the restriction is that risk factor combinations must be the same across all districts in the country. Thus we cross-classify our births into 3 × 4 × 5 = 60 cells. In this scenario policymakers face the greatest constraints in their ability to implement program targeting.
States: Policymakers may select different risk factor combinations by state but within states the combinations need to be the same.
Policymakers may target births with different risk factors in different states but, within a state, all districts target births with identical risk factors. Thus births are cross-classified into 3 × 4 × 5 × 27 = 1,620 cells. Different states will contribute different percentages of births to the target population. The proportion of births from each state is chosen to maximize the mortality risk of the final national sample. Under this scenario, policymakers have more flexibility than in the National scenario but less flexibility than in the Districts scenario.
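The cell counts and the selection step can be sketched in Python as follows. This is an illustration only, under two assumptions: cells are taken in descending order of estimated risk, and selection stops as soon as the next cell would exceed the number of births the program can cover. The allocation mechanism actually used may differ. The function names and the four example cells are hypothetical and contain made-up numbers, not survey estimates.

```python
def n_cells(n_age=3, n_education=4, n_wealth=5, n_regions=1):
    """Number of cells in the cross-classification for a given scenario."""
    return n_age * n_education * n_wealth * n_regions


def select_cells(cells, max_births):
    """Greedy selection of the highest-risk cells (illustrative assumption).

    cells: list of (cell_id, n_births, estimated_risk) tuples.
    max_births: number of births the program can cover.
    Returns the chosen cell ids and the number of births they contain.
    """
    chosen, covered = [], 0
    for cell_id, n_births, _risk in sorted(cells, key=lambda c: c[2], reverse=True):
        if covered + n_births > max_births:
            break  # stop at the first cell that no longer fits
        chosen.append(cell_id)
        covered += n_births
    return chosen, covered


print(n_cells())              # National scenario: 60 cells
print(n_cells(n_regions=27))  # States scenario: 1620 cells

# Made-up example cells: (id, births, estimated mortality risk).
example = [("a", 100, 0.02), ("b", 50, 0.09), ("c", 30, 0.12), ("d", 20, 0.05)]
chosen, covered = select_cells(example, max_births=80)
print(chosen, covered)        # ['c', 'b'] 80
```

In the States scenario the same ranking would run over all 1,620 state-specific cells at once, so that states with more high-risk cells contribute a larger share of the target population.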

Extensions and conclusions
Our approach can be extended in a number of ways. First, the current model uses only four risk factors. However, demographic and health surveys collect a vast amount of information that can potentially be useful in estimating mortality risk. Further work is needed to identify the number and types of risk factor combinations that optimize the results.
The allocation mechanism can also be further developed to ensure optimal results under complex sets of constraints.
Our approach suggests that the more flexibility policymakers have in selecting the target population, the greater the program targeting gains.
Most program targeting operates under constraints. Even under constraints, considering multiple risk factors increases the value of the program.