Adapting the log quadratic model
We adapted the log quadratic model developed by Wilmoth et al. [24], which relates all-cause adult mortality to all-cause under-five mortality. Specifically, we employed the following log quadratic model:
$$\begin{aligned} \log (_{x}q_{0}) = a_{x} + b_{x} \log (_{5}q_{0}) + c_{x} \left[\log (_{5}q_{0})\right]^2 + v_{x}k , \end{aligned}$$
(1)
where \(_{x}q_{0}\) is the probability of dying from birth up to age x, x is a preselected age of at most 5 years, and log(\(_{5}q_{0}\)) is the logarithm of the probability of dying between birth and age five. We chose values of x corresponding to the age groups 0–6 and 0–27 days and 0–5, 0–11, 0–23, and 0–59 months, to be consistent with the Lives Saved Tool [16]. The upper age limit of each group is expressed in completed days or months, so that 0–59 months is equivalent to the standard under-five mortality rate. The variability of age-specific mortality at a given \(_{5}q_{0}\) is represented by \(v_{x}\), estimated from the singular value decomposition of the matrix of residuals from the quadratic equation above. The parameter k represents a life table's deviation from the average age pattern at a given \(_{5}q_{0}\); it can be tailored to fit \(_{x}q_{0}\) for a specific age group x or to match mortality over a given age range. Thus \(v_{x}\) is estimated from a reference set of probabilities, while k is selected to best fit estimated mortality in a specific life table.
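As a rough illustration of how the parameters in Eq. (1) could be estimated, the sketch below fits \(a_{x}\), \(b_{x}\), and \(c_{x}\) by ordinary least squares separately for each age group and then takes \(v_{x}\) and k from the leading singular vectors of the residual matrix. It assumes NumPy, complete life tables with strictly positive probabilities, and unweighted least squares; the fitting procedure of Wilmoth et al. [24] and the authors' released code may differ (for example, in weighting), and the function names are hypothetical. The same function could be applied separately to each cause c for Eq. (2).

```python
import numpy as np


def fit_log_quadratic(q5, qx):
    """Fit log(xq0) = a_x + b_x*h + c_x*h**2 + v_x*k with h = log(5q0).

    q5 : shape (n_tables,), under-five probabilities 5q0 (all > 0)
    qx : shape (n_tables, n_ages), probabilities xq0 per age group (all > 0)
    Returns a, b, c, v with shape (n_ages,) and k with shape (n_tables,).
    """
    h = np.log(np.asarray(q5, dtype=float))
    X = np.column_stack([np.ones_like(h), h, h**2])  # design matrix [1, h, h^2]
    Y = np.log(np.asarray(qx, dtype=float))          # one column per age group
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)     # OLS, one fit per column
    a, b, c = coef                                   # rows of the (3, n_ages) array
    resid = Y - X @ coef                             # shape (n_tables, n_ages)
    # Rank-1 approximation resid ~ outer(k, v): v is the leading right singular
    # vector (unit length) and k holds the matching table-specific scores.
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    v = Vt[0]
    k = U[:, 0] * s[0]
    return a, b, c, v, k


def predict_qx(q5, a, b, c, v, k=0.0):
    """Predicted xq0; k = 0 gives the average age pattern at this 5q0."""
    h = np.log(q5)
    return np.exp(a + b * h + c * h**2 + v * k)
```

Because singular vectors are identified only up to sign and scale, only the product \(v_{x}k\) is meaningful; in this sketch \(v_{x}\) is scaled to unit length and k absorbs the scale.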
We modeled probabilities of dying from birth to age x rather than probabilities of dying within each age interval (\(_{x}q_{0}\) rather than \(_{n}q_{y}\), where n is the length of the age interval and \(x = y + n\)), because probabilities of dying from birth to age x are more stable. However, because each \(_{x}q_{0}\) is modeled separately, the predicted \(_{x}q_{0}\) may be less than \(_{y}q_{0}\) for some \(0< y < x\), which contradicts the cumulative nature of \(_{x}q_{0}\). When such violations occur, we restrict \(_{y}q_{0}\) so that it does not exceed \(_{x}q_{0}\) for \(0< y < x\). In practice, this type of violation was observed only when \(_{x}q_{0}\) and \(_{y}q_{0}\) were very similar, that is, when observed mortality between ages y and x was zero or close to zero. We also modeled the probabilities \(_{x}q_{0}\) rather than the mortality rates from birth up to age x (\(_{x}m_{0}\)) employed by Wilmoth and colleagues. Although these probabilities and rates are closely related, we chose \(_{x}q_{0}\) because its coefficient of variation for under-five mortality in our empirical data was smaller than that of \(_{x}m_{0}\), which we expected to yield greater model stability. In addition, probabilities \(_{x}q_{0}\) are generally more widely available than rates \(_{x}m_{0}\) [26].
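One simple way to apply the restriction on predicted probabilities, assumed here rather than taken from the authors' code, is to cap each \(_{y}q_{0}\) at the value predicted for the next older age group (a reverse running minimum), which yields \(_{y}q_{0} \le {}_{x}q_{0}\) for \(y < x\). The second function recovers interval probabilities \(_{n}q_{y}\) from the cumulative \(_{x}q_{0}\) through the identity \(1 - {}_{x}q_{0} = (1 - {}_{y}q_{0})(1 - {}_{n}q_{y})\). Both function names are hypothetical.

```python
import numpy as np


def enforce_nondecreasing(q_cum):
    """Cap each yq0 at the (already capped) value for the next older age group.

    q_cum: predicted cumulative probabilities xq0 ordered by increasing age x.
    A reverse running minimum guarantees yq0 <= xq0 whenever y < x.
    """
    q = np.asarray(q_cum, dtype=float)
    return np.minimum.accumulate(q[::-1])[::-1]


def interval_q(q_cum):
    """Convert cumulative xq0 into interval probabilities nqy, with x = y + n.

    Uses 1 - xq0 = (1 - yq0) * (1 - nqy); the first interval starts at birth.
    """
    surv = 1.0 - np.asarray(q_cum, dtype=float)   # survival to each age x
    prev = np.concatenate(([1.0], surv[:-1]))     # survival to the start age y
    return 1.0 - surv / prev
```

For example, predicted values `[0.010, 0.012, 0.011, 0.020]` would have the second entry capped at 0.011.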
Parameters \(a_{x}\), \(b_{x}\), \(c_{x}\), and \(v_{x}\) are estimated with Eq. (1) from source data in which both \(_{5}q_{0}\) and \(_{x}q_{0}\) are available for each age group of interest; k is then estimated for each life table in which \(_{5}q_{0}\) is known but \(_{x}q_{0}\) for finer age groups is not. We extended this model to U5ACSM using
$$\begin{aligned} \log (_{x}q_{0,c}) = a_{x,c} + b_{x,c} \log (_{5}q_{0,c}) + c_{x,c} \left[\log (_{5}q_{0,c})\right]^2 + v_{x,c} k_c \end{aligned}$$
(2)
for age x and cause c. Here we focused on the age groups 0–6 and 0–27 days and 0–5, 0–11, 0–23, and 0–59 months, based on epidemiological evidence [12, 13, 15, 16]. For each of these age groups, we illustrate the proposed method with pneumonia-specific and injury-specific mortality. We first fitted the adapted log quadratic model to empirical data and then used simulations to address potential measurement issues in those data.
Empirical validation
We illustrated our adapted methods using mortality data for 1996–2015 from the Chinese Maternal and Child Health Surveillance System (MCHSS), a sample registration system for child mortality [27]. The MCHSS was designed to be representative in each of six strata of China, defined by geography (East, Mid, and West) and urbanicity (urban or rural). The system was expanded in 2009 to cover a larger population during a period of falling maternal mortality. The live births and under-five deaths monitored by this system over time are shown in Additional file 1. Over 80% of the causes of death registered in this system were ascertained by medical certification, and the remainder by verbal autopsy [27]. From these data, we aimed to predict all-cause \(_{x}q_{0}\) and cause-specific \(_{x}q_{0,c}\) for pneumonia and injury.
We used cross-validation to examine model performance: we estimated the parameters of Eq. (1) from five of the six strata over 1996–2015 and used the resulting values to predict all-cause \(_{x}q_{0}\) in the held-out stratum. In that held-out stratum, we examined the average absolute difference between predicted and observed \(_{x}q_{0}\), \(| \widehat{_{x}q_{0}} \, - \, _{x}q_{0} |\), as well as the average absolute relative difference,
$$\begin{aligned} \frac{|\widehat{_{x}q_{0}} \, - \, _{x}q_{0} |}{_{x}q_{0}} \end{aligned}$$
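The two cross-validation summaries can be sketched as below, assuming NumPy arrays of observed and predicted \(_{x}q_{0}\) for the held-out stratum and a vector of stratum labels. The fold generator assumes that each of the six strata is held out in turn; the function names and labels are hypothetical.

```python
import numpy as np


def prediction_errors(q_obs, q_pred):
    """Average absolute and average absolute relative difference in xq0."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    abs_diff = np.abs(q_pred - q_obs)
    mad = abs_diff.mean()
    mard = (abs_diff / q_obs).mean()   # undefined if any observed xq0 is zero
    return mad, mard


def stratum_folds(strata):
    """Yield (held-out label, training mask, held-out mask) for each stratum."""
    strata = np.asarray(strata)
    for s in np.unique(strata):
        held_out = strata == s
        yield s, ~held_out, held_out
```

The relative difference weights errors in low-mortality life tables more heavily, which is why it is reported alongside the absolute difference.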
We first estimated \(_{x}q_{0}\) for the average age-specific mortality profile at a given \(_{5}q_{0}\), that is, with k set to zero. For a second estimate of \(_{x}q_{0}\), we estimated k for each life table so that the model exactly reproduced the observed all-cause neonatal mortality rate for each year in the held-out stratum. We compared these estimates of \(_{x}q_{0}\) with the approach typically used when age-specific mortality is not available, which assumes a constant mortality rate within 0–27 days and within 1–59 months [3]. We refer to these estimates and their associated results as the standard approach.
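The standard approach can be written as a constant-hazard interpolation, sketched below under stated assumptions: a month is taken as 365.25/12 days, the neonatal period ends at exact age 28 days, and the second constant-hazard segment is assumed to run from 28 days to exact age 60 months. The text specifies 1–59 months, so the treatment of the few days between 28 days and 1 month is an assumption of this sketch. With a constant hazard, survival within a segment declines geometrically with elapsed time, so the probabilities for finer age groups follow from the neonatal probability and \(_{5}q_{0}\) alone. Names and constants are hypothetical.

```python
import numpy as np

DAYS_PER_MONTH = 365.25 / 12          # assumption: average month length
NEONATAL_DAYS = 28.0                  # end of 0-27 completed days
UNDER5_DAYS = 60 * DAYS_PER_MONTH     # end of 0-59 completed months

# Exact upper age (in days) of each age group; limits are completed days/months.
AGE_LIMITS = {
    "0-6 days": 7.0,
    "0-27 days": NEONATAL_DAYS,
    "0-5 months": 6 * DAYS_PER_MONTH,
    "0-11 months": 12 * DAYS_PER_MONTH,
    "0-23 months": 24 * DAYS_PER_MONTH,
    "0-59 months": UNDER5_DAYS,
}


def standard_qx(q_neo, q5, age_days):
    """xq0 at exact age `age_days`, assuming a constant hazard within the
    neonatal period and within the remainder of the under-five period."""
    t = np.asarray(age_days, dtype=float)
    s_neo = 1.0 - q_neo               # survival to day 28
    s_post = (1.0 - q5) / s_neo       # conditional survival from day 28 to 60 months
    # Constant hazard: survival declines geometrically with time in each segment.
    s_early = s_neo ** (t / NEONATAL_DAYS)
    frac_post = (t - NEONATAL_DAYS) / (UNDER5_DAYS - NEONATAL_DAYS)
    s_late = s_neo * s_post ** frac_post
    return 1.0 - np.where(t <= NEONATAL_DAYS, s_early, s_late)
```

For instance, `{g: float(standard_qx(0.010, 0.030, d)) for g, d in AGE_LIMITS.items()}` returns standard-approach estimates for every age group of a hypothetical life table; by construction the 0–27 day and 0–59 month values reproduce the inputs.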
For U5ACSM, we used observed \(_{5}q_{0,c}\) and \(_{x}q_{0,c}\) to estimate \(a_{x,c}\), \(b_{x,c}\), \(c_{x,c}\), and \(v_{x,c}\) for deaths due to pneumonia and injury, repeating the above analyses. We used \(_{5}q_{0,c}\) in Eq. (2) to predict a typical \(_{x}q_{0,c}\) for each age x, and we also used neonatal cause-specific mortality to estimate \(k_c\) when predicting all \(_{x}q_{0,c}\) in a specific life table. Pneumonia and injury were selected because they were among the most common causes of death in children under five in China over the study period and because their mortality is known to vary across age [12, 13, 18]. Following the GATHER guideline for international health statistics, data and software to implement the proposed method are available at https://github.com/jamieperin/U5ACSM [28].
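With the fitted parameters held fixed, Eq. (2) is linear in \(k_c\), so the value that reproduces the observed neonatal cause-specific probability exactly follows by solving the neonatal equation, \(k_c = [\log(_{28\mathrm{d}}q_{0,c}) - a - b\,h - c\,h^2]/v\) with \(h = \log(_{5}q_{0,c})\) and all parameters taken for the 0–27 day group. The sketch below assumes NumPy arrays of fitted parameters ordered by age group, with the neonatal group at a known index; the function names are hypothetical. The same calculation applies to all-cause mortality with Eq. (1).

```python
import numpy as np


def solve_k(q5, q_neo, a, b, c, v, j):
    """k such that the model reproduces the observed neonatal xq0 exactly.

    a, b, c, v: fitted parameters ordered by age group; j: index of the
    neonatal (0-27 days) group. Solves log(q_neo) = a_j + b_j*h + c_j*h**2 + v_j*k.
    """
    h = np.log(q5)
    if np.isclose(v[j], 0.0):
        raise ValueError("v for the matching age group is ~0, so k is not identified")
    return (np.log(q_neo) - (a[j] + b[j] * h + c[j] * h**2)) / v[j]


def predict_all(q5, k, a, b, c, v):
    """Predicted xq0 for every age group in this life table."""
    h = np.log(q5)
    return np.exp(a + b * h + c * h**2 + v * k)
```

The check on \(v\) matters in practice: if the neonatal group shows almost no deviation from the average pattern, neonatal mortality carries no information about k.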
Simulation validation
We also conducted a simulation study to examine the log quadratic model while minimizing concerns about data quality in the China mortality surveillance system. We generated \(_{5}q_{0,c}\) to resemble observed pneumonia mortality across China's strata-years, drawing it from a uniform distribution between 2 and 40 deaths per 1000 live births. We then estimated parameters \(a_{x,c}\), \(b_{x,c}\), \(c_{x,c}\), and \(v_{x,c}\) by fitting the log quadratic relationship in Eq. (2) to pneumonia-specific life tables for the six strata of China over 1996–2015. Age- and cause-specific probabilities were then generated with varying degrees of error \(e_{x,c}\), such that
$$\begin{aligned} \log (_{x}q_{0,c}) = a_{x,c} + b_{x,c} \log (_{5}q_{0,c}) + c_{x,c} \left[\log (_{5}q_{0,c})\right]^2 + v_{x,c} k_c + e_{x,c} , \end{aligned}$$
(3)
where \(e_{x,c}\) is a normally distributed error term for each age group, with magnitude either equal to that observed in China or twice as large. We compared the parameter estimates from the simulated data with the known parameter values used in Eq. (3). We also used these parameter estimates to predict pneumonia mortality in fine age groups for an unobserved life table whose pneumonia-specific under-five mortality is known. Such predictions for unobserved life tables, rather than the parameter estimates of \(a_{x,c}\), \(b_{x,c}\), \(c_{x,c}\), and \(v_{x,c}\), are the primary interest of the log quadratic model. We selected three values of k to represent settings with low, middle, and high neonatal mortality due to pneumonia, and examined prediction error in the estimated \(_{x}q_{0,c}\) for these hypothetical life tables across 1000 simulations.
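A compact version of this simulation is sketched below; it is not the authors' implementation. The "true" parameters for three age groups are hypothetical placeholders rather than the values estimated from the MCHSS pneumonia life tables, k for the simulated life tables is assumed to be standard normal, fitting uses unweighted least squares with an SVD of the residuals, and for each hypothetical life table k is re-estimated on the fitted scale so that neonatal pneumonia mortality is matched exactly before the other age groups are predicted. Passing doubled values of `sd` mimics the "twice the error" scenario.

```python
import numpy as np

# Hypothetical "true" parameters for three age groups ordered as
# (0-27 days, 0-11 months, 0-59 months); placeholders, NOT MCHSS estimates.
A_TRUE = np.array([-1.2, -0.4, 0.0])
B_TRUE = np.array([0.9, 1.0, 1.0])
C_TRUE = np.array([-0.01, 0.0, 0.0])
V_TRUE = np.array([0.3, 0.15, 0.0])   # 0-59 months equals 5q0, so no deviation
NEO = 0                               # index of the neonatal age group


def fit(q5, qx):
    """OLS fit of a, b, c per age group; v from the SVD of the residuals."""
    h = np.log(q5)
    X = np.column_stack([np.ones_like(h), h, h**2])
    Y = np.log(qx)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # rows: a, b, c
    _, _, vt = np.linalg.svd(Y - X @ coef, full_matrices=False)
    return coef, vt[0]


def predict(coef, v, q5, k):
    """Evaluate the log quadratic model without the error term."""
    h = np.log(q5)
    return np.exp(coef[0] + coef[1] * h + coef[2] * h**2 + v * k)


def simulate(n_tables=120, sd=(0.10, 0.05, 0.0), k_targets=(-1.0, 0.0, 1.0),
             q5_new=0.02, n_sims=1000, seed=1):
    """Return prediction errors with shape (n_sims, len(k_targets), n_ages)."""
    rng = np.random.default_rng(seed)
    true_coef = np.vstack([A_TRUE, B_TRUE, C_TRUE])
    n_ages = len(A_TRUE)
    errors = np.empty((n_sims, len(k_targets), n_ages))
    h_new = np.log(q5_new)
    for s in range(n_sims):
        q5 = rng.uniform(0.002, 0.040, size=n_tables)        # 2 to 40 per 1000
        k = rng.normal(size=n_tables)                        # assumed distribution of k
        e = rng.normal(size=(n_tables, n_ages)) * np.asarray(sd)  # error e_{x,c}
        qx = predict(true_coef, V_TRUE, q5[:, None], k[:, None]) * np.exp(e)
        coef, v = fit(q5, qx)
        for i, k0 in enumerate(k_targets):
            truth = predict(true_coef, V_TRUE, q5_new, k0)
            # Re-estimate k on the fitted scale so neonatal mortality matches.
            base = coef[0, NEO] + coef[1, NEO] * h_new + coef[2, NEO] * h_new**2
            k_hat = (np.log(truth[NEO]) - base) / v[NEO]
            errors[s, i] = predict(coef, v, q5_new, k_hat) - truth
    return errors
```

Averaging `np.abs(simulate())` over the first axis gives the mean absolute prediction error for each value of k and each age group; with all entries of `sd` set to zero, the errors vanish up to rounding, which is a useful check of the fitting code.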