Error and bias in under-5 mortality estimates derived from birth histories with small sample sizes
Population Health Metrics volume 11, Article number: 13 (2013)
Estimates of under-5 mortality at the national level for countries without high-quality vital registration systems are routinely derived from birth history data in censuses and surveys. Subnational or stratified analyses of under-5 mortality could also be valuable, but the usefulness of under-5 mortality estimates derived from birth histories from relatively small samples of women is not known. We aim to assess the magnitude and direction of error that can be expected for estimates derived from birth histories with small samples of women using various analysis methods.
We perform a data-based simulation study using Demographic and Health Surveys. Surveys are treated as populations with known under-5 mortality, and samples of women are drawn from each population to mimic surveys with small sample sizes. A variety of methods for analyzing complete birth histories and one method for analyzing summary birth histories are used on these samples, and the results are compared to corresponding true under-5 mortality. We quantify the expected magnitude and direction of error by calculating the mean error, mean relative error, mean absolute error, and mean absolute relative error.
All methods are prone to high levels of error at the smallest sample size with no method performing better than 73% error on average when the sample contains 10 women. There is a high degree of variation in performance between the methods at each sample size, with methods that contain considerable pooling of information generally performing better overall. Additional stratified analyses suggest that performance varies for most methods according to the true level of mortality and the time prior to survey. This is particularly true of the summary birth history method as well as complete birth history methods that contain considerable pooling of information across time.
Performance of all birth history analysis methods is extremely poor when used on very small samples of women, both in terms of magnitude of expected error and bias in the estimates. Even with larger samples there is no clear best method to choose for analyzing birth history data. The methods that perform best overall are the same methods where performance is noticeably different at different levels of mortality and lengths of time prior to survey. At the same time, methods that perform more uniformly across levels of mortality and lengths of time prior to survey also tend to be among the worst performing overall.
Under-5 mortality, the probability of death before age 5 (denoted 5q0), is an important overall indicator of child health. In countries without functioning systems to continuously register births and deaths, estimates of under-5 mortality are generally derived from survey and/or census data, particularly in the form of birth histories where women are asked for information about the survival of their children.
Birth history data are routinely used for estimating mortality at the national level. It is often of interest, however, to estimate under-5 mortality at a subnational level or to stratify by some other characteristic (e.g., income or maternal education). Subnational or stratified analyses are complicated by small sample sizes: in surveys in particular, the sample size for a given subnational unit or stratum is often quite small, and it is not apparent whether the estimates derived from these limited data are useful. While a number of subnational analyses with birth history data have been undertaken using census data[1–3], where small sample sizes are less of a concern, existing subnational mortality estimates using survey data tend to be at a relatively coarse level (often provinces or regions) to avoid small samples[2, 4].
Two different types of birth histories are routinely collected. In a complete birth history (CBH), women are asked for information about the date of birth and, if applicable, the age at death of each child they have given birth to. Because complete birth histories contain information about dates and ages for individual children they allow for direct calculation of under-5 mortality. In a summary birth history (SBH), women are asked only about the total number of children they have given birth to and the number of these children who are still alive. Summary birth histories lack information about dates and ages for individual children and demographic models must be employed to estimate under-5 mortality from these data. Although complete birth histories are more straightforward to analyze they are less frequently undertaken than summary birth histories, which are far less labor-intensive and time-consuming to collect.
In this paper, we aimed to determine how much error and/or bias can be expected in under-5 mortality estimates derived from both types of birth histories at various small sample sizes. To this end, we carried out a data-based simulation study using Demographic and Health Survey (DHS) data wherein we treated each survey as a population with known mortality and sampled from this population to mimic surveys with small sample sizes. We examined how estimates derived from summary birth history data and complete birth history data (analyzed using several alternative methods) compared in terms of error and bias at increasingly small sample sizes. Further, we performed stratified analyses to explore in more detail how the performance of each method relates to the underlying true level of mortality and the time prior to data collection.
This analysis made use of all DHS publicly available as of May 2012 that contain birth histories for all women, regardless of marital status, a total of 152 surveys in 62 countries. Table 1 provides a full listing of all DHS included in this analysis.
Birth history methods
Summary birth history method
We analyzed summary birth history data using updated models and methods described in Rajaratnam et al.[6, 7]. The combined version of the maternal age cohort, time since first birth cohort, maternal age period, and time since first birth period methods was used to generate annual estimates for the 25 years preceding each survey.
Standard complete birth history method
To analyze complete birth history data we first expanded the record for each child such that there was a record of each month that a child lived and was observed under age 5: this will be less than the full 60 months if the child died before age 5 or if the mother was surveyed before the child reached age 5. For each child-month of life we indicated whether the child was alive or dead at the end of the month and then assigned the child-month to the appropriate time period and age group. Time periods were non-overlapping and equally sized and were assigned starting at the time of the most recent survey and moving back in time. The ages considered were 0 months, 1–11 months, 12–23 months, 24–35 months, 36–47 months, and 48–59 months; these age groupings were designed such that mortality is expected to be reasonably constant across the age range. From these data we calculated the monthly probability of survival in each time period for each age group by calculating the proportion of child-months in a given time period and age group that end with the child alive. These monthly probabilities of survival were converted to the probability of surviving the entire age interval under consideration by raising them to a power equal to the number of months in the age interval. Under-5 mortality was then calculated by subtracting from one the product of all of the age-specific survival probabilities. This process generated a single estimate of under-5 mortality for each time period which was then assigned to the midpoint of the period. Different length periods can be used, with longer periods providing more pooling of information across time but also producing less frequent estimates. For this analysis, we tested periods of length one, two, and five years. It is possible to pool data from multiple surveys in the same country and estimate mortality from the combined data. 
Except when explicitly stated otherwise, the non-pooled version of the complete birth history method is used throughout this analysis.
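The standard calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function names, the integer month counter, and the encoding of age at death in completed months are all assumptions made for the example, and survey weights are ignored.

```python
# Age groups in months (inclusive bounds), as listed in the text.
AGE_GROUPS = [(0, 0), (1, 11), (12, 23), (24, 35), (36, 47), (48, 59)]


def age_group_index(age_month):
    """Return the index of the age group containing a completed age in months."""
    for i, (lo, hi) in enumerate(AGE_GROUPS):
        if lo <= age_month <= hi:
            return i
    raise ValueError("age_month must be between 0 and 59")


def expand_child_months(birth_month, death_age, survey_month):
    """List (calendar_month, age_month, alive_at_end) for one child.

    Hypothetical encoding: months are integer counters and death_age is the
    completed age in months at death (None if the child is alive at survey).
    """
    rows = []
    for age in range(60):
        month = birth_month + age
        if month >= survey_month:
            break  # not yet observed when the mother was surveyed
        died = death_age is not None and age == death_age
        rows.append((month, age, not died))
        if died:
            break  # no child-months after death
    return rows


def under5_mortality_by_period(children, survey_month, period_years):
    """Return {period_index: 5q0}; period 0 is the one ending at the survey."""
    period_len = 12 * period_years
    # counts[period][age_group] = [child-months ending alive, all child-months]
    counts = {}
    for birth_month, death_age in children:
        for month, age, alive in expand_child_months(birth_month, death_age, survey_month):
            period = (survey_month - 1 - month) // period_len
            cells = counts.setdefault(period, [[0, 0] for _ in AGE_GROUPS])
            cell = cells[age_group_index(age)]
            cell[0] += int(alive)
            cell[1] += 1
    estimates = {}
    for period, cells in counts.items():
        if any(total == 0 for _, total in cells):
            continue  # an age group is unobserved, so no estimate for this period
        survival = 1.0
        for (lo, hi), (alive, total) in zip(AGE_GROUPS, cells):
            monthly_survival = alive / total
            # Convert monthly survival to survival over the whole age interval.
            survival *= monthly_survival ** (hi - lo + 1)
        estimates[period] = 1.0 - survival
    return estimates


# Two children born 60 months before the survey: one dies in the first month
# of life, the other survives to age 5.
example = under5_mortality_by_period([(0, 0), (0, None)], survey_month=60, period_years=5)
```

In this toy example the only age group with any deaths is month 0, where half of the child-months end in death, so the single five-year period yields an estimate of 0.5. Skipping periods in which an age group is unobserved is a choice made for this sketch; the text does not say how such periods were handled.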
Moving window complete birth history method
As an alternative to the above, the same procedures were carried out except that instead of having non-overlapping time periods and generating one estimate per period, an estimate was generated for each year incorporating all data from a window around that year. This ‘moving window’ method used each observed child-month multiple times and allowed for pooling of information across time while still producing annual estimates. For each year T, all child-months were weighted before finding the monthly survival probability for each age group as described in the previous section. Two different kinds of weights were used. In one version, all data within the window were treated equally: for a window of length x years, all child-months that occurred between x/2 years before time T and x/2 years after time T were assigned a weight of 1, and all other child-months a weight of 0. We refer to these as ‘flat’ weights. In the second version, the weights decreased linearly with time as child-months became further away from T, reaching 0 at x/2 years on either side of T. We refer to these as ‘triangle’ weights. Different length windows can be used, with wider windows providing more pooling of information across time. For this analysis, we tested window lengths of five and 10 years for both variants and 20 years for the triangle-weighted variant. Figure 1 shows the weights that would be applied for estimates in 2000 (top row) and 2005 (bottom row) using a five-year, 10-year, or 20-year window (first, second, and third column, respectively).
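The two weighting schemes can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, the window ends are treated as inclusive for the flat weights, and the triangle weights are scaled to peak at 1. Both choices are assumptions; the peak value does not affect the estimate, because only the weighted proportion matters.

```python
def flat_weight(t, center, window_years):
    """Weight 1 inside the window around `center`, 0 outside (ends included by assumption)."""
    return 1.0 if abs(t - center) <= window_years / 2 else 0.0


def triangle_weight(t, center, window_years):
    """Weight falling linearly from 1 at `center` to 0 at window_years / 2 away."""
    half = window_years / 2
    return max(0.0, 1.0 - abs(t - center) / half)


def weighted_monthly_survival(child_months, center, window_years, weight_fn):
    """Weighted share of child-months that end with the child alive.

    child_months: iterable of (time_in_years, alive_at_end) pairs for ONE age group.
    Returns None when no child-month receives positive weight.
    """
    alive = 0.0
    total = 0.0
    for t, alive_at_end in child_months:
        w = weight_fn(t, center, window_years)
        total += w
        if alive_at_end:
            alive += w
    return alive / total if total > 0 else None


# Three child-months in one age group, evaluated for the year 2000 with a 10-year window.
months = [(2000.0, True), (2002.5, False), (2010.0, False)]
flat = weighted_monthly_survival(months, 2000, 10, flat_weight)
tri = weighted_monthly_survival(months, 2000, 10, triangle_weight)
```

The resulting weighted monthly survival probabilities would then be raised to the number of months in the age interval and multiplied across age groups, exactly as in the standard method.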
We validated these birth history analysis methods using the following procedure:
1. For each survey, we calculated ‘true’ under-5 mortality by applying the standard method described above with two-year periods and then linearly interpolating to produce a continuous time-series.
2. Five hundred samples each of sizes 10, 50, 100, 500, and 1,000 women were drawn without replacement from each survey, for a total of 2,500 samples from each survey.
3. Estimates of under-5 mortality were derived for each of the resulting 2,500 samples from each survey using the summary birth history method and each of the complete birth history methods described above.
4. The estimates from each method for each of the 2,500 samples were matched to the true under-5 mortality (5q0) by survey and year, and then the error, relative error, absolute error, and absolute relative error were calculated as shown in Table 2 for each sample, method, survey, and year. The mean of each error metric was calculated for every sample size and method across all samples and surveys.
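The truth interpolation and sampling steps of this procedure might look roughly as follows in Python. The helper names, the fixed seed, and the use of plain integer identifiers for women are assumptions for illustration; the authors' own implementation was in R and is not reproduced here.

```python
import random

SAMPLE_SIZES = [10, 50, 100, 500, 1000]
DRAWS_PER_SIZE = 500


def draw_samples(women_ids, sizes=SAMPLE_SIZES, draws=DRAWS_PER_SIZE, seed=0):
    """Draw `draws` samples of each size, each without replacement, from one survey."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    samples = []
    for size in sizes:
        for draw in range(draws):
            # random.sample never selects the same woman twice within a sample.
            samples.append((size, draw, rng.sample(women_ids, size)))
    return samples


def interpolate(points, year):
    """Linearly interpolate (year, value) points with distinct years.

    Returns None outside the range covered by the points.
    """
    points = sorted(points)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= year <= x1:
            return y0 + (y1 - y0) * (year - x0) / (x1 - x0)
    return None


# Hypothetical survey of 2,000 women, and two 'true' period estimates placed
# at the midpoints of consecutive two-year periods.
samples = draw_samples(list(range(2000)))
truth_2002 = interpolate([(2001.0, 0.10), (2003.0, 0.08)], 2002.0)
```

Each woman can appear at most once within a sample but may appear in many different samples, which is what drawing 500 independent samples without replacement implies.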
To illustrate this procedure further, Figure 2 shows examples of the birth history estimates generated from subsamples of one survey (Zambia, 2007). For each method and at three sample sizes (10, 100, and 1,000) the birth history series derived from five of the samples are shown alongside the ‘true’ mortality level (shown in black) as calculated from the full sample. Each of the error metrics is based on the comparison of the sample curves (in color) to the ‘true’ mortality curve (in black).
The mean error and mean relative error were intended to indicate whether or not estimates from a given method are biased: since over and underestimates cancel in these metrics, if methods are unbiased (that is, if overestimates and underestimates of the same magnitude are equally likely) the mean error and the mean relative error should be approximately zero. The mean absolute error and mean absolute relative error were intended to capture the extent to which estimates of under-5 mortality can differ from true under-5 mortality; these metrics measure the magnitude of the error, regardless of the direction.
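As a concrete illustration of the four metrics, the sketch below assumes the usual definitions, with error taken as estimate minus truth. Table 2 is not reproduced here, so this sign convention is an assumption, though it is consistent with the text's reading of a negative mean error as underestimation. The function names are hypothetical.

```python
def error_metrics(estimate, truth):
    """Four error metrics for one estimate; assumes truth > 0."""
    error = estimate - truth  # negative when mortality is underestimated
    return {
        "error": error,
        "relative_error": error / truth,
        "absolute_error": abs(error),
        "absolute_relative_error": abs(error) / truth,
    }


def mean_metrics(pairs):
    """Average each metric over (estimate, truth) pairs."""
    rows = [error_metrics(estimate, truth) for estimate, truth in pairs]
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}


# One underestimate and one overestimate of equal size around a true 5q0 of 0.10.
summary = mean_metrics([(0.08, 0.10), (0.12, 0.10)])
```

Here the mean error and mean relative error are essentially zero because the two errors cancel, while the mean absolute error is about 0.02 and the mean absolute relative error about 20%. This is the distinction between bias and magnitude that the two pairs of metrics are meant to capture.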
In addition to this overall analysis, we also carried out two stratified analyses. First, country-years were stratified by level of true mortality (<50, 50–100, 100–150, 150–200, >200 deaths per 1,000 births) and the mean of each of the above error metrics was calculated for each method and sample size for each set of country-years. Second, country-years were stratified by the time prior to the survey, 0–1, 2–3, 4–5,..., and 24–25 years prior to the survey, and the mean of each of the above error metrics was calculated for each method and sample size for each set of country-years. These stratified analyses were meant to test if the methods perform consistently well at different levels of mortality and for different lengths of time prior to a survey.
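The two stratifications amount to assigning each country-year to a bin and averaging within bins. The sketch below is a minimal illustration with hypothetical function names. How values falling exactly on a bin boundary (for example, 100 deaths per 1,000) are assigned is not stated in the text and is an assumption here.

```python
def mortality_stratum(deaths_per_1000):
    """Bin true under-5 mortality (deaths per 1,000 births)."""
    # Boundary values of 50, 100, and 150 go to the higher bin by assumption.
    if deaths_per_1000 < 50:
        return "<50"
    if deaths_per_1000 < 100:
        return "50-100"
    if deaths_per_1000 < 150:
        return "100-150"
    if deaths_per_1000 <= 200:
        return "150-200"
    return ">200"


def time_stratum(years_prior):
    """Bin whole years prior to survey into 0-1, 2-3, ..., 24-25."""
    if not 0 <= years_prior <= 25:
        raise ValueError("years_prior must be between 0 and 25")
    start = 2 * (int(years_prior) // 2)
    return f"{start}-{start + 1}"


def stratified_mean(records, stratum_fn):
    """Mean of a metric within each stratum; records are (stratifier, metric) pairs."""
    sums = {}
    for value, metric in records:
        key = stratum_fn(value)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + metric, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Relative errors for three hypothetical country-years, keyed by true mortality.
by_level = stratified_mean([(40, -0.1), (45, 0.3), (120, 0.2)], mortality_stratum)
```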
Finally, in order to test how the performance of the complete birth history methods changes when multiple surveys are available and can be pooled, we repeated the above validation procedure on all countries with multiple surveys but pooled both across the survey data when calculating ‘true’ under-5 mortality in step 1 and when estimating birth histories from the 2,500 samples of each survey in step 3. The 2,500 samples were still drawn at the survey level, so for a country with multiple surveys the final number of women is proportional to the number of surveys (e.g., when the sample size for each survey is 10, the total number of women for a given country will be 20 if there are two surveys available, 30 if there are three surveys available, and so on). Consequently, when calculating the mean of each error metric, we stratify by the number of surveys.
All analyses were carried out in R, version 2.15.2. Code is available from the authors upon request.
Figures 3, 4, 5, and 6 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for each method at each sample size. Additional file 1: Table S1 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
Overall, all methods are close to unbiased at sample sizes of at least 500, as measured by the mean error and mean relative error. At smaller sample sizes, however, the mean error and mean relative error for the standard complete birth history methods become noticeably negative, suggesting that these methods tend to underestimate true mortality when sample sizes are small. This tendency is more pronounced when the period length used is smaller: the downward bias observed is more extreme for the one-year estimates than for the five-year estimates, which may reflect the greater pooling of information when longer period lengths are employed. The complete birth history moving window methods follow a similar pattern and are progressively more negatively biased at smaller sample sizes. Similar to the standard methods, for the moving window methods the downward bias is more pronounced when window lengths are shorter. Additionally, for the same window length, there is slightly more downward bias in the triangle weights version than in the flat weights version. In contrast, the summary birth history method appears to be almost unbiased even at small sample sizes.
The mean absolute error and mean absolute relative error of all methods increase noticeably as the sample size decreases. No method performs better on average than 73% error at sample size 10, 40% error at sample size 50, or 29% error at sample size 100. Across all sample sizes there is an ordering of performance among the methods, with moving window complete birth history methods and summary birth history methods generally performing better than standard complete birth history methods. Additionally, within each class of methods, methods with more pooling (e.g., longer periods or windows) have lower error at each sample size than methods with less pooling.
Stratified by true mortality
Figures 7, 8, 9, and 10 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for each method at each sample size stratified by true mortality level. Additional file 1: Table S2 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
For all methods there are some differences in the mean error and mean relative error at different levels of mortality. In general, there is a tendency to underestimate in high-mortality settings and to overestimate in low-mortality settings. These differences are most pronounced for the summary birth history method and for the complete birth history methods with long (10- or 20-year) windows. For these methods, the differential is present at all sample sizes and is only slightly attenuated at higher sample sizes compared to the smallest sample sizes. For complete birth history methods with less smoothing, this pattern is less pronounced and is only present at sample sizes smaller than 500.
The magnitude of the error, as measured by the mean absolute error and mean absolute relative error, also varies by level of mortality for all methods. In relative terms (see Figure 10), performance is always poorer when true mortality is lower. This is true for all methods, but the differential is greater in some (notably the standard complete birth history method) than in others and, broadly speaking, increases in magnitude as the sample size decreases. In non-relative terms (see Figure 9), the magnitude of the error is greatest when true mortality is higher. As with the relative measure, the differential in performance between low- and high-mortality situations is greatest for the standard complete birth history method and the moving window birth history method with shorter windows. For all methods, this differential increases as the sample size decreases.
Stratified by time prior to survey
Figures 11, 12, 13, and 14 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for each method at each sample size stratified by time prior to survey. Additional file 1: Table S3 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
The pattern of mean error and mean relative error at different times prior to survey differs clearly among three groups of methods: the summary birth history method, the moving window complete birth history methods with longer windows, and the complete birth history methods with less smoothing (the moving window methods with shorter windows and the standard methods). There are some differences in mean error and mean relative error between different time periods prior to survey for the summary birth history method, but while this pattern is consistent across sample sizes, there is not a clear ordering in terms of time periods. In contrast, for complete birth history methods with substantial smoothing (i.e., moving window versions with 10- or 20-year windows), there is a prominent pattern of overpredicting mortality in the most recent period and underpredicting mortality in the most distant period. As with the summary birth histories, this pattern is relatively consistent across sample sizes. For the complete birth history methods with less smoothing (i.e., windows and periods of no more than five years) there is little difference in mean error or mean relative error at larger sample sizes, but at smaller sample sizes, the downward bias previously noted in the overall analysis is increasingly concentrated in earlier time periods.
The magnitude of the error, as measured by mean absolute error and mean absolute relative error, varies by time prior to survey for all methods. In absolute terms, all methods perform better for more recent time periods than for more distant time periods. The difference is greatest for the standard complete birth history methods with one- or two-year periods and, in general, decreases as the amount of smoothing increases. The same general pattern is observed in relative terms for most methods, though the difference between the most recent time periods and time periods in the middle of the range is less obvious. In both cases the gap in magnitude of error between different time periods is present at all sample sizes, though it gets somewhat larger as the sample size decreases.
Figures 15, 16, 17, and 18 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for all methods at each sample size stratified by the number of surveys included. The results shown for a single survey are the same as those shown in Figures 3, 4, 5, and 6 and are included here for comparison. The results shown for multiple surveys are based on complete birth history methods where data are pooled across these multiple surveys within a given country. Additional file 1: Table S4 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
For very small samples, additional surveys appear to alleviate some of the downward bias, as measured by the mean error and mean relative error, exhibited by all of the complete birth history methods. Additionally, there is an obvious decline in the magnitude of the error, as measured by the mean absolute error and the mean absolute relative error, as the number of surveys increases: on average, the mean absolute relative error decreases by 22 percentage points at sample size 10, 20 percentage points at sample size 50, and 15 percentage points at sample size 100 when five surveys are available as compared to a single survey. Both of these effects almost certainly reflect that the overall sample size increases as the number of surveys increases. It is not surprising that the effect of adding additional surveys is in some ways similar to the effect of increasing the sample size in a single survey.
This analysis suggests that all methods of analyzing birth history data perform poorly at sample sizes of fewer than 100 women, with large expected errors and, for some methods, noticeable downward bias. There are large differences in performance between methods, however, and even at higher sample sizes (500 and 1,000 women), the magnitude of the expected error for many methods is still unacceptably high.
Unfortunately, there is not an obvious ‘best’ method. Overall, summary birth histories and moving window complete birth history methods with very long windows provide estimates with the smallest magnitude error and least bias, especially at the smallest sample sizes. In the case of the former, the better performance may be a result of the models that underlie the method which could, to some extent, constrain more outlying estimates from being generated. In the case of the latter, the better performance, particularly in terms of the expected magnitude of the error, is likely a result of the increased pooling of information across time. These same methods, however, do not perform uniformly across levels of mortality, and in particular, they tend to overestimate in low-mortality settings and underestimate in high-mortality settings. It is likely that the same strengths that underlie the better performance of these models overall are also at least partly responsible for these pitfalls. In the case of the summary birth histories, the models may be constraining final estimates too closely to the mean, biasing unusually low or unusually high estimates toward this mean. In the case of the moving window complete birth history methods, the increased pooling also runs the risk of smoothing out real trends in mortality and biasing the final estimates. Similarly, the moving window complete birth history methods with very long windows do not perform uniformly across time periods prior to the survey: they tend to overestimate in more recent periods and underestimate in more distant periods, and the magnitude of the error increases noticeably the earlier the estimate. Under-5 mortality has generally decreased with time, so it is likely that differences in the level of mortality at different time periods are at least partially driving the differences in performance observed in this analysis at different time periods (the reverse is also possible). 
Beyond this effect, however, it is also likely that the magnitude of the error is larger in earlier time periods because only the oldest women captured in the survey report children that far in the past, and consequently the total number of children observed is smaller in earlier time periods than in later time periods. The methods with less smoothing (i.e., the complete birth history methods with period or window lengths of no more than five years) are far less problematic with respect to differential bias by level of mortality or time prior to survey, but the magnitude of overall error from these methods is much larger than that from the other methods.
The results of this analysis suggest that the birth history methods considered are of limited utility for estimating mortality in small samples and, in particular, for making meaningful comparisons among geographic units or strata. Given the value of these types of estimates, however, investment in other data sources may be warranted. In particular, sample registration schemes may be a useful alternative to both surveys, with the problems enumerated here, and full vital registration systems, which are expensive and technically challenging to maintain. Alternatively, research into adapting existing small area methods frequently used in epidemiology and other fields[10, 11] for use with birth histories could prove useful. These models explicitly account for unusually high sampling error in estimates derived from small samples and attempt to overcome this challenge by exploiting spatial and temporal relatedness. Several authors have already used birth history data to inform these models, though the focus of these analyses has generally been on the relationship between other factors and mortality and not on prediction of mortality levels for specific areas or subgroups[12–16].
This analysis has several limitations. The stratified analyses by mortality level and time prior to survey do not control for each other, making it difficult to conclusively disentangle the two effects. Further, birth histories, like all survey data, are subject to a number of data errors, including, among others, recall bias and age misreporting. We treat the reported population in each survey as truth and do not consider the additional effect on error or bias that any of these errors could introduce. It is well documented that these types of errors can impact the reliability of mortality estimates, but future research could consider specifically how these errors interact with the problems due to sample size explicitly considered here. Microsimulation, where synthetic populations are created by simulating births and deaths given set mortality and fertility schedules, could provide useful mechanisms for more fully exploring the issues described here.
Nonetheless, this study has several strengths. The use of empirical data, rather than simulated populations, ensures that the mortality and fertility relationships are realistic and representative of the types of scenarios where birth history data are most likely to be collected. Additionally, in contrast to previous research[17, 18], which has examined errors in birth history estimates and compared different methods of analyzing birth history data, we estimate error by comparing to a true gold standard (in this case the full sample) rather than using statistical techniques such as the jackknife to estimate error. Finally, this study compares a large number of different methods for analyzing available data and makes explicit the comparison between these methods at different sample sizes, which should prove useful to analysts deciding between different methods given a particular dataset.
Overall, the results of this analysis suggest that birth histories in all but the largest of surveys are of limited utility for making subnational estimates or estimates across many strata. Censuses may be more useful for this purpose, having much larger sample sizes, but generally only include summary birth history information if they include birth history information at all. Given the value of subnational and stratified analyses of under-5 mortality and the limitations of the methods examined here, further research into methods for using existing data sources and investment in alternative data sources is warranted. In particular, small area methods, which address the issue of small sample sizes by borrowing strength across geographic units, may be useful when analyzing birth history data at a subnational level.
1. Bangha M, Simelane S: Spatial differentials in childhood mortality in South Africa: evidence from the 2001 census. Afr Popul Studies 2007, 22(2):3-21.
2. Storeygard A, Balk D, Levy M, Deane G: The global distribution of infant mortality: a subnational spatial view. Popul, Space Place 2008, 14(3):209-229. 10.1002/psp.484
3. Bauze AE, Tran LN, Nguyen KH, Firth S, Jimenez-Soto E, Dwyer-Lindgren L, Hodge A, Lopez AD: Equity and geography: the case of child mortality in Papua New Guinea. PLoS ONE 2012, 7(5):e37861. 10.1371/journal.pone.0037861
4. Singh A, Pathak PK, Chauhan RK, Pan W: Infant and child mortality in India in the last two decades: a geospatial analysis. PLoS ONE 2011, 6(11):e26856. 10.1371/journal.pone.0026856
5. MEASURE DHS: Demographic and health surveys. Calverton: Macro International, Inc.; http://www.measuredhs.com
6. Rajaratnam J, Tran L, Lopez A, Murray C: Measuring under-five mortality: validation of new low-cost methods. PLoS Med 2010, 7(4):e1000253. 10.1371/journal.pmed.1000253
7. Lozano R, Wang H, Foreman K, Rajaratnam J, Naghavi M, Marcus J, Dwyer-Lindgren L, Lofgren K, Phillips D, Atkinson C, Lopez A, Murray C: Progress towards millennium development goals 4 and 5 on maternal and child mortality: an updated systematic analysis. Lancet 2011, 378(9797):1139-1165. 10.1016/S0140-6736(11)61337-8
8. Garenne M, Gakusi E: Health transitions in sub-Saharan Africa: overview of mortality trends in children under 5 years old (1950–2000). Bull World Health Organ 2006, 84(6):470-478. 10.2471/BLT.05.029231
9. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2012.
10. Banerjee S, Carlin BP, Gelfand AE: Hierarchical Modeling and Analysis for Spatial Data. No. 101 in Monographs on Statistics and Applied Probability. Boca Raton: Chapman & Hall/CRC; 2004.
11. Cressie N, Wikle CK: Statistics for Spatio-Temporal Data. Hoboken: Wiley; 2011.
12. Gemperli A, Vounatsou P, Kleinschmidt I, Bagayoko M, Lengeler C, Smith T: Spatial patterns of infant mortality in Mali: the effect of malaria endemicity. Am J Epidemiol 2004, 159:64-72. 10.1093/aje/kwh001
13. Adebayo SB, Fahrmeir L, Klasen S: Analyzing infant mortality with geoadditive categorical regression models: a case study for Nigeria. Econ Human Biol 2004, 2(2):229-244. 10.1016/j.ehb.2004.04.004
14. Kandala NB, Ghilagaber G: A geo-additive Bayesian discrete-time survival model and its application to spatial analysis of childhood mortality in Malawi. Qual Quantity 2006, 40(6):935-957. 10.1007/s11135-005-3268-6
15. Kazembe LN, Appleton CC, Kleinschmidt I: Spatial analysis of the relationship between early childhood mortality and malaria endemicity in Malawi. Geospatial Health 2007, 2:41-50.
16. Kazembe LN, Mpeketula PMG: Quantifying spatial disparities in neonatal mortality using a structured additive regression model. PLoS ONE 2010, 5(6):e11180. 10.1371/journal.pone.0011180
17. Korenromp EL, Arnold F, Williams BG, Nahlen BL, Snow RW: Monitoring trends in under-5 mortality rates through national birth history surveys. Int J Epidemiol 2004, 33(6):1293-1301. 10.1093/ije/dyh182
18. Pedersen J, Liu J: Child mortality estimation: appropriate time periods for child mortality estimates from full birth histories. PLoS Med 2012, 9(8):e1001289. 10.1371/journal.pmed.1001289
This research was supported by core funding from the Bill & Melinda Gates Foundation. An earlier version of this paper was presented at the 2013 Population Association of America Annual Meeting.
The authors declare that they have no competing interests.
LD-L conceived of the study, carried out the analysis, and wrote the first draft of the manuscript. EG, AF, and HW participated in design of the study, participated in interpretation of results, and edited the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Mean, 2.5th percentile, and 97.5th percentile of the error, relative error, absolute error, and absolute relative error for all methods and sample sizes. Table S1 gives results for the overall analysis; Table S2 gives results for the analysis stratified by mortality level; Table S3 gives results for the analysis stratified by time prior to survey; and Table S4 gives results for the analysis stratified by number of surveys. (XLSX 82 KB)