Overall performance
Figures 3, 4, 5, and 6 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for each method at each sample size. Additional file 1: Table S1 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
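The four summary measures can be sketched as follows. This assumes that the error of an estimate is the estimate minus the true value, that the relative error divides that error by the true value, and that the percentiles are taken across simulation replicates. The function name `summarize_errors` is hypothetical and not taken from the study's code.

```python
import numpy as np

def summarize_errors(estimates, truths):
    """Return (mean, 2.5th percentile, 97.5th percentile) for each error measure.

    Assumes error = estimate - truth and relative error = error / truth,
    computed across simulation replicates.
    """
    est = np.asarray(estimates, dtype=float)
    true = np.asarray(truths, dtype=float)
    err = est - true
    rel = err / true
    measures = {
        "mean_error": err,
        "mean_relative_error": rel,
        "mean_absolute_error": np.abs(err),
        "mean_absolute_relative_error": np.abs(rel),
    }
    summary = {}
    for name, values in measures.items():
        lo, hi = np.percentile(values, [2.5, 97.5])
        summary[name] = (float(values.mean()), float(lo), float(hi))
    return summary
```

Signed measures (mean error, mean relative error) capture bias, since positive and negative errors cancel; the absolute measures capture the typical magnitude of the error regardless of direction.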
Overall, as measured by the mean error and mean relative error, all methods are close to unbiased at sample sizes of at least 500. At smaller sample sizes, however, the mean error and mean relative error for the standard complete birth history methods become noticeably negative, suggesting that these methods tend to underestimate true mortality when sample sizes are small. This tendency is more pronounced for shorter periods: the downward bias is more extreme for the one-year estimates than for the five-year estimates, which may reflect the greater pooling of information when longer periods are used. The moving window complete birth history methods follow a similar pattern and become progressively more negatively biased as the sample size decreases. As with the standard methods, the downward bias of the moving window methods is more pronounced for shorter windows. Additionally, for the same window length, the triangle weights version shows slightly more downward bias than the flat weights version. In contrast, the summary birth history method appears to be almost unbiased even at small sample sizes.
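To make the flat versus triangle distinction concrete, the sketch below shows one plausible weighting scheme. It is an assumption, not the study's definition: flat weights give every year in a window centered on the target year the same weight, while triangle weights fall off linearly with distance from the center year. The helpers `window_weights` and `windowed_rate`, and the pooling of deaths and exposure, are hypothetical.

```python
import numpy as np

def window_weights(window_length, kind="flat"):
    """Normalized weights for each year of a window centered on the target year.

    Hypothetical scheme: 'flat' weights every year equally; 'triangle'
    weights decline linearly with distance from the center of the window.
    """
    offsets = np.arange(window_length) - (window_length - 1) / 2
    if kind == "flat":
        w = np.ones(window_length)
    elif kind == "triangle":
        w = (window_length + 1) / 2 - np.abs(offsets)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return w / w.sum()

def windowed_rate(deaths, exposure, kind="flat"):
    """Weighted pooled rate: sum(w * deaths) / sum(w * exposure) over the window."""
    deaths = np.asarray(deaths, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    w = window_weights(len(deaths), kind)
    return float(np.sum(w * deaths) / np.sum(w * exposure))
```

For a five-year window, the triangle version gives weights proportional to 1, 2, 3, 2, 1, so it leans more heavily on the central years than the flat version of the same length does.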
The mean absolute error and mean absolute relative error of all methods increase noticeably as the sample size decreases. On average, no method achieves an error below 73% at sample size 10, 40% at sample size 50, or 29% at sample size 100. At every sample size the methods follow a consistent ordering of performance: the moving window complete birth history methods and the summary birth history method generally perform better than the standard complete birth history methods. Additionally, within each class of methods, those with more pooling (e.g., longer periods or windows) have lower error at each sample size than those with less pooling.
Stratified by true mortality
Figures 7, 8, 9, and 10 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for each method at each sample size stratified by true mortality level. Additional file 1: Table S2 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
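A stratified version of the summary simply computes the same measures within each group of estimates. The sketch below assumes each estimate carries a stratum label (for example a true mortality level, or a period prior to survey); the labels and the function name `stratified_errors` are hypothetical.

```python
import numpy as np

def stratified_errors(estimates, truths, strata):
    """Mean error, mean absolute error, and mean absolute relative error per stratum.

    `strata` holds one hypothetical label per estimate, such as a
    true-mortality level or a time-prior-to-survey period.
    """
    est = np.asarray(estimates, dtype=float)
    true = np.asarray(truths, dtype=float)
    labels = np.asarray(strata)
    err = est - true
    abs_err = np.abs(err)
    abs_rel = abs_err / true
    summary = {}
    for label in np.unique(labels):
        mask = labels == label
        summary[label.item()] = (
            float(err[mask].mean()),      # mean error (bias)
            float(abs_err[mask].mean()),  # mean absolute error
            float(abs_rel[mask].mean()),  # mean absolute relative error
        )
    return summary
```

The same function covers stratification by time prior to survey; only the labels passed in change.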
For all methods, the mean error and mean relative error differ somewhat across levels of mortality. In general, there is a tendency to underestimate mortality in high-mortality settings and to overestimate it in low-mortality settings. These differences are most pronounced for the summary birth history method and for the complete birth history methods with long (10- or 20-year) windows. For these methods, the differential is present at all sample sizes and is only slightly attenuated at larger sample sizes compared with the smallest ones. For complete birth history methods with less smoothing, this pattern is less pronounced and appears only at sample sizes smaller than 500.
The magnitude of the error, as measured by the mean absolute error and mean absolute relative error, also varies by level of mortality for all methods. In relative terms (see Figure 10), performance is always poorer when true mortality is lower. This is true for all methods, but the differential is greater in some (notably the standard complete birth history methods) than in others and, broadly speaking, increases as the sample size decreases. In absolute terms (see Figure 9), the magnitude of the error is greatest when true mortality is higher. As with the relative measure, the differential in performance between low- and high-mortality settings is greatest for the standard complete birth history methods and the moving window complete birth history methods with shorter windows. For all methods, this differential increases as the sample size decreases.
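These two findings are consistent because the relative measure divides the absolute error by the true value. Writing $|e|$ for the absolute error and $m$ for true mortality, and using hypothetical values chosen only for illustration (they are not results from this study), a smaller absolute error at low mortality can still be a larger relative error:

```latex
\frac{|e|}{m} = \frac{5}{20} = 0.25 \quad (\text{low mortality: } m = 20 \text{ per } 1{,}000),
\qquad
\frac{|e|}{m} = \frac{10}{100} = 0.10 \quad (\text{high mortality: } m = 100 \text{ per } 1{,}000).
```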
Stratified by time prior to survey
Figures 11, 12, 13, and 14 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for each method at each sample size stratified by time prior to survey. Additional file 1: Table S3 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
The pattern of mean error and mean relative error across times prior to survey differs clearly among three groups of methods: the summary birth history method; the moving window complete birth history methods with longer windows; and the complete birth history methods with less smoothing, namely the moving window versions with shorter windows and the standard methods. For the summary birth history method, there are some differences in mean error and mean relative error between time periods prior to survey, but while this pattern is consistent across sample sizes, there is no clear ordering by time period. In contrast, for complete birth history methods with substantial smoothing (i.e., moving window versions with 10- or 20-year windows), there is a prominent pattern of overestimating mortality in the most recent period and underestimating it in the most distant period. As with the summary birth history method, this pattern is relatively consistent across sample sizes. For the complete birth history methods with less smoothing (i.e., windows and periods of no more than five years), there is little difference in mean error or mean relative error at larger sample sizes, but at smaller sample sizes the downward bias noted in the overall analysis is increasingly concentrated in earlier time periods.
The magnitude of the error, as measured by mean absolute error and mean absolute relative error, varies by time prior to survey for all methods. In absolute terms, all methods perform better for more recent time periods than for more distant ones. The difference is greatest for the standard complete birth history methods with one- or two-year periods and generally decreases as the amount of smoothing increases. The same general pattern holds in relative terms for most methods, though the difference between the most recent time periods and those in the middle of the range is less obvious. For both measures, the gap in the magnitude of error between time periods is present at all sample sizes, though it grows somewhat as the sample size decreases.
Multiple surveys
Figures 15, 16, 17, and 18 show the mean error, mean relative error, mean absolute error, and mean absolute relative error, respectively, observed for all methods at each sample size stratified by the number of surveys included. The results shown for a single survey are the same as those shown in Figures 3, 4, 5, and 6 and are included here for comparison. The results shown for multiple surveys are based on complete birth history methods where data are pooled across these multiple surveys within a given country. Additional file 1: Table S4 also gives these values along with the corresponding 2.5th and 97.5th percentiles.
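One simple way to pool complete birth history data across surveys from the same country is sketched below. It assumes each survey has already been tabulated into deaths and person-time exposure per calendar period, and that pooling means summing those totals before computing a rate; the study's actual pooling procedure may differ, and the input structure and the name `pool_surveys` are hypothetical.

```python
from collections import defaultdict

def pool_surveys(surveys):
    """Pool per-period deaths and exposure across surveys from one country.

    Each survey is a hypothetical dict mapping a calendar period to a
    (deaths, exposure) tuple tabulated from its complete birth histories.
    Returns the pooled rate (deaths / exposure) for each period.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for survey in surveys:
        for period, (deaths, exposure) in survey.items():
            totals[period][0] += deaths
            totals[period][1] += exposure
    return {period: d / e for period, (d, e) in sorted(totals.items()) if e > 0}
```

Because deaths and exposure add across surveys, each period's rate rests on more observations than any single survey provides.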
For very small samples, additional surveys appear to alleviate some of the downward bias, as measured by the mean error and mean relative error, that all of the complete birth history methods exhibit. Additionally, the magnitude of the error, as measured by the mean absolute error and the mean absolute relative error, clearly declines as the number of surveys increases: on average, the mean absolute relative error with five surveys is 22 percentage points lower than with a single survey at sample size 10, 20 percentage points lower at sample size 50, and 15 percentage points lower at sample size 100. Both effects almost certainly reflect the larger overall sample size that comes with more surveys, so it is not surprising that adding surveys has an effect similar in some ways to increasing the sample size of a single survey.
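The reductions above are differences in percentage points, which is not the same as a relative (percent) reduction. The sketch below shows both, assuming the mean absolute relative error is stored as a proportion; the function name `mare_change` and the example values are hypothetical.

```python
def mare_change(single, multi):
    """Compare a mean absolute relative error (as a proportion) for one vs. several surveys.

    Returns (drop in percentage points, relative drop in percent).
    """
    points = 100 * (single - multi)
    relative = 100 * (single - multi) / single
    return points, relative

# Hypothetical values: 0.80 with one survey, 0.58 with five surveys.
points, relative = mare_change(0.80, 0.58)
```

With these illustrative inputs the drop is about 22 percentage points, which corresponds to a relative reduction of about 27.5%.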