Main findings
The proportion of children for whom a complete date-of-birth was recorded in the EN-INDEPTH survey data differed by site, but the regression analysis identified the same factors associated with recording an incomplete birthdate across all five sites. The date of death was less likely to be recalled than date-of-birth in all sites, but variation was large.
Despite comparable mortality rates in HDSS and survey data in three of the sites, the number of births and deaths differed markedly between the two data sources. Furthermore, the proportion of HDSS births matched to EN-INDEPTH survey data was considerably lower for children who had died than for children who had survived. Heaping of age-at-death at full weeks during the neonatal period and at 12 months was common in the EN-INDEPTH data, but less so in the HDSS data.
Consistencies with other studies
Day of birth has only been reported in survey data since the introduction of DHS-7 in 2013 [18]. The proportion reporting imprecise birthdates has been described for two DHS-7 surveys: in Malawi (2015–2016), 4.6% of birthdates were incomplete, while the proportion in Tanzania (2016–2017) was 1.6% [19]. Incomplete birthdates ranged from 1 to 7% in our sites, but the populations followed through HDSS sites may be more accustomed to reporting dates.
A prior study from IgangaMayuge indicates that the number of pregnancies identified in the year prior to the retrospective survey was higher than captured in the HDSS, although for longer recall periods, a higher number of pregnancies were missed through a retrospective pregnancy history survey [6]. A similar pattern for births may explain the linking pattern observed across sites here: births more than 2 years prior to the survey were less likely to be matched than births within the past 2 years. Lower matching rates for births of children who subsequently died than for children still alive have also been observed previously in Matlab [5]. In a survey collecting pregnancy and birth histories in Matlab in 1994, deaths which occurred >5 years ago and deaths at early ages were particularly likely to be omitted from the pregnancy survey [5]. Similar patterns for age-at-death were observed in the sites with the largest number of HDSS deaths (Bandim, Matlab and Kintampo) in the present study (Additional files 3.8A, 3.8D and 3.8E).
In addition to omission of births in the EN-INDEPTH survey, lack of matching could also be caused by displacement if births recorded in the EN-INDEPTH survey were reported to have occurred before or after the real date. While we have no direct measure of the displacement since the linking was at the level of the mother rather than the individual child, displacement has in a prior study been more common for children who had died [5].
In line with prior evidence, we observed that the age-at-death was heaped, preferentially reported at full weeks [7] and around 12 months of age [9] as reflected in peaks at 7, 14 and 21 days (Fig. 3) and at 12 months (Fig. 4). This heaping was only observed in the African sites and was more marked in the survey than HDSS data. While the lack of a 12-month peak in Matlab may be explained by the underlying different distribution of child mortality with child mortality increasing after 12 months of age due to drowning [20], this does not explain why there was less heaping in the neonatal period in this one site. The higher rate of maternal literacy and a lower number of children per woman in Matlab than in the other sites may explain why there is less heaping [21].
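As an illustration of how heaping at a given age can be quantified, the following minimal sketch compares the number of deaths reported at a target day with the average number reported on the two adjacent days. This neighbour-based index is an assumption made for illustration and is not necessarily the measure used in this study; the function name `heaping_index` and the toy ages are hypothetical.

```python
from collections import Counter


def heaping_index(ages_in_days, target_day):
    """Ratio of deaths reported at target_day to the mean of the two adjacent days.

    Values well above 1 suggest that ages are preferentially reported at
    target_day. Returns None when no deaths are reported on either adjacent day.
    """
    counts = Counter(ages_in_days)
    neighbour_mean = (counts[target_day - 1] + counts[target_day + 1]) / 2
    if neighbour_mean == 0:
        return None
    return counts[target_day] / neighbour_mean


# Hypothetical reported ages at death (days); not data from the study.
toy_ages = [6] * 2 + [7] * 6 + [8] * 2
index_day_7 = heaping_index(toy_ages, 7)
print(index_day_7)
```

The same function could be applied at 14 and 21 days, or to ages in months around 12 months, and computed separately for the survey and HDSS data to compare the degree of heaping.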
Interpretation
The child mortality estimates from the five HDSS sites are not necessarily representative of the underlying child mortality in the HDSS, as the sample in three sites was not chosen at random but was selected to focus efforts on women with births in the past 5 years [14]. Thus, the estimated mortality levels should not be interpreted as the HDSS mortality, but rather as the mortality levels for children born to the same subset of women.
The HDSS data do not include records of births that were never part of the HDSS population, i.e. births to interviewed women that occurred before the woman moved into the HDSS and where the child did not in-migrate with the mother, either because it had died or because it was living elsewhere. In Dabat, the HDSS surveillance data were truncated approximately 18 months prior to the EN-INDEPTH survey. In the Bandim HDSS, only children followed prospectively in the HDSS data contribute time-at-risk in the mortality estimates, due to the assumption that deaths are less likely to be reported to the interviewers than surviving children [22] (Additional file 2.1, Additional file 2.2). Thus, the births and deaths in the HDSS data are a subsample of the real birth history of the women.
The number of HDSS births in this subset of women should thus be lower than in the EN-INDEPTH data, which attempt to capture the full history of all live births, and the proportion registered should be lower in Bandim than in the other sites. Looking at the Bandim numbers in Table 2, the ratio of HDSS births to survey births of 0.92 among women resident in the same location for the past 5 years does support that the Bandim HDSS may capture only a sample of the births. However, as the other HDSSs seek to capture all births to resident women, even if the pregnancy had not been registered and the child would no longer be part of the population after registration, we expected that the ratios of HDSS to survey births would be higher in the other sites. With the exception of Matlab, this was not the case (Table 2). Censoring of the Dabat data likely explains much of the lower numbers in Dabat, but the 20 and 44% lower numbers in the HDSS compared with the survey in IgangaMayuge and Kintampo, among women stating residence in the same location for the past 5 years, are unlikely to be made up only by children who have never been living in the HDSS. Thus, some births are likely missed by the HDSS even in the sites that assume full information on all births to resident women. If the proportion of HDSS-unrecorded children is independent of survival status, the number of deaths should be lower by a similar proportion as the number of births. In the four HDSSs that assume full information on all births to women under HDSS surveillance, the ratio of the two ratios ‘deaths in HDSS to deaths in survey data’ vs ‘births in HDSS to births in survey data’ was less than one (Table 2), which may indicate that under-reporting could be more severe for deaths.
When limiting the analysis to women who had continuously lived at their present location for the past 5 years, this indication that deaths were under-reported relative to births in the HDSS compared with the survey was weakened in the four sites assuming full information on births to women under surveillance. In contrast, in Bandim, there was an indication that more deaths were captured in the HDSS than in the survey, the ratio of ratios being > 1.
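The ratio of ratios can be made concrete with a short calculation. The sketch below divides the HDSS-to-survey ratio for deaths by the HDSS-to-survey ratio for births; the function name `ratio_of_ratios` and the counts are hypothetical and are not taken from Table 2.

```python
def ratio_of_ratios(hdss_births, survey_births, hdss_deaths, survey_deaths):
    """Return (birth_ratio, death_ratio, ratio_of_ratios).

    birth_ratio = HDSS births / survey births
    death_ratio = HDSS deaths / survey deaths
    ratio_of_ratios = death_ratio / birth_ratio
    """
    if min(hdss_births, survey_births, survey_deaths) <= 0:
        raise ValueError("birth counts and survey deaths must be positive")
    birth_ratio = hdss_births / survey_births
    death_ratio = hdss_deaths / survey_deaths
    return birth_ratio, death_ratio, death_ratio / birth_ratio


# Hypothetical counts for one site; not data from the study.
birth_ratio, death_ratio, ror = ratio_of_ratios(
    hdss_births=800, survey_births=1000, hdss_deaths=30, survey_deaths=50
)
print(f"births {birth_ratio:.2f}, deaths {death_ratio:.2f}, ratio of ratios {ror:.2f}")
```

In this toy example the HDSS records 80% of the survey births but only 60% of the survey deaths, giving a ratio of ratios below 1, the pattern that would suggest deaths are relatively less completely recorded in the HDSS than births. A value above 1 would suggest the opposite.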
A ratio of ratios of 1 either indicates that mortality is estimated correctly in both HDSS and survey data or that both estimates are off by a similar magnitude. Without a gold standard, making firm conclusions on either interpretation is not possible. However, when looking at the age distribution of under-5 mortality, the HDSS estimates indicate that 42, 44 and 44% of under-5 mortality was neonatal in Dabat, IgangaMayuge and Kintampo, respectively. The HDSS estimates were substantially higher in Matlab (57%), which had intensive surveillance with bimonthly visits and pregnancy testing after missed periods [23] and in Bandim (63%) (Additional file 3.1), where mortality estimates are based on prospective surveillance (Additional file 2.1). Thus some HDSSs also likely underestimate early mortality, especially when intervals between follow-up rounds are long: deaths in children under surveillance are captured, but early deaths among children born between rounds are likely to be missed [24].
Since the HDSS data are by definition a subset of the real birth history, all HDSS-recorded births should have had a matched birth record in the survey data, had the precision of the birth dates been high in both sources. However, when the HDSS-reported births were linked to the survey data (±1 month), only between 51% and 89% of the HDSS records were matched to child records in the EN-INDEPTH survey data. In all five sites, the probability of matching an HDSS birth to a birth in the EN-INDEPTH survey was lower if the child had died. This is consistent with the survey being more likely to miss births of children who have subsequently died and thus underestimating the real child mortality. However, as described above, misreported date-of-birth for children who died may also contribute.
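The linkage logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's linkage procedure: the ±1 month window is interpreted as a difference of at most one calendar month, matching is done within the same mother, one survey birth may match more than one HDSS birth, and the record layouts, identifiers and function names are hypothetical.

```python
from datetime import date


def month_index(d):
    """Calendar month as a single integer, so months can be subtracted."""
    return d.year * 12 + d.month


def match_rates(hdss_births, survey_births, window_months=1):
    """Proportion of HDSS births matched to a survey birth, by survival status.

    hdss_births: iterable of (mother_id, date_of_birth, died) tuples
    survey_births: iterable of (mother_id, date_of_birth) tuples
    Returns a dict keyed by died (True/False) with the matched proportion.
    """
    survey_months = {}
    for mother_id, dob in survey_births:
        survey_months.setdefault(mother_id, []).append(month_index(dob))

    totals = {True: 0, False: 0}
    matched = {True: 0, False: 0}
    for mother_id, dob, died in hdss_births:
        totals[died] += 1
        candidates = survey_months.get(mother_id, [])
        if any(abs(month_index(dob) - m) <= window_months for m in candidates):
            matched[died] += 1
    return {status: matched[status] / totals[status]
            for status in totals if totals[status] > 0}


# Hypothetical records; not data from the study.
hdss = [
    ("m1", date(2016, 3, 10), False),
    ("m1", date(2018, 1, 5), True),
    ("m2", date(2017, 6, 1), False),
    ("m3", date(2017, 9, 20), True),
]
survey = [
    ("m1", date(2016, 4, 2)),
    ("m2", date(2017, 6, 15)),
    ("m3", date(2017, 10, 1)),
]
rates = match_rates(hdss, survey)
print(rates)
```

In the toy data, both surviving children are matched, whereas one of the two children who died has no survey birth within the window, mirroring the direction of the difference by survival status reported here.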
Strengths and limitations
This is, to our knowledge, the first study investigating variation in child mortality data measured through retrospective survey data across a range of countries and in populations where survey data could also be linked to prospective data on mortality at the level of births. Despite having estimates from two different data sources, we do not have a gold standard against which to evaluate either measure: some births and deaths may be missed in either or both sources because they were not reported to the interviewer.
Although the women to be interviewed were selected from a listing of women registered in the HDSS data, another woman may in some cases have been interviewed instead of the listed woman. Such errors may have occurred since the common way of identifying a woman in the HDSS sites is ‘mother of xx’; for the present listing, we could not use these relations, which may have hampered the identification. Nevertheless, this is likely to be rare and does not explain the difference in matching by survival status of the child.
Implications
While the HDSS does not capture the true full birth history, our analyses indicate that the survey data likely missed some births too, in particular births where the child had subsequently died. Thus, both mortality estimates from some HDSSs and estimates from surveys may systematically underestimate child mortality. Since the retrospective survey interviews are conducted to fill data gaps, studying omissions is challenging. We found that the EN-INDEPTH survey underestimated mortality compared with the Bandim HDSS, but we did not observe this pattern in the other sites, where full information on all births to registered women is assumed. In light of the different HDSS definitions of when a child is under surveillance (Additional files 2.1 and 2.2), future studies should ensure that both the data where full information on all births is assumed and the additional data necessary to perform analyses limited to prospective follow-up are available in the same populations. Sex-ratio at birth has been suggested as an indicator of potential omissions [25], but none of our analyses indicated that sex was associated with the likelihood of linking, which it should have been if girls or boys were selectively underreported. Sex-ratios alone are therefore not enough to provide reassurance about the completeness of the survey data.
The precision of age-at-death is important in establishing the proportion of deaths having occurred below a specific age and thus to inform global mortality estimates [12]. With changes in mortality patterns, departures from previously modelled fractions of mortality in younger age groups may be introduced, but could be overlooked in the absence of empirical data. Thus, improved measurement of infant and neonatal mortality is necessary to monitor progress towards the mortality targets of the Sustainable Development Goals [26].
Establishing the precision of the survey data may, furthermore, open up new uses of this type of data. If the survey data are sufficiently accurate, DHS/MICS data could be useful in studying the effects of ‘shocks’, e.g. effects of environmental exposures, pandemics or other events fixed at specific time points both before and after birth, and could therefore be relevant for targeting interventions. If the imprecision increases markedly with the recall period, assessing potential effects of events several years prior to the survey may be impossible. The consistent finding that the sex of the child was not associated with the indicators of precision opens up the use of survey data to study interventions that may affect boys and girls differently [27] or differences in access to care that may cause sex-differential mortality patterns [28].