Adaptation of a probabilistic method (InterVA) of verbal autopsy to improve the interpretation of cause of stillbirth and neonatal death in Malawi, Nepal, and Zimbabwe

Background Verbal autopsy (VA) is a widely used method for analyzing cause of death in absence of vital registration systems. We adapted the InterVA method to extrapolate causes of death for stillbirths and neonatal deaths from verbal autopsy questionnaires, using data from Malawi, Zimbabwe, and Nepal. Methods We obtained 734 stillbirth and neonatal VAs from recent community studies in rural areas: 169 from Malawi, 385 from Nepal, and 180 from Zimbabwe. Initial refinement of the InterVA model was based on 100 physician-reviewed VAs from Malawi. InterVA indicators and matrix probabilities for cause of death were reviewed for clinical and epidemiological coherence by a pediatrician-researcher and an epidemiologist involved in the development of InterVA. The modified InterVA model was evaluated by comparing population-level cause-specific mortality fractions and individual agreement from two methods of interpretation (physician review and InterVA) for a further 69 VAs from Malawi, 385 from Nepal, and 180 from Zimbabwe. Results Case-by-case agreement between InterVA and reviewing physician diagnoses for 69 cases from Malawi, 180 cases from Zimbabwe, and 385 cases from Nepal were 83% (kappa 0.76 (0.75 - 0.80)), 71% (kappa 0.41(0.32-0.51)), and 74% (kappa 0.63 (0.60-0.63)), respectively. The proportion of stillbirths identified as fresh or macerated by the different methods of VA interpretation was similar in all three settings. Comparing across countries, the modified InterVA method found that proportions of preterm births and deaths due to infection were higher in Zimbabwe (44%) than in Malawi (28%) or Nepal (20%). Conclusion The modified InterVA method provides plausible results for stillbirths and newborn deaths, broadly comparable to physician review but with the advantage of internal consistency. The method allows standardized cross-country comparisons and eliminates the inconsistencies of physician review in such comparisons.


Background
Cause-specific mortality data on childhood deaths are vital to identify health needs, compare patterns of death across populations, plan and monitor interventions, and inform policy [1][2][3]. In high-income countries, all births and deaths are enumerated through vital registration systems, and death certification is routine. In low-income settings, most births and deaths occur at home, death certificates are rarely available, and vital registrations are often inadequate or nonexistent [2][3][4].
Verbal autopsies (VAs) provide an alternative means of identifying probable causes of death through interviews with a close caregiver of the deceased, in which information about the circumstances, signs, and symptoms leading to death are gathered. VAs have limitations: they require recollection of events at the time of death, rely on understanding and reporting of signs and symptoms by interviewees, and may be influenced by interviewer skills. The data must also be interpreted to establish a diagnosis [5]. Conventionally, VA questionnaires are read by two or more physicians separately and one or more causes of death are attributed. A cause of death is established when physicians' opinions correspond; otherwise diagnosis is reconsidered and discussed with or without the input of an additional physician. If no agreement is reached, the cause of death is considered undetermined. Repeatability of this diagnostic process over time and in different settings is problematic, particularly when diagnostic criteria are not standardized amongst different clinicians [6][7][8]. In some situations, disagreement between physicians is such that a large proportion of causes of death remain indeterminate [7,9]. Moreover, the method is costly, time-consuming, and requires the involvement of physicians who are an already overstretched resource in low-income countries [6,10].
Despite these limitations, VAs are useful in estimating cause-specific mortality fractions (CSMFs) in population studies [6,8,11]. They have been used extensively in epidemiological studies, household surveys, and sentinel surveillance sites, and have been piloted in subsamples from sample registration systems. There remains a need to refine the technique to make it more comparable, repeatable, easy to apply, and cost-effective.
VA questionnaires devised by the WHO attempt to standardize the interview process, but more standardized approaches to interpreting VA data are needed. Hierarchical algorithms and computer programs based on logistic regression have been used, but they are difficult to standardize across cultures and age groups and can usually only identify single causes of death [12,13]. InterVA uses a probabilistic method and has been tested in a range of settings for deaths at all ages, across sexes, and for maternal deaths [14,15].
We describe the refinement and evaluation of InterVA to identify causes of death in the perinatal (stillbirths and neonatal deaths in the first seven days) and neonatal periods, using data from three different settings: Malawi, Zimbabwe, and Nepal.

Methods
Based on Bayes' theorem [16], the InterVA model calculates the probability of a set of causes of death given the presence of circumstances, signs, and symptoms (collectively called 'indicators') reported in VA interviews. The method is described in detail elsewhere [10,17]. Briefly, a finite number of causes of death are assigned to a predefined matrix of estimated probabilities of occurrence. The presence of indicators (Table 1) modifies the predefined probabilities of each cause of death upward or downward using Bayes' theorem according to the formula where p (C|I) indicates the probability of a cause of death (C) given the presence of the indicator (I) and p (I/!C) is the probability of I in the absence of C [10].
Probabilities of final-cause categories increase or decrease in relation to specific signs and symptoms reported in the VA interview. If symptoms are not reported, the probabilities do not change. The program is available online http://www.InterVA.net. Users can enter the data as single cases or in batches, and the model generates up to three causes of death and their respective likelihoods. Prior to the current study, the probability matrix consisted of 34 cause of death classifications and 104 indicators [10].

Data sources
To explore the performance of InterVA in different settings, 734 stillbirth and neonatal VAs were obtained from rural areas of three low-income countries.
In Malawi (Mchinji District), 169 stillbirth and neonatal VAs were collected from 2004 to 2005, as part of a cluster-randomized study evaluating two community interventions to improve maternal and child health [18]. Although designed for the study, the VA questionnaire was comparable in structure and content with the subsequent WHO questionnaire [19]. Completed questionnaires were interpreted independently by two Malawian pediatricians, who assigned up to three causes of death on the basis of a hierarchical classification and algorithm [20]. They were able to use alternative diagnoses where necessary. Discrepancies were resolved by discussion and, if consensus could not be reached, the cause of death was recorded as indeterminate.
In Nepal (Makwanpur district), 385 VAs were collected from 2001 to 2003 as part of a cluster-randomized study of a community intervention to improve maternal and child health [21]. The questionnaire was again comparable with the subsequent WHO tool. Questionnaires were interpreted independently by two Nepalese pediatricians, who each assigned a single cause of death on the basis of the same algorithm used in Malawi. Discrepancies were resolved after review by a third physician.
The third data source included 180 neonatal deaths from Zimbabwe, identified as part of a maternal and perinatal mortality study conducted in 2007 and 2008 [22]. Neonatal VAs were conducted using the WHO tool. Questionnaires were interpreted independently by two physicians, who each assigned a single cause of death using the International Classification of Diseases and Related Health Problems (ICD-10). Discrepancies were resolved after review by a third physician ( Table 2).

Refining InterVA
Initial refinement of the InterVA model was based on 100 (59%) physician-reviewed VAs from Malawi. The use of these data for refinement was pragmatic in that, at the time of refinement, they were the only data available. Data from the VA questionnaire were entered in the InterVA model, which assigned causes of death and associated likelihoods. The open histories, where the caregiver reported the events leading to death, were coded and also entered in the model. CSMFs obtained using the original InterVA and physician review were compared. CSMFs were calculated from the InterVA output as the sum of the likelihoods computed for each single cause of death category, divided by the sum of the likelihoods for all causes. For the calculation of CSMFs from physician-review data, if more than one cause of death was assigned, each was considered as a proportion of the total death. Therefore, if a single cause of death was assigned by all physicians, or if only one was available, it explained 100% of that death. If more than one cause of death was attributed, each contributed an equal proportion of the total 100%. For example, if both reviewing physicians assigned prematurity as a cause of death and one of them also assigned sepsis, then prematurity contributed 75% and sepsis 25% to the death. In this way, every available physician diagnosis contributed to the cause-specific mortality profile, avoiding a potential loss of information and bias that might have been introduced by using consensus diagnoses alone. Fifty-four neonatal-death questionnaires were analyzed with the original InterVA model. Stillbirths were initially excluded, as InterVA was not designed to classify them. The results of this first analysis identified the need for greater differentiation in the model among causes of death in the neonatal period. The InterVA indicators and matrix probabilities were therefore reviewed for clinical and epidemiological coherence by a pediatrician-researcher (SV) and an epidemiologist involved in the development of InterVA (EF). Following this initial refinement, InterVA was evaluated by comparing caseby-case diagnoses with physician-assigned diagnoses for the same 100 VA cases, as well the population-level CSMFs. A process of refinement and comparisons with physician review was undertaken until InterVA elicited mortality profiles deemed by the researchers to be plausible and satisfactorily comparable to physician review.
Evaluating the refined InterVA model The modified InterVA model was evaluated by comparing population-level CSMFs derived from the two methods of interpretation (physician review and InterVA) for a further and hitherto-untouched 69 VAs from Malawi, 385 from Nepal, and 180 from Zimbabwe. A diversity of data sources was chosen to assess the performance of InterVA in a range of settings. Comparisons of population-level CSMFs were considered paramount as InterVA is intended as a public health tool for health monitoring and program evaluation, rather than for use in clinical settings. Nevertheless, individual level, caseby-case comparisons between physician diagnoses and InterVA were also conducted and the kappa statistic for interrater agreement was calculated to further evaluate the InterVA against the only available alternative method in our populations [23].

Ethical considerations
The

Refining InterVA
InterVA was modified to include two extra cause of death categories: fresh stillbirth and macerated stillbirth.
To define the stillbirth diagnoses and differentiate among possible causes of stillbirth and neonatal death, nine further indicators were added to the model. The resulting modifications to the specific indicators and cause of death categories included in InterVA are shown in Table 1. As these are extra entities in the model, they run in parallel to the existing indicators and causes without directly affecting them.
To compare the InterVA output and physician diagnoses in the three settings, some rationalization between the physician-assigned causes and the causes obtained from InterVA was necessary; therefore, causes of death not included in the InterVA classification were grouped as "other." Similarly, infectious causes of neonatal deaths, including sepsis, pneumonia, and meningitis were grouped together into an "infection" category, since the possibilities of clinically distinguishing them in newborn infants is difficult. There were no cases of neonatal tetanus. The resulting CSMFs for InterVA and physician review of the 100 VA cases from Malawi used to refine the model are shown in Figure 1. In 73% of cases, at least one of the InterVA diagnoses agreed with at least one of the physician diagnoses (kappa 0.60 (95% confidence interval [CI]: 0.57, 0.70)).
CSMFs derived from InterVA and physician review in Malawi, Zimbabwe, and Nepal are illustrated in Table 3. In Malawi and Zimbabwe, the rank order of causes of death was identical when derived from InterVA or physician review. In Nepal, the most common cause of death according to InterVA was perinatal asphyxia, while it was neonatal infections according to physicians. Prematurity was diagnosed more commonly by InterVA than by physicians in Nepal and Zimbabwe. InterVA detected a higher proportion of neonatal infections than physicians in Zimbabwe, but a lower proportion in Nepal.

Stillbirths
The proportion of total stillbirths identified by the two methods of VA interpretation was similar in all three settings. Data from Malawi and Nepal allowed for a more detailed comparison of the relative proportions of fresh and macerated stillbirths (Table 4).

Multicountry mortality comparison
Considering the above evaluations and taking the refined model to be adequate for the purposes of characterizing cause compositions of neonatal mortality for population health planning and monitoring, a threecountry comparison of neonatal cause-specific mortality was conducted (Figure 2). It showed some differences in cause compositions of neonatal deaths, particularly in Zimbabwe compared to the other two settings. In Zimbabwe, the proportions of preterm births and deaths due to infection were higher (44%) than in Malawi (28%) or Nepal (20%).

Discussion
The deadline for the Millennium Development Goals (MDGs) is less than five years away and the need to quantify childhood mortality, understand its causes, and assess the effects of proposed interventions are central to MDG4. Neonatal deaths contribute about 40% of under-5 mortality globally [24]. A recent evaluation of the INDEPTH network of Health and Demographic Surveillance Sites [25] calls for all sites to use InterVA for coding of causes of death, since such approaches represent "the only viable strategy to produce timely and comparable cause of death statistics" [26]. Our study has revised the InterVA method for verbal autopsy to improve its ability to identify causes of stillbirth and newborn death and tested it in three populations.
In this study, physician review was used as a reference standard to compare InterVA. The use of physician review was the only alternative source of cause of death assessment for our study populations. This choice has limitations, however. Physicians are influenced by their experience, perception, and interpretation of local epidemiology [23,27]. Moreover, they mostly use the open history to reach a decision and may not account consistently for all the indicators. Sensitivity and specificity of physician review compared with hospital diagnosis in neonatal populations varied between 64% and 74% in a recent study [20] and concerns about inter-and intrarater reliability are well described [23].
An alternative to physician diagnoses is the use of hospital records. Hospital diagnoses have been used to establish sensitive, specific, and positive predictive values of VA diagnoses [8,12,20]. The main pitfall of hospital diagnoses in developing countries, particularly in rural settings, is that the CSMF of deaths occurring in hospitals are likely to be different from the ones in communities [23]. There is therefore the risk of increasing precision of an interpretative method, defined as its ability to reproduce hospital diagnoses in the population where it is tested. This would not necessarily produce results that are correct when used in populations where Vergnano et al. Population Health Metrics 2011, 9:48 http://www.pophealthmetrics.com/content/9/1/48 access to hospitals and health care is limited. Moreover, the ability to recognize, recall, and report signs of illnesses may be different among hospital users and nonhospital users.
The results of InterVA as compared with physician review showed an almost identical ranking of causes of death. However, differences exist. Some of these differences can be explained by the way the model was constructed. Prematurity, for example, was over-diagnosed by InterVA in Zimbabwe and Nepal. This probably resulted from using a dataset where clinicians were allowed more than a single cause of death to refine InterVA. In fact, when multiple causes of death are allowed, prematurity is more likely to be listed as a coexisting cause of death than when a single cause is selected [28]. The model did not include "other" as a cause of death and would have classified such causes of death in one of the available diagnoses.
InterVA over-diagnosed neonatal infections compared with physician review in Zimbabwe, while the opposite happened in Nepal. This inconsistency could be due to the interpretation of signs by different physicians.
Alternatively, it could be due to the selection of a priori probabilities. Greater understanding of the way physicians decide to value or ignore signs and symptoms may help in future refinements and evaluations of InterVA.
Stillbirths were included for practical and public health reasons. Although globally there are about 3.2 million stillbirths per year, reliable statistics are lacking [29]. This information gap has to be addressed. About half of perinatal deaths are accounted for by stillbirths [29]. The refinements including stillbirths in the model eliminate the need to differentiate between live births and stillbirths before processing VA data, making the method more suitable for use in large surveys. The separation between fresh and macerated stillbirths is relevant, as prevention strategies are different. The comparisons between InterVA and physician review in Malawi and Nepal suggest that InterVA can differentiate the two categories, although, as with neonatal deaths, there may be room for further refinement.
Case-by-case agreement was moderate in all datasets, however it was lower for Zimbabwe compared to Nepal and Malawi. The new indicators and matrix probabilities have been chosen and modified on the basis of the personal experience of the researchers, and subsequently tested and modeled on a subset of the Malawi data. There is a risk, therefore, that the tool may be too closely modeled on a sub-Saharan African setting (although the results from Nepal do not support this) or on a InterVA PR Figure 1 Cause-specific mortality fractions from InterVA and physician review (PR) for the 100 VA cases from Malawi used to develop and refine the model. Note to Figure 1: Other causes include "jaundice," "multiple pregnancies," "maternal causes," "hypothermia," and "hypoglycemia."  There may be important epidemiological and social explanations for the difference in the CSMF in Malawi, Zimbabwe, and Nepal. However, even if the interpretation of verbal autopsy data by InterVA was consistent, methodological variability in other aspects of VA may have contributed to the observed cause distribution. Indeed, the close comparability of CSMF between Malawi and Nepal may to some degree reflect common data capture processes that differ from those used in Zimbabwe. It is possible that in Nepal and Malawi, the populations were part of research areas and might have been sensitized to recognize, describe, and recall signs of neonatal diseases, while in Zimbabwe the community was part of a government surveillance and may have responded differently. Nevertheless, this is a reality of all VA studies conducted in research settings. Use of lay (in Malawi and Nepal) versus health-professional (in Zimbabwe) interviewers and their gender may also have had an impact on data capture. This highlights the need for further methodological research into the effects of other aspects of VA. It is likely that a number of strategies and international collaborations will be necessary to ensure the success of such investigations.
The modified version of InterVA for stillbirths and neonatal deaths produced plausible results when compared with physicians' opinions but had the advantage of being completely internally consistent, allowing standardized comparisons of data from different countries. Ultimately, standardized methods are essential and their application and evaluation in a wide range of settings is encouraged. Through wider application, the strengths and weakness of InterVA, and VA in general, will become more apparent, thereby better informing the application and public health utility of surrogate methods for measuring mortality in absence of vital registration systems.  We thank the Maimwana office and field staff who made this project