Performance of the Tariff Method: validation of a simple additive algorithm for analysis of verbal autopsies
© James et al; licensee BioMed Central Ltd. 2011
Received: 14 April 2011
Accepted: 4 August 2011
Published: 4 August 2011
Verbal autopsies provide valuable information for studying mortality patterns in populations that lack reliable vital registration data. Methods for transforming verbal autopsy results into meaningful information for health workers and policymakers, however, are often costly or complicated to use. We present a simple additive algorithm, the Tariff Method (termed Tariff), which can be used for assigning individual cause of death and for determining cause-specific mortality fractions (CSMFs) from verbal autopsy data.
Tariff calculates a score, or "tariff," for each cause, for each sign/symptom, across a pool of validated verbal autopsy data. The tariffs are summed for a given response pattern in a verbal autopsy, and this sum (score) provides the basis for predicting the cause of death in a dataset. We implemented this algorithm and evaluated the method's predictive ability, both in terms of chance-corrected concordance at the individual cause assignment level and in terms of CSMF accuracy at the population level. The analysis was conducted separately for adult, child, and neonatal verbal autopsies across 500 pairs of train-test validation verbal autopsy data.
Tariff is capable of outperforming physician-certified verbal autopsy in most cases. In terms of chance-corrected concordance, the method achieves 44.5% in adults, 39% in children, and 23.9% in neonates. CSMF accuracy was 0.745 in adults, 0.709 in children, and 0.679 in neonates.
Verbal autopsies can be an efficient means of obtaining cause of death data, and Tariff provides an intuitive, reliable method for generating individual cause assignment and CSMFs. The method is transparent and flexible and can be readily implemented by users without training in statistics or computer science.
Keywords: verbal autopsy, validation, gold standard, Tariff Method, cause of death, mortality, cause-specific mortality fractions
Verbal autopsies (VAs) are increasingly being used to provide information on causes of death in demographic surveillance sites (DSSs), national surveys, censuses, and sample registration schemes [1–3]. Physician-certified verbal autopsy (PCVA) is the primary method used to assign cause once VA data are collected. Several alternative expert-based algorithms [4–6], statistical methods [7–9], and computational algorithms  have been developed. These methods hold promise, but their comparative performance needs to be evaluated. Large-scale validation studies, such as the Population Health Metrics Research Consortium (PHMRC) , provide objective information on the performance of these different approaches.
The main limitation to date of PCVA is the cost and feasibility of implementation. Finding and training physicians to read VAs in resource-poor settings has proven challenging, leading in some cases to long delays in the analysis of data [1, 11]. In some rural areas with marked shortages of physicians, assigning the few available physicians to read VAs may have a very high opportunity cost in terms of health care delivery. Lozano et al.  have also shown that there is a substantial idiosyncratic element to PCVA related to physician diagnostic performance. In contrast, some automated methods (whether statistical or computational in nature) have demonstrated performance similar to PCVA [7, 8], but some users may be uncomfortable with the "black box" nature of these techniques. It is often very difficult for users to unpack how decisions on a cause are reached. Furthermore, the actual statistics and mechanics that form the basis for cause assignments are difficult to access and understand due to the myriad computations involved. One method, the King-Lu method, is a direct cause-specific mortality fraction (CSMF) estimation approach [13, 14] that does not assign cause to specific deaths, making it even harder for a user to understand how the cause of death is being determined.
Empirical methods that use the observed response pattern from VAs in a training dataset have an advantage over expert judgment-based methods in that they capture the reality that some household respondents in a VA interview may respond "yes" to some items even when they would not be considered part of the classical clinical presentation for that cause. For example, 43% of households report coughing as a symptom for patients who died from a fall, and 58% of households report a fever for patients who died from a road traffic accident. However, a limitation of many existing methods such as Simplified Symptom Pattern and Random Forest is that they may not give sufficient emphasis to pathognomonic signs and symptoms. For example, if 20% of patients dying of epilepsy report convulsions, and only 2% of nonepilepsy patients report convulsions, a statistical model will not assign this symptom as much significance as these data imply. Put another way, Bayesian methods such as InterVA and Symptom Pattern and statistical methods such as King-Lu direct CSMF estimation assume that the probability of signs and symptoms conditional on true cause is constant, but in reality it is not. There are subsets of patients who may have signs and symptoms that are extremely informative, and other subsets with less clearly defined signs/symptoms.
In this paper, we propose a simple additive approach using transparent, intuitive computations based on responses to a VA instrument. Our premise is that there ought to be highly informative signs or symptoms for each cause. Our goal is to develop an approach to cause of death estimation based on reported signs and symptoms that is simple enough to be implemented in a spreadsheet so that users can follow each step of cause assignment. We illustrate the development of this approach and then use the PHMRC gold standard VA validation study dataset  to assess the performance of this approach compared to PCVA, which is current practice.
Logic of the method
The premise behind the Tariff Method is to identify signs or symptoms collected in a VA instrument that are highly indicative of a particular cause of death. The general approach is as follows. A tariff is developed for each sign and symptom for each cause of death to reflect how informative that sign and symptom is for that cause. For a given death, based on the response pattern in the VA instrument, the tariffs are then summed yielding an item-specific tariff score for each death for each cause. The cause that claims the highest tariff score for a particular death is assigned as the predicted cause of death for that individual. The tariffs, tariff scores, and ranks are easily observable at each step, and users can readily inspect the basis for any cause decision.
$$\mathrm{Tariff}_{ij} = \frac{x_{ij} - \operatorname{median}(x_{ij})}{\text{interquartile range}(x_{ij})}$$

where $\mathrm{Tariff}_{ij}$ is the tariff for cause $i$, item $j$; $x_{ij}$ is the fraction of VAs for which there is a positive response to deaths from cause $i$ for item $j$; $\operatorname{median}(x_{ij})$ is the median fraction with a positive response for item $j$ across all causes; and interquartile range $x_{ij}$ is the interquartile range of positive response rates averaged across causes. Note that as defined, tariffs can be positive or negative in value. As a final step, tariffs are rounded to the nearest 0.5 to avoid overfitting and to improve predictive validity.
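The tariff computation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Stata code: it assumes the tariff is the cause's positive-response rate minus the across-cause median, divided by the across-cause interquartile range, then rounded to the nearest 0.5. The function name `compute_tariffs`, the input layout, the quartile convention (`method="inclusive"`), and the zero-IQR guard are all assumptions made for the example.

```python
from statistics import median, quantiles


def compute_tariffs(endorsement):
    """Compute rounded tariffs from positive-response rates.

    endorsement[cause][item] is the fraction of training deaths from
    `cause` with a positive response to `item` (hypothetical layout).
    """
    causes = list(endorsement)
    items = list(next(iter(endorsement.values())))
    tariffs = {cause: {} for cause in causes}
    for item in items:
        rates = [endorsement[cause][item] for cause in causes]
        med = median(rates)
        # Quartile convention is an assumption; the text does not specify one.
        q1, _, q3 = quantiles(rates, n=4, method="inclusive")
        iqr = q3 - q1
        for cause in causes:
            # Assumption: an item with zero spread across causes gets tariff 0.
            raw = (endorsement[cause][item] - med) / iqr if iqr > 0 else 0.0
            # Round to the nearest 0.5 (exact ties follow Python's round()).
            tariffs[cause][item] = round(raw * 2) / 2
    return tariffs


# Toy, made-up response rates for a single item across five causes.
toy = {
    "A": {"fever": 0.1},
    "B": {"fever": 0.2},
    "C": {"fever": 0.3},
    "D": {"fever": 0.4},
    "E": {"fever": 0.9},
}
print(compute_tariffs(toy))
```

In this toy input, cause E reports the item far more often than the other causes, so it receives a large positive tariff, while causes below the median receive negative tariffs.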
$$\text{Tariff Score}_{ki} = \sum_{j=1}^{w} \mathrm{Tariff}_{ij}\, x_{jk}$$

where $x_{jk}$ is the response for death $k$ on item $j$, taking on a value of 1 when the response is positive and 0 when the response is negative, and $w$ is the number of items used for the cause prediction. It is key to note that for each death, a different tariff score is computed for each of the possible causes. In the adult module of the PHMRC study, for example, there are 46 potential causes and so there are 46 different tariff scores based on the tariffs and the response pattern for that death. For actual implementation, we use only the top 40 items for each cause in terms of tariff to compute a tariff score. The sets of 40 items used for each cause prediction are not mutually exclusive, though cumulatively across all cause predictions the majority of items in the PHMRC VA questionnaire are used for at least one cause prediction.
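The scoring step can be sketched as follows. This is an illustrative sketch under stated assumptions: the tariff score for a cause is the sum of that cause's tariffs over the items with a positive response, restricted to the cause's top items, and the cause with the highest score is assigned. Ranking the "top" items by absolute tariff value is an assumption, and the function names, data layout, and tariff values are hypothetical.

```python
def tariff_scores(responses, tariffs, top_n=40):
    """Return one tariff score per cause for a single death.

    responses: dict item -> 0 or 1 for this death.
    tariffs:   dict cause -> dict item -> tariff.
    """
    scores = {}
    for cause, item_tariffs in tariffs.items():
        # Keep the top_n items for this cause (by absolute tariff: assumption).
        top_items = sorted(
            item_tariffs, key=lambda item: abs(item_tariffs[item]), reverse=True
        )[:top_n]
        # Additive score: tariff times the 0/1 response, summed over items.
        scores[cause] = sum(
            item_tariffs[item] * responses.get(item, 0) for item in top_items
        )
    return scores


def assign_cause(responses, tariffs, top_n=40):
    """Assign the cause with the highest tariff score."""
    scores = tariff_scores(responses, tariffs, top_n)
    return max(scores, key=scores.get)


# Made-up tariffs for two causes and three items.
example_tariffs = {
    "epilepsy": {"convulsions": 8.0, "fever": -0.5, "cough": 0.0},
    "pneumonia": {"convulsions": -1.0, "fever": 2.0, "cough": 3.5},
}
example_death = {"convulsions": 1, "fever": 1, "cough": 0}
print(tariff_scores(example_death, example_tariffs))
print(assign_cause(example_death, example_tariffs))
```

Because the score is a plain sum, each item's contribution to a cause decision can be read directly from the tariff matrix, which is the transparency property the method aims for.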
Implementation of the Tariff Method
We use the PHMRC gold standard VA training datasets to develop tariffs and then to assess the performance of Tariff compared to PCVA. Details on the design of this multicountry study are provided elsewhere . The study collected 7,836 adult, 2,075 child, and 2,631 neonatal deaths with rigorously defined clinical diagnostic and pathological criteria. For each death, the PHMRC VA instrument was applied. The resulting VA dataset consists of responses to symptoms and signs that may be expressed as dichotomous, continuous, and categorical variables. The survey instrument also included items for the interviewer to transcribe medical record text from the household and to take notes during the "open response" portion of the interview, when the respondent explains anything else that he/she feels is relevant. The text from these responses has been converted to dichotomous items. The continuous and categorical variables, such as "how long did the fever last?" were also converted to dichotomous variables. These data processing steps are described in more detail elsewhere . We use the dichotomized training datasets to develop tariffs. We then compute tariff scores for each death in the test and train datasets and assign a cause of death to each death in the test dataset. We compute chance-corrected concordance and CSMF accuracy  on the cause of death predictions in the test dataset to avoid in-sample analysis. Chance-corrected concordance is a sensitivity assessment that measures the method's ability to correctly determine individual cause of death. CSMF accuracy is an index that measures a VA method's ability to estimate a population's cause-specific mortality fractions and is determined by calculating the sum of the absolute value of CSMF errors compared to the maximum possible error in CSMFs. Examination of the tariff score ranks can yield a second, third, etc., most likely cause of death. We also compute partial chance-corrected concordance for up to six causes . 
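The two performance metrics can be sketched in Python. The formulas below are assumptions based on the verbal description above and on a reading of the cited metrics paper: chance-corrected concordance for a cause is taken as the sensitivity for that cause corrected for random assignment among N causes, and CSMF accuracy as one minus the summed absolute CSMF error divided by the maximum possible error, 2(1 - min true CSMF). Function and argument names are hypothetical.

```python
from collections import Counter


def chance_corrected_concordance(true_causes, predicted_causes, cause, n_causes):
    """Sensitivity for `cause`, corrected for chance among n_causes causes.

    Assumes at least one death in true_causes is from `cause`.
    """
    true_positive = sum(
        1 for t, p in zip(true_causes, predicted_causes) if t == cause and p == cause
    )
    n_true = sum(1 for t in true_causes if t == cause)
    sensitivity = true_positive / n_true
    chance = 1.0 / n_causes
    return (sensitivity - chance) / (1.0 - chance)


def csmf_accuracy(true_causes, predicted_causes, cause_list):
    """One minus total absolute CSMF error over the maximum possible error."""
    n = len(true_causes)
    true_counts = Counter(true_causes)
    pred_counts = Counter(predicted_causes)
    true_csmf = {c: true_counts[c] / n for c in cause_list}
    pred_csmf = {c: pred_counts[c] / n for c in cause_list}
    total_error = sum(abs(true_csmf[c] - pred_csmf[c]) for c in cause_list)
    # Worst case: every death assigned to the cause with the smallest true CSMF.
    max_error = 2.0 * (1.0 - min(true_csmf.values()))
    return 1.0 - total_error / max_error


# Made-up gold standard and predicted causes for four deaths.
truth = ["A", "A", "B", "C"]
predicted = ["A", "B", "B", "C"]
print(chance_corrected_concordance(truth, predicted, "A", n_causes=3))
print(csmf_accuracy(truth, predicted, ["A", "B", "C"]))
```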
We undertake separate analyses for adult, child, and neonatal deaths. It is important to note that for each train-test data split from the PHMRC study, we compute a new set of tariffs based only on that particular training set. In other words, in no case are test data used in the development of the tariff that is applied to that particular test dataset.
We have repeated the development of tariffs and tariff scores using household recall of health care experience (HCE) and excluding these variables  in order to estimate the method's performance in settings where access to health care is uncommon. HCE items capture any information that the respondent may know about the decedent's experiences with health care. For example, the items "Did [name] have AIDS?" or "Did [name] have cancer?" would be considered HCE items. Text collected from the medical record is also classified as HCE information. For example, the word "malaria" might be written on the decedent's health records and would be considered an HCE item. Based on the validation dataset collected by the PHMRC , we were able to estimate causes of death and evaluate the method for 34 causes for adults, 21 causes for children, and 11 causes for neonates. We compared Tariff's performance to PCVA for the same cause lists and item sets for the adult and child results; however, PCVA produces estimates for only six neonate causes and consequently direct comparison for neonates was not possible.
In order to analyze the performance of Tariff in comparison with PCVA across a variety of cause of death distributions, 500 different cause compositions based on uninformative Dirichlet sampling  were processed with both Tariff and PCVA. The frequency with which Tariff outperforms PCVA in both chance-corrected concordance and CSMF accuracy is then computed across these 500 population cause-specific constructs.
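One way to construct such test populations is sketched below. It is a rough illustration, not the study's procedure: it assumes "uninformative Dirichlet" means a Dirichlet distribution with all parameters equal to 1, and that deaths are then resampled with replacement from a test pool so the expected cause composition matches the draw. The function names and the pool layout are hypothetical.

```python
import random


def sample_csmf(causes, rng):
    """Draw a cause composition from Dirichlet(1, ..., 1).

    A Dirichlet draw is obtained by normalising independent Gamma(1, 1) draws.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in causes]
    total = sum(draws)
    return {cause: draw / total for cause, draw in zip(causes, draws)}


def resample_test_set(deaths_by_cause, n_deaths, rng):
    """Resample deaths (with replacement) to follow a random cause composition.

    deaths_by_cause: dict cause -> list of death records from that cause.
    Returns the resampled (cause, death) pairs and the sampled composition.
    """
    causes = list(deaths_by_cause)
    csmf = sample_csmf(causes, rng)
    weights = [csmf[cause] for cause in causes]
    chosen = rng.choices(causes, weights=weights, k=n_deaths)
    sample = [(cause, rng.choice(deaths_by_cause[cause])) for cause in chosen]
    return sample, csmf


rng = random.Random(2011)
pool = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1", "c2", "c3"]}
test_set, composition = resample_test_set(pool, 10, rng)
print(composition)
print(test_set)
```

Repeating this draw 500 times, with tariffs recomputed on each matching training split, would give a distribution of metric values across widely varying cause compositions.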
Table: Selected tariffs in the adult module of the PHMRC dataset. Signs/symptoms shown: ulcer oozed pus; lump in the neck; pain in left arm; free text: "cancer". Causes shown: diabetes with skin infection; hypertensive disorder (maternal); acute myocardial infarction.
Additional files 1, 2, and 3 show the tariffs (derived from the full dataset) for the top 40 items based on tariff absolute value for each cause for the adult, child, and neonate modules, respectively.
Validation of Tariff cause assignment
Individual death assignment
Median chance-corrected concordance (%) for Tariff and PCVA with 95% uncertainty interval (UI), by age group with and without HCE information
Median CSMF accuracy for Tariff and PCVA with 95% UI, by age group with and without HCE information
The Tariff Method is a simple additive approach based on identifying items in a VA interview that are indicative of particular diseases. It is based on the premise that individual items or signs/symptoms should be more prominently associated with certain causes (the "signal") compared with others (the "noise"). This simple approach performs as well as or better than PCVA for adult causes in assigning an underlying cause of death, though PCVA performs better in this comparison for child deaths. At the level of particular causes, Tariff has higher chance-corrected concordances than PCVA for 14/34 adult and 8/21 child causes. Results for neonatal deaths are not comparable due to differences in cause lists. For estimating CSMFs, Tariff performs better than PCVA for adult and child deaths in all comparisons with and without household recall of health care experience. In all comparable cases, Tariff yields higher median CSMF accuracy than PCVA. Overall, at the individual and the CSMF level, Tariff in general offers a competitive alternative to PCVA. Performance for assigning neonatal causes of death, however, is worse than for PCVA.
The tariffs for each cause-item pair have already been established using Stata code, which will be available online. Using this pre-existing tariff matrix, the Tariff Method requires only multiplication and addition to make cause of death assignments for each individual death in a given dataset. Though we processed VA response data to develop our method, users need not conduct additional processing to use Tariff since our processing steps can be integrated into the code that makes cause of death assignments. The absence of a statistical model or complex computational algorithm means that the steps involved in assigning cause of death to a particular death can be completed in a spreadsheet and are readily available for user scrutiny. Further, the tariff matrix and algorithm can be implemented on a simple device such as a cell phone - the Open Data Kit research team at the University of Washington has already implemented the tariff algorithm on an Android cell phone using their Free/Libre Open-Source Survey Platform. In other words, tariff-based cause assignments can be made immediately after data collection in the field.
One of the key strengths of Tariff is its flexibility. Each item's tariff for a cause is computed independently from all other items. Consequently, any instrument's verbal autopsy items that can be mapped to one of the items in the PHMRC dataset can be evaluated using Tariff. Other methods, such as Random Forest and Simplified Symptom Pattern, require the testing data to have the same item set as the data on which the model was trained. This is an important asset of Tariff because it allows users to implement the method without having to recalculate tariffs or revise the algorithm. It can essentially be used as is for any verbal autopsy instrument with overlapping items with the PHMRC instrument.
Tariff does not take into account the interdependencies of signs and symptoms conditional on particular causes. It does not take into account the complex time sequence captured in open narratives, which are often used by physicians. How can such a simple algorithm be more effective than physicians? The answer may lie in the key attributes of Tariff that distinguish it from other methods: identification of items that are unusually important for different causes through computation of the tariff and the additive rather than multiplicative nature of the tariff score. The tariffs focus attention on the specific subset of items that are most strongly related to a given cause. The additive approach may make Tariff more robust to measurement error either in the train or test datasets.
Because of its simplicity, we plan to make available several different platforms on which to apply Tariff. Programs in R, Stata, and Python will be available for assigning a cause for a given death or set of deaths, as well as a version of Tariff in Excel for users without training in statistics packages. Tariff will also be available in the Open Data Kit for use on the Android operating system for cell phones and tablets. We hope these tools will lead to widespread testing and application of Tariff. The full sign/symptom-cause tariff matrix will also be available for user inspection and application to other verbal autopsy diagnostic methods such as Random Forest and Simplified Symptom Pattern, which rely on tariffs to identify meaningful signs and symptoms. The tariffs can also be used to refine further verbal autopsy instruments, possibly in reducing the number of survey items, since they show which specific signs/symptoms should be included for accurately predicting certain causes of death. For example, one strategy for item reduction would be to drop items that have low tariffs for all causes and then assess the change in CSMF accuracy or chance-corrected concordance when cause assignment is undertaken with the restricted item set.
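The item-reduction strategy mentioned above can be sketched as a simple filter over the tariff matrix. This is a minimal sketch: the threshold value, the function name `low_tariff_items`, and the tariff values are hypothetical, and the subsequent step of re-running cause assignment on the restricted item set is not shown.

```python
def low_tariff_items(tariffs, threshold=1.0):
    """List items whose absolute tariff is below `threshold` for every cause.

    tariffs: dict cause -> dict item -> tariff.
    """
    all_items = set()
    for item_tariffs in tariffs.values():
        all_items.update(item_tariffs)
    return sorted(
        item
        for item in all_items
        if all(abs(t.get(item, 0.0)) < threshold for t in tariffs.values())
    )


# Made-up tariffs: "headache" carries little signal for either cause.
example = {
    "epilepsy": {"convulsions": 8.0, "headache": 0.5, "cough": 0.0},
    "pneumonia": {"convulsions": -1.0, "headache": -0.5, "cough": 3.5},
}
print(low_tariff_items(example))
```

Items returned by such a filter are candidates for removal; whether dropping them is acceptable would still need to be checked by comparing CSMF accuracy and chance-corrected concordance before and after.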
Given that PCVA can be costly and time consuming, it would seem that Tariff provides an attractive alternative. Compared to the current version of InterVA , Tariff performs markedly better. We believe that users interested in rapid, low-cost, easy-to-understand VA methods should consider Tariff. As indicated by analysis of CSMF accuracy and true versus estimated CSMF regressions, there are certain cases where Tariff may overestimate or underestimate CSMFs for particular causes. It will be important for users of Tariff to understand these limitations, particularly for the purposes of using Tariff to better inform public health decision-making. Future research may yield new techniques to more accurately determine CSMFs based on verbal autopsy through back calculation. Tariff is also attractive to those who wish to examine the exact computation by which a verbal autopsy algorithm makes a cause of death assignment. In the future, as more gold standard deaths are collected to augment existing causes in the PHMRC dataset, or for new causes, it will be straightforward to revise existing tariffs or report tariffs for new causes. This step is particularly easy compared to other computer-automated methods, for which expansion with more causes requires revision of the algorithm itself.
Verbal autopsies are likely to become an increasingly important data collection platform in areas of the world with minimal health information infrastructure. To date, methods for evaluating verbal autopsies have either been expensive or time-consuming, as is the case with PCVA, or they have been computationally complex and difficult for users to implement in different settings. This has inhibited the widespread implementation of verbal autopsy as a tool for policymakers and health researchers. Tariff overcomes both of these challenges. The method is transparent, intuitive, and flexible, and, importantly, has undergone rigorous testing to ensure its validity in various settings through the use of the PHMRC verbal autopsy dataset. Using the method on verbal autopsies to determine both individual-level cause assignment and cause-specific mortality fractions will greatly increase the availability and utility of cause of death information for populations in which comprehensive and reliable medical certification of deaths is unlikely to be achieved for many years to come, but is urgently needed for health policies, programs, and monitoring progress with development goals.
CSMF: cause-specific mortality fraction
HCE: health care experience
PCVA: physician-certified verbal autopsy
RMSE: root mean squared error
This research was conducted as part of the Population Health Metrics Research Consortium: Christopher J.L. Murray, Alan D. Lopez, Robert Black, Ramesh Ahuja, Said Mohd Ali, Abdullah Baqui, Lalit Dandona, Emily Dantzer, Vinita Das, Usha Dhingra, Arup Dutta, Wafaie Fawzi, Abraham D. Flaxman, Sara Gomez, Bernardo Hernandez, Rohina Joshi, Henry Kalter, Aarti Kumar, Vishwajeet Kumar, Rafael Lozano, Marilla Lucero, Saurabh Mehta, Bruce Neal, Summer Lockett Ohno, Rajendra Prasad, Devarsetty Praveen, Zul Premji, Dolores Ramírez-Villalobos, Hazel Remolador, Ian Riley, Minerva Romero, Mwanaidi Said, Diozele Sanvictores, Sunil Sazawal, Veronica Tallo. The authors would like to additionally thank Charles Atkinson for managing the PHMRC verbal autopsy database and Alireza Vahdatpour, Benjamin Campbell, Michael K. Freeman, and Charles Atkinson for intellectual contributions to the analysis.
This work was funded by a grant from the Bill & Melinda Gates Foundation through the Grand Challenges in Global Health initiative. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.
- Fottrell E, Byass P: Verbal Autopsy: Methods in Transition. Epidemiol Rev 2010, 32: 38-55. 10.1093/epirev/mxq003
- Soleman N, Chandramohan D, Shibuya K: Verbal autopsy: current practices and challenges. Bull World Health Organ 2006, 84: 239-245. 10.2471/BLT.05.027003
- Baiden F, Bawah A, Biai S, Binka F, Boerma T, Byass P, Chandramohan D, Chatterji S, Engmann C, Greet D, Jakob R, Kahn K, Kunii O, Lopez AD, Murray CJL, Nahlen B, Rao C, Sankoh O, Setel PW, Shibuya K, Soleman N, Wright L, Yang G: Setting international standards for verbal autopsy. Bull World Health Organ 2007, 85: 570-571. 10.2471/BLT.07.043745
- Byass P, Fottrell E, Dao LH, Berhane Y, Corrah T, Kahn K, Muhe L, Do DV: Refining a probabilistic model for interpreting verbal autopsy data. Scand J Public Health 2006, 34: 26-31. 10.1080/14034940510032202
- Byass P, Huong DL, Minh HV: A probabilistic approach to interpreting verbal autopsies: methodology and preliminary validation in Vietnam. Scand J Public Health Suppl 2003, 62: 32-37.
- Byass P, Kahn K, Fottrell E, Collinson MA, Tollman SM: Moving from data on deaths to public health policy in Agincourt, South Africa: approaches to analysing and understanding verbal autopsy findings. PLoS Med 2010, 7: e1000325. 10.1371/journal.pmed.1000325
- Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9: 29. 10.1186/1478-7954-9-29
- Murray CJL, James SL, Birnbaum JK, Freeman MK, Lozano R, Lopez AD, the Population Health Metrics Research Consortium (PHMRC): Simplified Symptom Pattern Method for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9: 30. 10.1186/1478-7954-9-30
- Murray CJL, Lopez AD, Feehan DM, Peter ST, Yang G: Validation of the Symptom Pattern Method for Analyzing Verbal Autopsy Data. PLoS Med 2007, 4: e327. 10.1371/journal.pmed.0040327
- Murray CJL, Lopez AD, Black R, Ahuja R, Ali SM, Baqui A, Dandona L, Dantzer E, Das V, Dhingra U, Dutta A, Fawzi W, Flaxman AD, Gómez S, Hernández B, Joshi R, Kalter H, Kumar A, Kumar V, Lozano R, Lucero M, Mehta S, Neal B, Ohno SL, Prasad R, Praveen D, Premji Z, Ramírez-Villalobos D, Remolador H, Riley I, Romero M, Said M, Sanvictores D, Sazawal S, Tallo V: Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets. Popul Health Metr 2011, 9: 27. 10.1186/1478-7954-9-27
- Gakidou E, Lopez AD: What do children die from in India today? The Lancet 2010, 376: 1810-1811. 10.1016/S0140-6736(10)62054-5
- Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9: 32. 10.1186/1478-7954-9-32
- King G, Lu Y: Verbal Autopsy Methods with Multiple Causes of Death. Statistical Science 2008, 23: 78-91. 10.1214/07-STS247
- Flaxman AD, Vahdatpour A, James SL, Birnbaum JK, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Direct estimation of cause-specific mortality fractions from verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9: 35. 10.1186/1478-7954-9-35
- Murray CJL, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metr 2011, 9: 28. 10.1186/1478-7954-9-28
- Lozano R, Freeman MK, James SL, Campbell B, Lopez AD, Flaxman AD, Murray CJL, the Population Health Metrics Research Consortium (PHMRC): Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011, 9: 50. 10.1186/1478-7954-9-50
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.