When physicians review VA results for individuals who died without contact with health care services, the median chance-corrected concordance ranges from -3% to 77.6% with an average value across causes of 29.7% for adults; -5% to 89.5% with an average value of 36.3% for children; and 1.6% to 72.9% with an average value of 27.6% for neonates. This basic result is the same whether one or two physicians review the VA but is lower when physicians from other locations review the VA. Performance improves when physicians are given access to household recall of health care experience and medical records retained by the household. Both results, the improvement with HCE and the difference between physicians from within the country versus physicians from another country, highlight that a substantial component of VA diagnoses are a function not of signs and symptoms but the combination of prior epidemiological views of the physician reader and filtered information on medical records provided by the household. In other words, the validity of PCVA is highly contextual. It will perform better when respondents have more access to health care and when physicians are strongly guided by their prior beliefs on the prevalence of diseases.
Performance of a VA method on estimating CSMFs is a complex function of both individual death assignment concordance and the pattern of how true negatives are larger or smaller than false positives. The median CSMF accuracy found in this study was 0.624 without HCE and 0.675 with HCE for adults; 0.632 without HCE and 0.682 with HCE for children; and 0.695 without HCE and 0.733 with HCE for neonates. The performance of PCVA must be interpreted in light of the performance of medical certification of causes of death in a functioning vital registration system. Hernández et al. (2011)  have found in Mexico, for example, that routine medical certification using the same gold standard deaths has a median chance-corrected concordance of 66.5% for adults, 38.5% for children, and 54.3% for neonates; and a CSMF accuracy of 0.780 for adults, 0.683 for children, and 0.756 for neonates. This is one of the few studies with comparable assessment of medical certification of death using the same methods and metrics. PCVA provides less accurate measurement than medical certification for adults but comparable results for children and neonates.
To many readers, the relatively modest performance of PCVA will come as a surprise. Some previously published studies [[14–20]] have reported substantially higher concordances compared to medical record review and quite small errors in estimated CSMFs. The less impressive performance reported here must be viewed taking into account two factors. First, in this study PCVA is being compared to a true gold standard. It is possible that the same signs and symptoms that lead to diagnoses in some facilities without laboratory tests or diagnostic imaging are those used by physicians reading a VA leading to falsely inflated performance when no gold standard is available. Second, by assessing PCVA performance estimating CSMFs across 500 test datasets, we get a much more robust assessment of performance at estimating CSMF performance, an assessment that is not simply the function of the CSMF composition in one particular test dataset.
The findings on PCVA must also be interpreted in light of the results of the sensitivity analysis. In the adult case with HCE, in 5% of the deaths, physicians assign the true cause somewhere on the death certificate but not as underlying cause. Our study is a fair assessment of the cause of death pattern yielded through PCVA using a rigorous protocol for coding causes of death. The sensitivity result, however, suggests that better training of physicians in completing the death certificate might improve performance. In this study, physicians were carefully trained in this part of the completion of a VA. The difference for children and neonates is less marked. In addition to the discrepancy in coding sensitivity, several of the physicians experienced difficulty in completing their assigned VAs due to the length of time involved in reading each VA. In some cases, VAs had to be reassigned to a different physician at the same site to ensure completion. The results of this study were conducted with 95% of the total VAs sent out for review.
We present results based on a single physician review of each VA. We have as part of this broader study a substudy comparing single review and double review with adjudication of conflicting reviews. For reasons of space, we have not presented the results from that substudy here. Our overall conclusions, however, presented in this paper on PCVA will not be affected by using only single review. In fact, we find that two readers do not improve performance over a single reader, confirming a result published for Andhra Pradesh . Based on purely probability theory grounds, double review should only improve the results of VA if a single physician is more than 50% likely to get the true cause correct. Given that a single physician is less than 50% likely to get the true cause correct, there is no theoretical argument in favor of double review, nor is there empirical support in our study.
Our finding that physicians vary markedly in their ability to assign the true cause controlling for cause of death, availability of HCE, and whether a physician is from the site or another location has important implications. It suggests that despite standardized training, all physicians are not equal in their ability to assign causes of death. Given that physicians vary in diagnostic skill for patients when they are alive, it should not be surprising that some physicians are better than others at reading verbal autopsies. This reality is one further challenge to implementing PCVA. The marked sensitivity of the results to the diagnostic ability of different physicians and their prior views on the prevalence of diseases suggests that more rigorous screening and training of physicians who undertake PCVA could improve the results. This highlights the major implementation challenge that many are facing: it is costly, time-consuming, and difficult to recruit and motivate physicians to read large numbers of VAs. Recruiting physicians with better diagnostic acumen and ability to accurately assign causes of death given a VA could be even more problematic. PCVA by its nature has substantially lower reproducibility than automated statistical or machine-learning methods for VA analysis.