The importance of evaluating the reliability and validity of underlying causes of death in mortality statistics has long been recognized in public health [19, 20]. Generating reliable mortality statistics requires precise and consistent cause of death data, which in turn depend on the completeness and accuracy of the cause of death diagnoses on the medical death certificate and on its correct completion.
There are different approaches to assessing the accuracy of the diagnoses on medical death certificates. For example, several publications have used postmortem results from autopsies as a gold standard against which to measure agreement and errors in medical death certificates. A meta-analysis of 53 autopsy series published in 2003 yielded a median error rate of 23.5% (range: 4.1%-49.8%). The analysis of diagnostic error rates in the same study, adjusting for the effects of case mix, country, and autopsy rate, yielded relative decreases of 19.4% (1.8%, 33.8%) over a period of 10 calendar years. Not all diseases can be diagnosed with a postmortem examination. Adequate clinical examinations prior to death are also useful for correct determination and certification of causes of death. Using medical records as a gold standard (with review by pathologists or nosologists), some studies have validated the quality of death certificates in different countries. These include a population-based study of 1,068 deaths in Valencia, Spain, and a review of 2,813 medical death certificates in Finland. For both studies, we calculated the same metrics that we used for our sample. The median chance-corrected concordance was 58.9% in the first study and 60.3% in the second, and the accuracy was 0.94 and 0.90, respectively. Notably, when we calculated the same metrics for only 1,284 adults, without sampling across the 500 Dirichlet splits, we obtained a mean chance-corrected concordance of 66% and a CSMF accuracy of 0.85.
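As an illustration of the two metrics quoted above, the following minimal sketch computes chance-corrected concordance and CSMF accuracy from paired gold standard and certificate causes. It assumes the commonly published definitions of these metrics (sensitivity corrected for chance over a list of N causes, and one minus the summed absolute CSMF error scaled by its maximum possible value). The function names, the three-cause list, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def chance_corrected_concordance(gold, certified, causes):
    """Per-cause CCC: (sensitivity - 1/N) / (1 - 1/N), with N causes on the list."""
    n = len(causes)
    ccc = {}
    for cause in causes:
        # For every gold standard death from this cause, did the certificate agree?
        outcomes = [c == cause for g, c in zip(gold, certified) if g == cause]
        if not outcomes:
            continue  # no gold standard deaths for this cause: CCC is undefined
        sensitivity = sum(outcomes) / len(outcomes)
        ccc[cause] = (sensitivity - 1 / n) / (1 - 1 / n)
    return ccc


def csmf_accuracy(gold, certified, causes):
    """1 - sum|true CSMF - certified CSMF| / (2 * (1 - min true CSMF))."""
    total = len(gold)
    true_counts, cert_counts = Counter(gold), Counter(certified)
    true_csmf = {c: true_counts[c] / total for c in causes}
    cert_csmf = {c: cert_counts[c] / total for c in causes}
    abs_error = sum(abs(true_csmf[c] - cert_csmf[c]) for c in causes)
    return 1 - abs_error / (2 * (1 - min(true_csmf.values())))


# Hypothetical toy data: 10 deaths, with diabetes overreported on certificates.
causes = ["diabetes", "stroke", "pneumonia"]
gold = ["diabetes"] * 4 + ["stroke"] * 4 + ["pneumonia"] * 2
certified = (["diabetes"] * 4
             + ["diabetes", "stroke", "stroke", "stroke"]
             + ["diabetes", "pneumonia"])

ccc = chance_corrected_concordance(gold, certified, causes)
accuracy = csmf_accuracy(gold, certified, causes)
```

In this toy example every gold standard diabetes death is certified as diabetes, so its chance-corrected concordance is 1.0, yet the CSMF accuracy is only 0.75 because diabetes is also certified for deaths from other causes. The two metrics therefore capture different aspects of certificate quality.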
To the best of our knowledge, this is the first study in Mexico to assess the validity of medical death certificates using a robust gold standard. Although the sample may be biased (more than 66% of the cases came from hospitals with high technical capabilities for diagnosis as well as good pathology departments), the results are consistent with other studies that used a sample of hospital deaths. Johansson and Westerling published a study of 31,785 death certificates that were linked to the national hospital discharge register and found an agreement of 46% between the main diagnosis at hospital discharge and the underlying cause of death on the medical death certificate. For deaths that occurred in the hospital, the agreement increased to 84%, but for those that occurred at home, it fell to 43%. The same study found that agreement increased with age: 43.8% in children under 1 year old, 44.7% in children from 1 to 14 years of age, and 49% in adults aged 15 and over.
Our study found reasonably high concordance and accuracy in the assignment of individual causes of death as the underlying cause on medical death certificates, compared with the gold standard.
For adults, the list of 34 causes of death used in our study is reasonable and captures the epidemiological pattern of causes of death in the Federal District and Morelos, but this is not the case for the 21 causes for children. There were difficulties in obtaining the quota of deaths for some diseases, particularly for children aged 1 month to 12 years. According to official statistics, there were 868 deaths in the MoH health facilities of the Federal District and the state of Morelos in 2009. That year there were no deaths due to measles, meningitis, encephalitis, hemorrhagic fever, malaria, or bites of venomous animals in those age ranges, and there were only 39 deaths (4.5%) related to injuries, 28 in the Federal District and 11 in Morelos. None of these cases fit the inclusion criteria because of the poor quality of the medical records. Among neonates, we did not find any deaths due to pneumonia.
This study also shows substantial variability in concordance and accuracy depending on the cause of death. Among adults, it is worth mentioning that for diabetes, a highly prevalent disease considered the number one cause of death in Mexico, this analysis shows substantial overreporting of deaths based on the death certificate. Previous studies have shown that the validity and comparability of diabetes mortality data can be affected because the diagnosis usually appears on only two-thirds of death certificates for people who had diabetes before death [25, 26]. The order of the sequence of causes can also be a factor in whether or not diabetes is assigned as the underlying cause of death [27, 28]. Murray et al. show that when controlling for individual and community factors, mortality from diabetes can be reduced by 10% in the US and 24% in Mexico. In this study we saw poor performance of diabetes CSMF prediction despite high chance-corrected concordance (86.8%), due to an overlap with diabetes in 38% of cardiovascular deaths and in 32% of pneumonia deaths.
It is clear, on the other hand, that chance-corrected concordance and CSMF prediction are good for diseases where the diagnosis should be evidence-based, such as HIV/AIDS, leukemia/lymphomas, and cervical cancer. More than 95% of the death certificates with these causes match the gold standard and have chance-corrected concordance over 90%. There are other causes, such as cirrhosis, homicides, and maternal deaths, for which more than 95% of death certificates match the gold standard, but their chance-corrected concordance is lower than 85%. The case of maternal deaths is important to highlight because Mexico has undertaken a major effort since 2002 to improve the completeness and quality of maternal death diagnoses. Chance-corrected concordance for maternal deaths was 80%, which included false positive cases diagnosed as HIV/AIDS and noncommunicable diseases that could have been considered indirect obstetric deaths.
The low concordance and accuracy in the case of child and neonatal deaths, as well as the variability across causes at these ages, could be associated with different factors, such as the type of causes selected as gold standard, the number of gold standard cases gathered by cause, and death certification itself. Regarding the last point, in Mexico, as in many other countries, death certification is perceived as unglamorous routine paperwork or a "burdensome task" of low priority. It is sometimes even interpreted as a punishment or as a task for doctors with a low level of training. This may be the case in the pediatric hospitals because the correlation between the medical death certificates and the gold standard was very low for all causes. In our study, when we considered not only the underlying cause of death but any mention of a cause of death on the medical death certificate, the median chance-corrected concordance for children increased from 38.5% to 64.0%, with dramatic increases for diarrhea, sepsis, and pneumonia. In neonates, the median chance-corrected concordance increased from 54.3% to 58.9%, mainly due to an increase in the concordance of birth asphyxia and preterm deaths. This is consistent with Hunt and Barr, who demonstrated that including all causes written on the medical death certificate, regardless of the sequence of diagnoses, increased the concordance from 58% to 91% in neonatal deaths. In other words, the medical knowledge to assign a cause of death is present, but it could be used more efficiently in correctly filling out the death certificates.
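The comparison between underlying-cause matching and any-mention matching can be sketched as follows. This is a hypothetical illustration, not the study's code: the record layout (`gold`, `underlying`, `mentions`), the function name, and the six toy certificates are assumptions, and chance-corrected concordance is computed with the commonly published formula (sensitivity corrected for chance over N causes).

```python
from statistics import median


def ccc_by_match_rule(records, causes, any_mention=False):
    """Per-cause chance-corrected concordance under one of two match rules.

    any_mention=False: the certificate matches only if its underlying cause
                       equals the gold standard cause.
    any_mention=True:  the certificate matches if the gold standard cause is
                       written anywhere on it.
    """
    n = len(causes)
    result = {}
    for cause in causes:
        cases = [r for r in records if r["gold"] == cause]
        if not cases:
            continue  # no gold standard deaths for this cause
        if any_mention:
            hits = sum(1 for r in cases if cause in r["mentions"])
        else:
            hits = sum(1 for r in cases if r["underlying"] == cause)
        result[cause] = (hits / len(cases) - 1 / n) / (1 - 1 / n)
    return result


# Hypothetical certificates: the right diagnosis is often written down,
# but not in the position that makes it the underlying cause.
causes = ["pneumonia", "diarrhea", "sepsis"]
records = [
    {"gold": "pneumonia", "underlying": "sepsis", "mentions": ["sepsis", "pneumonia"]},
    {"gold": "pneumonia", "underlying": "pneumonia", "mentions": ["pneumonia"]},
    {"gold": "diarrhea", "underlying": "sepsis", "mentions": ["sepsis", "diarrhea"]},
    {"gold": "diarrhea", "underlying": "diarrhea", "mentions": ["diarrhea"]},
    {"gold": "sepsis", "underlying": "sepsis", "mentions": ["sepsis"]},
    {"gold": "sepsis", "underlying": "pneumonia", "mentions": ["pneumonia", "sepsis"]},
]

underlying_ccc = ccc_by_match_rule(records, causes, any_mention=False)
mention_ccc = ccc_by_match_rule(records, causes, any_mention=True)
median_underlying = median(underlying_ccc.values())
median_mention = median(mention_ccc.values())
```

Because the underlying cause is itself one of the mentioned causes, the any-mention rule can never score lower than the underlying-cause rule for a given cause. The size of the gap indicates how much diagnostic information is present on the certificate but placed outside the underlying-cause position.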
These results suggest that using multiple cause of death analysis could better support decision-makers, because assigning "one cause to one death" is an exercise that is not easily understood by physicians, and this directly affects the reliability of the cause of death statistics. This problem becomes apparent when we consider all the causes reported on the medical death certificate, where the consistency of individual cause assignment and accuracy of the CSMF composition improve significantly. However, improving the quality of medical certification by using the multiple cause approach does not help to increase the validity of the cause of death statistics themselves because they are based on the underlying cause of death.
This study had several methodological strengths. In contrast to other validation studies using medical records as the gold standard, the cases selected in this study were based on robust gold standard criteria used in a multisite study. In addition, the metrics used to assess the performance of the VR system (chance-corrected concordance, CSMF accuracy, and linear regression, all estimated using a set of 500 test splits) are less sensitive to the cause composition of the test sample than other metrics traditionally used to assess performance, such as sensitivity and specificity.
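The idea of evaluating metrics over many test splits with varying cause composition can be sketched as below. The exact resampling procedure is not described in this section, so the sketch rests on stated assumptions: each split draws a cause composition from an uninformative Dirichlet distribution (all concentration parameters equal to 1) and then samples cases with replacement to match it. The function names, split size, and toy cases are hypothetical.

```python
import random


def draw_uniform_dirichlet(causes, rng):
    """Draw a cause composition from Dirichlet(1, ..., 1) by normalizing Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in causes]
    total = sum(draws)
    return {cause: d / total for cause, d in zip(causes, draws)}


def resample_split(cases, size, rng):
    """Build one test split whose gold standard cause mix follows a Dirichlet draw.

    `cases` is a list of (gold_cause, certified_cause) pairs; sampling is
    with replacement within each gold standard cause.
    """
    by_cause = {}
    for case in cases:
        by_cause.setdefault(case[0], []).append(case)
    causes = sorted(by_cause)
    target = draw_uniform_dirichlet(causes, rng)
    drawn = rng.choices(causes, weights=[target[c] for c in causes], k=size)
    return [rng.choice(by_cause[cause]) for cause in drawn]


# Hypothetical validation data: (gold standard cause, certificate cause).
cases = ([("diabetes", "diabetes")] * 40
         + [("stroke", "diabetes")] * 10
         + [("stroke", "stroke")] * 30
         + [("pneumonia", "pneumonia")] * 20)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
splits = [resample_split(cases, 100, rng) for _ in range(500)]

# Share of gold standard diabetes deaths in each split: it varies widely,
# which is the point of evaluating metrics across all splits.
diabetes_share = [sum(g == "diabetes" for g, _ in s) / len(s) for s in splits]
```

A metric would then be computed on each of the 500 splits and summarized, for example by its median, so that the reported performance does not hinge on the cause composition of one particular sample.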
The study had some limitations that should be considered in the interpretation of results. The cases included in this study are a sample of cases with complete medical records, which allowed their classification as gold standard. The cases came mostly from high-specialty hospitals in the Federal District and as a result may have better death certification than deaths taking place in nonspecialty medical units. For the same reason, the concordance and accuracy reported in this paper may be higher than those we might find in other settings. Because this study is based on high-quality registries, its results cannot be extrapolated to the entire country.
It could be argued that the concordance may be affected not only by the information registered in the medical death certificate, but also by the coding procedures of the underlying cause of death. In this study, we used the coding information from INEGI, which generates the official mortality figures, and we assume that their procedures follow robust quality standards. However, the effect of possible coding problems on concordance and accuracy should be the subject of future research.
In addition, the sample size was small for child and neonatal deaths, which may have limited our ability to analyze concordance and accuracy in these age groups. The reduced sample size can be explained by the low mortality in these age groups in medical units of the Federal District, as well as by the presence of a different mortality pattern in the study area.