Correcting and estimating HIV mortality in Thailand based on 2005 verbal autopsy data focusing on demographic factors, 1996-2009
Population Health Metrics volume 12, Article number: 25 (2014)
It is known that death registry (DR) underestimates HIV deaths. The objectives of this study were to examine under-reporting/misclassification and to estimate HIV mortality in Thailand during 1996-2009 from a model based on 2005 verbal autopsy (VA) data.
Logistic regression was used to predict HIV deaths from the VA dataset with and without demographic covariates. This full model was then used to predict individual HIV deaths from the DR dataset of provinces in which VA was conducted. The proportions in the remaining provinces were predicted from spatial interpolation based on coefficients of the VA provinces.
Area under Receiver Operating Characteristic curve of the full model was 0.969 compared to 0.879 of the simple cross-referencing model when demographic covariates were not included. DR-reported HIV deaths accounted for only one-third of all VA-estimated HIV deaths. The most misclassified HIV deaths were those registered as tuberculosis and mental and nervous system. Under-reporting was most common among females and people aged 20-39 years, and effect of province was highest in the upper north and upper south regions.
For approximately two-thirds of all HIV deaths estimated by the full model, the causes were reported under other categories, not HIV. Demographic variables are essential for accurately correcting causes of death from death registries.
Inaccurate and unreliable attribution of causes of death is especially common where causes are ill-defined. Under-reporting of HIV deaths is common in developing countries and has been documented in Botswana, Brazil, South Africa, and Thailand, owing to under-registration of deaths and misclassification of cause of death. These problems severely limit the value of routine mortality data for public use and affect resource allocation by policymakers.
Death registry (DR) data in Thailand, provided by the Ministry of Interior through the Bureau of Policy and Strategy, Ministry of Public Health, are of poor quality: they are incomplete and also record inaccurate causes of death. Verbal autopsy (VA) surveys have been widely used in several countries, including Uganda, China, Brazil, Tanzania, Bangladesh, South Africa, Zimbabwe, and Thailand, to give more accurate information about causes of death. The latest VA study in Thailand was carried out in 2005 in nine provinces by the SPICE (Setting Priorities using Information on Cost-Effectiveness analysis) project. The 2005 VA study was used to estimate various causes of death, including HIV. However, the simple cross-referencing method used in these studies ignored the effects of sex-age group and locality of the deceased, which could give incorrect estimates due to confounding.
We hypothesized that the utility of the 2005 VA data can be substantially improved if demographic variables were included to predict the cause of death. The aims of our study were 1) to examine under-reporting and misclassification of HIV deaths, based on modeling of the 2005 verbal autopsy data and 2) to estimate HIV deaths in all provinces of Thailand during 1996-2009.
Data sources and management
This study was confined to deaths of people aged 5 years and older, for which HIV death is common and often misclassified. DR data from 1996-2009 were obtained from the Bureau of Policy and Strategy database, Ministry of Public Health. The 2005 VA study was conducted by the SPICE project, and included a sample of 9,644 deaths (3,316 in-hospital and 6,328 outside-hospital) from 28 selected districts in nine provinces of four regions, of which 9,495 were deaths of persons aged 5 years and older.
Table 1 summarizes the cause groups based on VA counts. The chapter-block classifications of ICD-10 codes, consisting of blocks categorized mainly by human organs, were used to create 21 major cause groups for deaths at ages 5 years and older based on the distribution of VA-assessed deaths. For statistical accuracy, groups with small counts (mainly fewer than 200) were combined into larger groups using medical considerations (apart from septicemia, which received special attention due to over-reporting). The proportion of all deaths represented by these categories ranged from 0.8% for septicemia to 11.3% for stroke; HIV deaths (ICD-10 codes B20-B24) accounted for 5.4%, as shown in Table 1.
Misclassification was not at random. The effects of sex, age, and spatial variables were used to correct misclassification, using logistic regression. For efficiency, the predictors were optimally grouped to obtain sufficient sample size for relatively homogeneous risk groups. Nine provinces were included in the VA study (Bangkok, Nakhon Nayok, Suphan Buri, Ubon Ratchathani, Loei, Phayao, Chiang Rai, Chumphon, and Songkhla). The effects of age for males and females were considered separately (see Results). Sex and age were grouped together into 14 levels (with seven levels of age in years: 5-19, 20-29, 30-39, 40-49, 50-59, 60-69, and 70+).
Similarly, misclassification of cause of death was considered differently for deaths in and outside hospitals. Reported causes of death and location were grouped into 18 levels, which resulted from the combination of two levels of location (in and outside hospital) and nine major causes of death (HIV, respiratory, septicemia, tuberculosis (TB), other infectious, mental and nervous system, digestive, ill-defined, and the remainder, which were aggregated into a single group).
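The two grouped predictors described above can be sketched as follows. This is only an illustration of how 2 sexes x 7 age bands give 14 sex-age levels and 2 locations x 9 reported causes give 18 cause-location levels; the label strings and function names are assumptions made for the example, not the coding used in the study.

```python
from itertools import product

# Age bands in completed years, as listed in the text; None marks the open-ended 70+ band.
AGE_BANDS = [(5, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, None)]
SEXES = ["male", "female"]
LOCATIONS = ["in hospital", "outside hospital"]
# Nine reported-cause groups from the text; "remainder" is a hypothetical label
# for the aggregated group of all other causes.
CAUSES = [
    "HIV", "respiratory", "septicemia", "tuberculosis", "other infectious",
    "mental and nervous system", "digestive", "ill-defined", "remainder",
]


def age_label(lo, hi):
    """Return a printable label such as '20-29' or '70+'."""
    return f"{lo}+" if hi is None else f"{lo}-{hi}"


def age_group(age):
    """Map an age in completed years (5 or older) to its age band label."""
    if age < 5:
        raise ValueError("the analysis is confined to ages 5 years and older")
    for lo, hi in AGE_BANDS:
        if hi is None or age <= hi:
            return age_label(lo, hi)


def sex_age_level(sex, age):
    """Combine sex and age band into one factor level."""
    return f"{sex} {age_group(age)}"


def cause_location_level(cause, location):
    """Combine DR-reported cause and place of death into one factor level."""
    return f"{cause} / {location}"


SEX_AGE_LEVELS = [f"{s} {age_label(lo, hi)}" for s, (lo, hi) in product(SEXES, AGE_BANDS)]
CAUSE_LOCATION_LEVELS = [cause_location_level(c, l) for c, l in product(CAUSES, LOCATIONS)]

print(len(SEX_AGE_LEVELS), len(CAUSE_LOCATION_LEVELS))
print(sex_age_level("female", 34))
```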
Using logistic regression, we estimated the logit of the probability P that a person died from HIV as a linear function of the determinant factors. The simple logistic regression model with simple cross-referencing is formulated as
$$\operatorname{logit}(P_i) = \ln\left(\frac{P_i}{1 - P_i}\right) = \mu + \alpha_i \qquad \text{(A)}$$
where $P_i$ is the probability of death due to HIV, $\mu$ is a constant, and $\alpha_i$ is the only parameter, for DR cause-location group $i$. The simple cross-referencing model (A) was compared with the full model (B), which includes an additive linear function of the determinant factors and can be expressed as
$$\operatorname{logit}(P_{ijk}) = \ln\left(\frac{P_{ijk}}{1 - P_{ijk}}\right) = \mu + \alpha_i + \beta_j + \gamma_k \qquad \text{(B)}$$
where $P_{ijk}$ is the probability of death due to HIV and $\alpha_i$, $\beta_j$, and $\gamma_k$ are individual parameters specifying DR cause-location group $i$, sex-age group $j$, and province $k$, respectively.
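As a minimal sketch of how the two models turn coefficients into probabilities, the code below applies the inverse logit to the linear predictors mu + alpha_i (model A) and mu + alpha_i + beta_j + gamma_k (model B). All coefficient values and level names are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of the logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def p_model_a(mu, alpha, i):
    """Model A: only the DR cause-location effect alpha[i]."""
    return inv_logit(mu + alpha[i])


def p_model_b(mu, alpha, beta, gamma, i, j, k):
    """Model B: additive cause-location, sex-age, and province effects."""
    return inv_logit(mu + alpha[i] + beta[j] + gamma[k])


# Hypothetical coefficients (NOT from the paper), on the logit scale.
mu = -3.0
alpha = {"tuberculosis / in hospital": 2.0, "remainder / outside hospital": -1.5}
beta = {"female 30-39": 1.8, "male 70+": -2.2}
gamma = {"Phayao": 0.5, "Loei": -0.4}

# The same reported cause gives very different HIV probabilities once
# sex-age group and province are taken into account.
young = p_model_b(mu, alpha, beta, gamma,
                  "tuberculosis / in hospital", "female 30-39", "Phayao")
old = p_model_b(mu, alpha, beta, gamma,
                "tuberculosis / in hospital", "male 70+", "Loei")
crude = p_model_a(mu, alpha, "tuberculosis / in hospital")
print(round(crude, 3), round(young, 3), round(old, 3))

# Summing predicted probabilities over registered deaths gives an
# expected number of HIV deaths for that set of records.
expected_hiv_deaths = young + old
```

Model A assigns every death with the same reported cause and location the same probability, which is why it cannot separate the two hypothetical records above.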
We used "sum contrasts", as developed by Tongkumchum and McNeil and by Kongchouy and Sampantarak, instead of conventional "treatment contrasts", in which the first level is left out of the model to serve as the reference. This method allows us to compute the estimate and the 95% confidence interval of deaths for each of the covariate levels in the VA and the DR datasets.
To assess the accuracy of model prediction, the Receiver Operating Characteristic (ROC) curve from logistic regression was drawn based on a concept described by Chongsuvivatwong and Fan et al. Area under the ROC curve (AUC) measures the performance of a model and represents model accuracy. A cut-off point on the curve, where the predicted number of HIV deaths equals the observed value in the VA dataset (512 cases), was used to report sensitivity and specificity of the model. These were compared with results from the simple cross-referencing method.
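The difference between the two codings can be illustrated with a generic sum-to-zero reparameterization. The sketch below converts treatment-coded effects (reference level fixed at zero) into effects measured relative to the unweighted mean of all levels. It is an assumption-laden illustration of the general idea, not the specific procedure of the cited papers, and the coefficient values are hypothetical.

```python
def treatment_to_sum(intercept, effects):
    """Re-express treatment-coded effects as deviations from their mean.

    `effects` maps each level to its treatment-coded effect, with the
    reference level included explicitly as 0.0. The fitted value for each
    level (intercept + effect) is unchanged; only the parameterization moves.
    """
    mean_effect = sum(effects.values()) / len(effects)
    mu = intercept + mean_effect
    centered = {level: e - mean_effect for level, e in effects.items()}
    return mu, centered


# Hypothetical province effects on the logit scale (NOT from the paper).
treatment_effects = {"Bangkok": 0.0, "Phayao": 0.9, "Loei": -0.3}
mu, sum_effects = treatment_to_sum(-3.0, treatment_effects)

for level, effect in sum_effects.items():
    # Every level, including the former reference, now has its own effect
    # relative to the overall mean.
    print(level, round(effect, 3))
```

With this coding each level can be compared with the overall mean rather than with an arbitrary reference level, which is what makes a plot of every level against an average line possible.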
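The evaluation described above can be sketched on toy data. The AUC is computed here as the proportion of (HIV, non-HIV) pairs ranked correctly, and the cut-off is placed so that the number of predicted HIV deaths equals the number observed. The scores and labels are invented for the example, and tie handling at the cut-off is ignored for simplicity.

```python
def auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def matched_count_cutoff(scores, labels):
    """Sensitivity and specificity when the number of predicted positives
    is forced to equal the number of observed positives."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = sum(y for _, y in ranked[:n_pos])   # true positives among top-ranked
    fp = n_pos - tp                          # predicted HIV, actually not HIV
    fn = n_pos - tp                          # missed HIV deaths
    tn = len(labels) - n_pos - fp
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted probabilities and VA-assessed labels (1 = HIV death).
scores = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

area = auc(scores, labels)
sensitivity, specificity = matched_count_cutoff(scores, labels)
print(round(area, 3), round(sensitivity, 3), round(specificity, 3))
```

Because the predicted and observed counts are matched, false positives and false negatives are equal in number at this cut-off.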
Estimation of HIV mortality
For the nine study provinces, fitting the complete logistic regression model to the 2005 VA dataset resulted in nine province coefficients, 14 sex-age group coefficients, and 18 DR cause-location coefficients and the estimate of HIV deaths and 95% confidence intervals.
For the remaining 67 provinces, we used a simple and easily implemented spatial "triangulation method" to interpolate province coefficients. This was preferred to the "kriging" method because it requires fewer points than kriging, and there were insufficient sample provinces (only nine) to provide the basis for kriging.
Triangles were drawn linking nine provinces in the 2005 VA study. The values of province coefficients in each triangle were assigned as an average of coefficients from nearby provinces in the model. For each triangle, values a, b and c were obtained by solving three equations using linear algebra based on latitude and longitude as follows.
(Note: P = Province, β = coefficient)
The coefficient for any province j within a triangle could then be given by
Coefficients for provinces outside triangles were obtained similarly by extrapolation from nearby provinces. Province coefficients for all provinces were thus obtained and the magnitude of HIV deaths estimated.
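One plausible reading of the triangulation step is that each triangle defines a plane, coefficient = a + b x latitude + c x longitude, passing through its three study provinces, and that other provinces are evaluated on that plane. The sketch below follows that reading; the plane form, the coordinates, and the coefficient values are all assumptions for illustration, since the original equations are not reproduced here.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_plane(points):
    """Solve for (a, b, c) in coef = a + b*lat + c*lon through three points.

    `points` is a list of three (lat, lon, coef) tuples. Cramer's rule is
    used, which is adequate for a single 3x3 system.
    """
    matrix = [[1.0, lat, lon] for lat, lon, _ in points]
    rhs = [coef for _, _, coef in points]
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("the three provinces are collinear; no unique plane")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return tuple(solution)


def interpolate(plane, lat, lon):
    """Evaluate the fitted plane at a province's latitude and longitude."""
    a, b, c = plane
    return a + b * lat + c * lon


# Hypothetical (lat, lon, coefficient) values for three study provinces.
triangle = [(19.2, 99.9, 0.60), (17.5, 101.7, -0.35), (14.5, 100.1, 0.05)]
plane = fit_plane(triangle)

# A hypothetical province located inside the triangle.
print(round(interpolate(plane, 17.0, 100.5), 3))
```

Evaluating the same plane at a point outside the triangle gives an extrapolated value, which is less reliable the further the point lies from the three vertices.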
R program version 2.15.2 was used for all statistical analyses and graphical displays.
Ubon Ratchathani, Suphan Buri, and Chiang Rai had the largest numbers of total deaths (2373, 1600, and 1437 deaths, respectively), while Chumphon had the lowest (310 deaths). The VA-assessment gave 512 HIV deaths, whereas only 164 HIV deaths (32%) were correctly DR-reported.
Table 2 reports the likelihood ratio tests, giving the deviance reduction between the full and null logistic regression models. All tests were statistically significant.
Figure 1 shows crude percentages of HIV deaths (among all deaths) by province, sex-age group, and DR cause-location group and the adjusted values with 95% confidence intervals. The values derived from the direct VA assessment and from the full model are similar, indicating variation among groups but with no substantial confounding. The plotted values above the average line reflect the groups that were more likely to die from HIV.
The 95% confidence interval for both Phayao and Chumphon is marginally higher than the mean, whereas for Loei it is marginally lower. Therefore, effect of province on misclassification of cause of death was marginal. The percentages of HIV deaths in age groups 20-49 are all substantially above the mean, with females higher than males when those aged 20-39 years were compared. Thus, age groups 20-49 were significantly more likely to have high levels of under-reporting. Finally, substantial numbers of HIV deaths were reported as TB, mental and nervous system, other infectious diseases, and respiratory for deaths in hospitals, whereas HIV deaths outside hospitals were reported as TB, other infectious diseases, and septicemia. These are the groups in which HIV deaths were often misclassified.
The full model was assessed using the ROC curve and compared with the simple cross-referencing model. Figure 2 shows the ROC curve of the simple cross-referencing model (model A), with only the DR cause-location factor, and the ROC curve of the full model (model B), with the three factors DR cause-location, sex-age group, and province. The cut-off point marked by the star is the point at which the total predicted number of HIV deaths equals the number of VA-assessed HIV deaths. The simple cross-referencing model had an AUC of 0.879, 53.1% sensitivity, and 97.5% specificity, whereas the full model had an AUC of 0.969, 69.3% sensitivity, and 98.3% specificity. When only the area above the diagonal line was compared, the simple cross-referencing model had an AUC of 76% and the full model had an AUC of 94%. In other words, the simple cross-referencing model had an error of 24%, whereas the full model had an error of 6%, so the full model reduced the error by a factor of four. The full model therefore predicts HIV as the cause of death better than the simple cross-referencing model. Using the contingency table alone, without statistical modeling, the DR-reported cause had 32% sensitivity and 99.9% specificity.
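The 76% and 94% figures are consistent with rescaling each AUC so that the diagonal (AUC = 0.5) maps to 0 and a perfect classifier maps to 1. The short check below assumes that rescaling, (AUC - 0.5) / 0.5, which is inferred from the reported numbers rather than stated explicitly in the text.

```python
def auc_above_diagonal(auc):
    """Share of the area above the chance diagonal that the curve captures."""
    return (auc - 0.5) / 0.5


model_a = auc_above_diagonal(0.879)   # simple cross-referencing model
model_b = auc_above_diagonal(0.969)   # full model

error_a = 1.0 - model_a
error_b = 1.0 - model_b
ratio = error_a / error_b

print(round(model_a * 100), round(model_b * 100))   # rescaled AUC, in percent
print(round(error_a * 100), round(error_b * 100))   # remaining error, in percent
print(round(ratio, 1))                              # error ratio before rounding
```

Using the unrounded values, the error ratio is slightly below four; the factor of four follows from the rounded percentages 24% and 6%.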
The full model (B) was then extended to all provinces. The left panel of Figure 3 shows the nine study province coefficients from the logistic regression model plotted in black. Values plotted in blue are averages of coefficients from nearby provinces in each triangle using the triangulation method. The right panel classifies province from the equations (C, D, E, F in methods section), according to three levels of coefficients. The highest were found in the upper north (Phayao and Phrae) and the upper south (Prachuap Khirikhan, Chumphon, Ranong, Surat Thani, and Phang Nga). This implies that HIV deaths were proportionally highest among all deaths in the upper north and the upper south.
Finally, the simple cross-referencing model and the full model were applied to the DR data for male and female deaths in 1996-2009 and plotted as area graphs in Figure 4. The area of each color strip denotes the number of HIV deaths in each age group. The total number of DR-reported HIV deaths was much lower than the totals estimated by cross-referencing (model A) and logistic regression (model B), by factors of 2.8 and 3.1, respectively. While model A gave large proportions of HIV deaths at ages over 50 years (light blue, golden yellow, and grey), these were substantially reduced when the full logistic regression allowing for age/sex and province (model B) was used. On the other hand, for the young adult group, the accuracy of HIV as cause of death was already substantially improved by the simple logistic regression model.
Discussion and conclusions
Our logistic regression analysis showed that, relative to death registration, VA-assessed HIV deaths were more likely among young adult females, but many of those deaths were DR-registered as deaths from TB or from mental and nervous system disorders. A logistic regression-based method allowing for age/sex and geographical effects predicted HIV with higher sensitivity and specificity than the estimates of HIV deaths derived by cross-referencing from simple tabulation. Under-reporting was most common in the upper north and the upper south of the country. DR under-reported HIV deaths by a factor of three, whereas the simple cross-referencing method distorted the age distribution and could lead to a misunderstanding that HIV death was also common among the elderly.
HIV deaths were found to be relatively common among deaths in the age group 20-39 years, in agreement with other research. AIDS is estimated to be the largest cause of death in Asian adults aged 15-44 years. Before 1990, new HIV infections were highest among those injecting drugs and clients of sex workers. During 1995-2005, they were highest among women categorized as housewives. In other words, the most under-reporting of HIV deaths was found in females rather than males.
Most misclassified HIV deaths were recorded as TB or mental and nervous system disorders. It is commonly known that TB and cryptococcal meningitis are the leading opportunistic infections among HIV patients. These infections were possibly recorded as the primary cause of death in death certificates either to avoid stigma to the family of the deceased, because the symptoms of TB and HIV are very similar, or because the people reporting the death might not have had access to the results of an HIV test for the deceased. Another general condition often recorded was "immunodeficiency (D849: immunodeficiency, unspecified)." This might in fact be the more specific "HIV/AIDS (B20-B24: Human immunodeficiency virus disease)" in ICD-10 coding.
Misclassification was associated with region (province). This could be due to differences across regions in the intensity of the HIV epidemic, stigmatization, and the availability of qualified personnel for DR recording and their attitude toward HIV-related death. HIV mortality peaked in the upper north, especially in Phayao, because in the past two decades the HIV epidemic has been most severe in the upper north. One-third of HIV deaths during 1987-2014 were predicted to occur in the northern region. HIV deaths were higher in the upper south than in the central region in spite of the less severe HIV epidemic. HIV deaths in the south were more likely to be misclassified to other causes, as the area was perceived to have low levels of HIV. In addition, mortality varies by geographic location, and the south has the lowest overall mortality.
Our full logistic regression model based on the 2005 VA data was shown to predict and estimate HIV deaths with high sensitivity, specificity, and AUC, better than the simple cross-referencing model. The specificity of our model was higher than that of a verbal autopsy tool from Uganda, for which sensitivity was not reported. The cross-referencing method has been used in many previous studies. Inadequate models can give misleading or incorrect inferences. Our study showed that the use of this simple method should be discouraged because it distorts the HIV death estimate in various demographic groups. This distortion can mislead priority setting and resource allocation.
There were limitations in our analysis. First, the sample survey design did not stratify by strong predictors of the outcome such as reported cause and location of report. The study sample thus did not adequately cover the population at risk for HIV and the sample size did not allow precise estimation among certain minority groups, such as the Muslim group in the far south. Second, only nine of Thailand's 76 provinces were included in the VA study.
Third, we have assumed that the 2005 VA data can inform corrections in all years between 1996 and 2009, although the coverage of antiretroviral treatment changed markedly over this period: it was near zero in 1996, 12% in 2003, 41% in 2005, and 76% in 2009. There would therefore be differences in misclassification of HIV-related deaths across the years, which are not captured by our methods. Finally, VA itself has limitations, in terms of inaccuracy of informants and recall bias. The results must therefore be carefully interpreted.
Taffa N, Will JC, Bodika S, Packel L, Motlapele D, Stein E, Roels TH, Kennedy G, Shenaaz EH: Validation of AIDS-related mortality in Botswana. JIAS 2009, 12: 24. 10.1186/1758-2652-12-24
Fazito E, Cuchi P, Fat DM, Ghys PD, Pereira MG, Vasconcelos AMN, Pascom ARP: Identifying and quantifying misclassified and under-reported AIDS deaths in Brazil: a retrospective analysis from 1985 to 2009. Sex Transm Infect 2012,88(Suppl 2):i86-i94. 10.1136/sextrans-2012-050632
Pacheco AG, Saraceni V, Tuboi SH, Lauria LM, Moulton LH, Faulhaber JC, King B, Golub JE, Durovni B, Cavalcante S, Harrison LH, Chaisson RE, Schechter M: Estimating the extent of underreporting of mortality among HIV-infected individuals in Rio de Janeiro, Brazil. AIDS Res Hum Retroviruses 2011,27(1):25-28. 10.1089/aid.2010.0089
Yudkin PL, Burger EH, Bradshaw D, Groenewald P, Ward AM, Volmink J: Deaths caused by HIV disease under-reported in South Africa. AIDS 2009,23(12):1600-1602. 10.1097/QAD.0b013e32832d4719
Birnbaum JK, Murray CJL, Lozano R: Exposing misclassified HIV/AIDS deaths in South Africa. Bull World Health Organ 2011, 89: 278-285. 10.2471/BLT.11.086280
Tangcharoensathien V, Faramnuayphol P, Teokul W, Bundhamcharoen K, Wibulpholprasert S: A critical assessment of mortality statistics in Thailand: potential for improvements. Bull World Health Organ 2006,84(3):233-238. 10.2471/BLT.05.026310
Khonhan K: Quality of mortality data of HIV/AIDS surveillance reporting system in Mukdahan Province, Thailand (in Thai). Thai Popul J 2009,1(1):125-135.
Rao C, Porapakkham Y, Pattaraarchachai J, Polprasert W, Swampunyalert N, Lopez AD: Verifying causes of death in Thailand: rationale and methods for empirical investigation. Popul Health Metr 2010, 8: 11. 10.1186/1478-7954-8-11
Mathers CD, Fat DM, Inoue M, Rao C, Lopez AD: Counting the dead and what they died from: an assessment of the global status of cause of death data. Bull World Health Organ 2005,83(3):171-177.
Prasartkul P, Vapattanawong P: The completeness of death registration in Thailand: evidence from demographic surveillance system of the Kanchanaburi project. World Health Popul 2006, 8: 43-51. 10.12927/whp.2006.18054
Hill K, Vapattanawong P, Prasartkul P, Porapakkham Y, Lim SS, Lopez AD: Epidemiologic transition interrupted: a reassessment of mortality trends in Thailand, 1980-2000. Int J Epidemiol 2007, 36: 374-384. 10.1093/ije/dyl257
Prasartkul P, Porapakham Y, Vapattanawong P, Rittirong J: Development of a verbal autopsy tool for investigating cause of death: the Kanchanaburi project. JPSS 2007,15(2):1-22.
Vapattanawong P, Prasartkul P: Under-registration of deaths in Thailand in 2005-2006: results of cross-matching data from two sources. Bull World Health Organ 2011, 89: 806-812. 10.2471/BLT.10.083931
Lopez AD, Lozano R, Murray CJL, Shibuya K: Verbal autopsy: innovations, applications, opportunities improving cause of death measurement. In Popul Health Metr 2011, 9: 128-254.
Choprapawon C, Porapakkham Y, Sablon O, Panjajaru R, Jhantharatat B: Thailand's national death registration reform: verifying the causes of death between July 1997 and December 1999. Asia Pac J Public Health 2005,17(2):110-116. 10.1177/101053950501700209
Pattaraarchachai J, Rao C, Polprasert W, Porapakkham Y, Poa-in W, Singwerathum N, Lopez AD: Cause-specific mortality patterns among hospital deaths in Thailand: validating routine death certification. Popul Health Metr 2010, 8: 12. 10.1186/1478-7954-8-12
Polprasert W, Rao C, Adair T, Pattaraachachai J, Porapakkham Y, Lopez AD: Cause-of-death ascertainment for deaths that occur outside hospitals in Thailand: application of verbal autopsy methods. Popul Health Metr 2010, 8: 13. 10.1186/1478-7954-8-13
Porapakkham Y, Rao C, Pattaraachachai J, Polprasert W, Vos T, Adair T, Lopez AD: Estimated causes of death in Thailand, 2005: implications for health policy. Popul Health Metr 2010, 8: 14. 10.1186/1478-7954-8-14
ICD-10 International Statistical Classification of Diseases and Related Health Problems. WHO, Geneva; 2004.
McNeil D: Epidemiological Research Methods. John Wiley & Sons Ltd, New York; 1996.
Venables WN, Ripley BD: Modern Applied Statistics with S. Springer, New York; 2002.
Chongsuvivatwong V: Graphs, Tables and Equations for Health Research (in Thai). Chulalongkorn University Press, Bangkok; 2007.
Tongkumchum P, McNeil D: Confidence intervals using contrasts for regression model. Songklanakarin J Sci Technol 2009,31(2):151-156.
Kongchouy N, Sampantarak U: Confidence intervals for adjusted proportions using logistic regression. Mod Appl Sci 2010,4(6):2-7. 10.5539/mas.v4n6p2
Fan J, Upadhye S, Worster A: Understanding receiver operating characteristic (ROC) curves. Can J Emerg Med 2006,8(1):19-20.
Sakar S, Midi H: Importance of assessing the model adequacy of binary logistic regression. J of Appl Sci 2010,10(6):479-486. 10.3923/jas.2010.479.486
Takahashi K, Uchiyama H, Yanagisawa S, Kamae I: The logistic regression and ROC analysis of group-based screening for predicting diabetes incidence in four years. Kobe J Med Sci 2006,52(6):171-180.
Li J, Heap AD: A Review of Spatial Interpolation Methods for Environmental Scientists. Geoscience Australia, Record 2008/23, Canberra; 2008.
Yang CS, Kao SP, Lee FB, Hung PS: Twelve Different Interpolation Methods: A Case Study of SURFER 8.0.[http://www.isprs.org/proceedings/XXXV/congress/comm2/comm2.aspx]
Murphy M: The Advantages of Kriging vs Triangulation Contour Mapping Methods.[http://www.ehow.com/info_12002607_advantages-kriging-vs-triangulation-contour-mapping-methods.html]
R Development Core Team: R program: A Language and Environment for Statistical Computing and Graphics. [http://cran.r-project.org/bin/windows/base/old/2.15.2/]
Punyacharoensin N, Viwatwongkasem C: Trends in three decades of HIV/AIDS epidemic in Thailand by nonparametric backcalculation method. AIDS 2009,23(9):1143-1152. 10.1097/QAD.0b013e32832baa1c
Kerr S, Phanuphak P: An Asian perspective on HIV/AIDS. Asian Biomed 2009,3(1):9-14.
The Asian Epidemic Model (AEM) Projections for HIV/AIDS in Thailand: 2005-2025. Family Health International (FHI) and Bureau of AIDS,TB and STIs, Department of Disease Control, Ministry of Public Health, Bangkok; 2008.
Kantipong P, Murakami K, Moolphate S, Aung MN, Yamada N: Causes of mortality among tuberculosis and HIV co-infected patients in Chiang Rai, Northern Thailand. HIV AIDS Res Palliat Care 2012, 4: 159-168.
Cain KP, Anekthananon T, Burapat C, Akksilp S, Mankhatitham W, Srinak C, Nateniyom S, Sattayawuthipong W, Tasaneeyapan T, Varma JK: Causes of death in HIV-infected persons who have tuberculosis, Thailand. Emerg Infect Dis 2009,15(2):258-264. 10.3201/eid1502.080942
Kitkungvan D, Apisarnthanasak A, Plengpart P, Mundy LM: Fever of unknown origin in patients with HIV infection in Thailand: an observational study and review of the literature. Int J STD AIDS 2008, 19: 232-235. 10.1258/ijsa.2007.007191
Likittanasombut P: Opportunistic central nervous system infection in human immunodeficiency virus infected patients in Thammasat hospital, Thailand. Neurology Asia 2004, 9: 29-32.
Surasiengsunk S, Kiranandana S, Wongboonsin K, Garnett GP, Anderson RM, Griensven GJP: Demographic impact of the HIV epidemic in Thailand. AIDS 1998,12(7):775-784. 10.1097/00002030-199807000-00014
AIDS Situation. 2010.
Chariyalertsak S, Sirisanthana T, Saengwonloey O, Nelson KE: Clinical presentation and risk behaviors of patients with acquired immunodeficiency syndrome in Thailand, 1994-1998: regional variation and temporal trends. Clin Infect Dis 2001, 32: 955-962. 10.1086/319348
Jones H, Pardthaisong L: Demographic interactions and developmental implications in the era of AIDS: findings from Northern Thailand. Appl Geography 2000,20(3):255-275. 10.1016/S0143-6228(00)00007-2
Faramnuayphol P, Chongsuvivatwong V, Pannarunothai S: Geographical variation of mortality in Thailand. J Med Assoc Thai 2008,91(9):1455-1460.
Odton P, Choonpradub C, Bundhamcharoen K: Geographical variations in all-cause mortality in Thailand. Southeast Asian J Trop Med Public Health 2010,41(5):1209-1219.
Mayanja BN, Baisley K, Nalweyiso N, Kibengo FM, Mugisha JO, Van der Paal L, Maher D, Kaleebu P: Using verbal autopsy to assess the prevalence of HIV infection among deaths in ART period in rural Uganda: a prospective cohort study, 2006-2008. Popul Health Metr 2011, 9: 36. 10.1186/1478-7954-9-36
2012 Thailand AIDS Response Progress Report. National AIDS Management Center, Department of Disease Control, Ministry of Public Health, Bangkok; 2013.
The authors gratefully acknowledge Prof. Dr. Don McNeil, Professor Emeritus of Statistics at Macquarie University, Australia for his valuable and helpful guidance and Greig Rundle for his assistance and suggestions. We also thank the SPICE project team for collecting the 2005 VA data.
The authors declare that they have no competing interests.
All authors participated in the design of the study and the interpretation of the results. AC and PT performed the study analysis and drafted the manuscript and prepared all tables and graphs. VC critically reviewed the manuscript. All authors read and contributed to the final manuscript.