This article has Open Peer Review reports available.
The Summary Index of Malaria Surveillance (SIMS): a stable index of malaria within India
© Cohen et al; licensee BioMed Central Ltd. 2010
Received: 2 July 2009
Accepted: 11 February 2010
Published: 11 February 2010
Malaria in India has been difficult to measure. Mortality and morbidity are not comprehensively reported, impeding efforts to track changes in disease burden. However, a set of blood measures has been collected regularly by the National Malaria Control Program in most districts since 1958.
Here, we use principal components analysis to combine these measures into a single index, the Summary Index of Malaria Surveillance (SIMS), and then test its temporal and geographic stability using subsets of the data.
The SIMS correlates positively with all its individual components and with external measures of mortality and morbidity. It is highly consistent and stable over time (1995-2005) and regions of India. It includes measures of both vivax and falciparum malaria, with vivax dominant at lower transmission levels and falciparum dominant at higher transmission levels, perhaps due to ecological specialization of the species.
This measure should provide a useful tool for researchers looking to summarize geographic or temporal trends in malaria in India, and can be readily applied by administrators with no mathematical or scientific background. We include a spreadsheet that allows simple calculation of the index for researchers and local administrators. Similar principles are likely applicable worldwide, though further validation is needed before using the SIMS outside India.
Malaria in India has a long and tumultuous history. It was apparently not widespread before British agricultural projects created ideal breeding conditions for the mosquito vectors, but by the end of the 19th century it had become a severe public health concern: a constant endemic problem in northeastern regions such as Bengal and a periodically ravaging epidemic in northwestern states such as Punjab, where a single epidemic killed in excess of 300,000 people in late 1908 [1–3]. During this time, falciparum malaria - substantially more severe and deadly than the other species - became widespread. After independence, a control program nearly succeeded in eliminating malaria entirely, but in 1965, on the verge of success, funding was cut, and the disease rebounded substantially in the following years [4, 5]. Currently, malaria is much less severe than before the control program, but it continues to be a major public health concern, accounting for perhaps 1-2% of all deaths in India (AAC and PJ, unpublished data). In some states, particularly Orissa, the disease burden is much worse.
As part of the National Malaria Eradication Programme (which became the National Vector-borne Disease Control Program, or NVBDCP), a surveillance system was set up in 1958 to measure malaria incidence based on examination of blood smears at Primary Health Centers (PHCs). However, because most of the surveillance is passive, slides are much more likely to come from people who have malaria than would be expected from a random sample of the population. These measures are thus not a reliable way to estimate overall incidence, morbidity, or mortality. Better anti-malarial treatment and surveillance in high-malaria areas may also result in relative underestimation of malaria in low-malaria areas.
Statistics are compiled yearly for each district. Convention and the nature of the data collection have resulted in the calculation of seven different indices for each district in each year (see Methods for details), each with a slightly different interpretation. Some measures are for all malaria; others are just for the more severe species, falciparum. As a result, many different graphs or columns are needed to show trends, and it is not always clear how to interpret countervailing trends in different indices. Further, each index has strengths and weaknesses, and none alone seems to adequately summarize malaria levels for an area.
Here, using principal components analysis (PCA), we combine the existing measures into the Summary Index of Malaria Surveillance (SIMS), a single summary index of malaria trends. This index is scaled between 0 and 100, with higher numbers indicating more malaria, making it easy for laymen to interpret. We confirm the validity of this index using both internal and external validation. Internal validation includes confirming that (a) all measures load in the same direction on the index (the first PCA axis); (b) the first PCA axis explains a substantial portion of the variation; (c) the axis is robust when generated from different subsets of the data; and (d) the axis is robust when generated from different combinations of the measures. External validation is conducted by assessing the correlation of our index and the original measures with independent measures of malaria mortality and morbidity in India from the Million Death Study (MDS) and District-level Household Survey (DLHS), respectively. Lastly, we provide a Microsoft Excel spreadsheet that can easily be used by researchers and local officials to calculate the SIMS from raw data.
NVBDCP (National Vector-borne Disease Control Program)
Current protocol for malaria surveillance and treatment in India
Household/where there is no doctor
Public sector (NVBDCP):
• Active surveillance of fever cases (home visit of HW)
• Presumptive treatment with chloroquine
• Peripheral blood smear
• If the result is positive, radical treatment with primaquine for the appropriate duration based on whether it is Pv or Pf.
Private sector response:
• Over-the-counter incomplete treatment by pharmacists
PHC or health facility (doctor available)
Public sector (NVBDCP):
• Passive surveillance of fever cases (attendees of the facility)
• If there is no facility for blood smear examination, presumptive treatment with chloroquine, with a peripheral blood smear or rapid test for Pf taken for subsequent analysis. If the result is positive, radical treatment with primaquine for the appropriate duration based on whether it is Pv or Pf.
• If there is a facility for blood smear examination (malaria clinic), peripheral blood smear, with the course of treatment decided based on the results (PT/PT +RT/Post RT/IPT).
Private sector response:
• Case management based on clinical impression, peripheral blood smear, rapid test for Pf, quantitative buffy coat, and indirect tests to detect malaria.
• Use of mefloquine/ACT
Referral hospital (specialist doctor available)
Public sector (NVBDCP):
• Case management of walk-in as well as referred malaria fever based on clinical impression, peripheral blood smear, rapid test for Pf, quantitative buffy coat, and indirect tests to detect malaria.
• Course of treatment decided based on the results (PT/PT +RT/Post RT/IPT)
• Use of mefloquine/ACT is common
Private sector response:
• Case management of walk-in as well as referred malaria fever based on clinical impression, peripheral blood smear, rapid test for Pf, quantitative buffy coat, and indirect tests to detect malaria.
• Use of mefloquine/ACT
These results are then compiled at the district level - there are currently about 600 districts in the 35 Indian states and union territories - and used to generate a series of statistics. The raw numbers collected include: population of the district in thousands ("Pop"); blood smears collected ("BSC"); blood smears examined ("BSE"); number of slides positive for P. vivax or malariae ("Pv"); number of slides positive for P. falciparum ("Pf"); and number of slides positive for both ("mixed"). These raw numbers are then used to calculate several indices: total number of positive slides ("positive" = Pv + Pf + mixed); percent of positive slides that are positive for P. falciparum ("%Pf" = (Pf + mixed)/positive); annual blood examination rate ("ABER" = BSE/Pop/10); annual parasite index ("API" = positive/Pop); annual falciparum index ("AFI" = (Pf + mixed)/Pop); slide positivity rate ("SPR" = 100 × positive/BSE); and slide falciparum rate ("SFR" = 100 × (Pf + mixed)/BSE). The number of malaria deaths certified by the NVBDCP ("deaths") is also recorded. The measures traditionally used to monitor malaria levels are %Pf, ABER, API, AFI, SPR, SFR, and deaths.
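The index definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' spreadsheet: the class and function names are hypothetical, the factor of 100 in %Pf is an assumption made so that the value reads as a percentage (the formula as printed omits it), and returning NaN for undefined ratios is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class DistrictYear:
    """Raw NVBDCP counts for one district in one year (hypothetical container)."""
    pop_thousands: float  # "Pop": district population in thousands
    bse: int              # "BSE": blood smears examined
    pv: int               # slides positive for P. vivax or malariae
    pf: int               # slides positive for P. falciparum
    mixed: int            # slides positive for both


def surveillance_indices(d: DistrictYear) -> dict[str, float]:
    """Compute the derived indices from the raw counts, as defined in the text."""
    positive = d.pv + d.pf + d.mixed
    falciparum = d.pf + d.mixed
    nan = float("nan")  # assumption: undefined ratios are reported as NaN
    return {
        "positive": positive,
        # Assumption: multiplied by 100 so the result is a percentage.
        "%Pf": 100 * falciparum / positive if positive else nan,
        # Pop is in thousands, so BSE/Pop/10 is smears per 100 people.
        "ABER": d.bse / d.pop_thousands / 10,
        "API": positive / d.pop_thousands,    # positives per 1,000 people
        "AFI": falciparum / d.pop_thousands,  # falciparum positives per 1,000
        "SPR": 100 * positive / d.bse if d.bse else nan,
        "SFR": 100 * falciparum / d.bse if d.bse else nan,
    }
```

For example, `surveillance_indices(DistrictYear(pop_thousands=1000, bse=100000, pv=600, pf=300, mixed=100))` describes a district of one million people in which 10% of the population had a smear examined.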
Each of the measures above has a particular interpretation. ABER measures coverage of the surveillance program, and potentially also local fever incidence. Convention suggests that when ABER is less than 10%, coverage is poor enough that population-referent measures such as API and AFI should be viewed with skepticism. However, ABER may be associated with malaria rates to the extent that sick people seek treatment and therefore have slides examined. Falciparum-positivity is important to distinguish from overall positivity because falciparum malaria is more deadly. API and AFI, though not true measures of population prevalence or incidence, do provide an approximation of disease burden in the population, because presumably many who fall ill do come to PHCs or health facilities for treatment and thus have slides taken. SPR and SFR are measures of disease burden that avoid the problem of referencing population size when only a small portion of the population is sampled, but could be biased in their own right by the incidence of nonmalarial fevers that would lead people to come to clinics and thus boost the denominator (BSE). %Pf should be a good measure of the relative occurrence of falciparum and non-falciparum malaria but provides no information on absolute occurrence. Because the standards are high and rigid for labeling of a death as malaria by the NVBDCP (i.e., only when a peripheral blood smear or rapid diagnostic test is positive; even quantitative buffy coat and indirect antibody tests are not recognized), most malaria deaths are not recorded by the surveillance program.
Data were generally collected for districts that were recognized administrative units at time of collection; thus, new districts often have data for only some of the years, and district boundaries change over time within the dataset. We thus aggregated districts as needed to ensure that units were consistent over time, resulting in a final list of 499 districts. We used data from 1995-2005, considering each district in each year as an independent data point for our purposes ("district-years," 5,386 used in this analysis). Data were missing for certain districts in certain years; 103 such district-years were ignored.
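The aggregation step can be illustrated as follows. This is a minimal sketch under stated assumptions: the mapping from reported districts to stable units, the field names, and the choice to sum raw counts (rather than average derived indices) are all hypothetical and may differ from the procedure actually used.

```python
from collections import defaultdict


def aggregate_counts(rows, stable_unit):
    """Sum raw counts for all rows that map to the same (unit, year).

    rows: dicts with "district", "year", and numeric raw-count fields.
    stable_unit: hypothetical mapping from a reported district name to the
        time-consistent unit it belongs to; unmapped districts are unchanged.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        unit = stable_unit.get(row["district"], row["district"])
        key = (unit, row["year"])
        for field, value in row.items():
            if field not in ("district", "year"):
                totals[key][field] += value
    return {key: dict(counts) for key, counts in totals.items()}
```

Summing counts before recomputing indices keeps ratios such as SPR weighted by the number of slides examined. The district-year total reported above is consistent with the other figures: 499 units over the 11 years 1995-2005 give 5,489 possible district-years, and removing the 103 with missing data leaves 5,386.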
MDS (Million Death Study)
The MDS gives estimates of cause-specific death rates throughout India, and we used it here to generate estimates of district-year-specific malaria mortality as an external check on the validity of our indices from the NVBDCP. The study was conducted in 1.1 million homes in 6,671 small areas chosen from all parts of India (about 1,000 persons per area) to be representative at the state level. The Sample Registration System was established by the Registrar General of India to monitor all births and deaths in these areas [6, 7]. Each home in which a death had been recorded between 2001 and 2003 was visited by one of 900 nonmedical field workers, and the underlying causes of all deaths were sought by verbal autopsy (a structured investigation of events leading to the death conducted by at least two trained physicians) [8–10]. Details of the methods, quality-control checks, and validation results have been reported previously [8, 10–12].
For the purpose of this study, we limited our sample of the MDS to deaths occurring at ages 1-69, when misdiagnosis is less problematic and when the bulk of malaria mortality occurs. For each district-year (2001-2003), we calculated the percentage of total deaths in this age range that were attributable to malaria based on the verbal autopsy results. As a check, we also included percentage of deaths due to fever of unknown origin. For some analyses, we included only district-years when there was at least one malaria death.
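The mortality measure described here can be sketched as below. The record layout (`district`, `year`, `age`, `cause`) and the function name are hypothetical; the MDS data are not distributed in this form.

```python
from collections import Counter


def malaria_death_percentages(deaths, min_age=1, max_age=69,
                              require_malaria_death=False):
    """Percentage of deaths at ages min_age..max_age attributed to malaria.

    deaths: iterable of dicts with "district", "year", "age", and "cause"
        (hypothetical field names; "cause" is the verbal autopsy result).
    require_malaria_death: if True, keep only district-years with at least
        one malaria death, as done for some analyses in the text.
    """
    total = Counter()
    malaria = Counter()
    for death in deaths:
        if not (min_age <= death["age"] <= max_age):
            continue  # outside the age range used in the study
        key = (death["district"], death["year"])
        total[key] += 1
        if death["cause"] == "malaria":
            malaria[key] += 1
    return {
        key: 100 * malaria[key] / n
        for key, n in total.items()
        if malaria[key] > 0 or not require_malaria_death
    }
```

The same function could be pointed at fever of unknown origin by comparing `cause` against a different label, which is left out here to keep the sketch short.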
DLHS (District-level Household Survey)
The DLHS is an India-wide, door-to-door household survey that contains questions about whether household members have suffered from malaria recently. Full details of the methodology are available at the Web site of the International Institute for Population Sciences (http://www.rchiips.org/) and publications therein [13, 14]. We used it here as a way to estimate district-specific malaria morbidity in the years of the survey as an external check on our NVBDCP indices. The DLHS was conducted in three rounds - in 1998-99, 2002-2003, and 2005-06 - but we use data from only the first two rounds. Each round had hundreds of questions, one of which was whether any member of the household had suffered from malaria in the past three months (round 1) or past two weeks (round 2). If the answer was yes, data were collected on the age and sex of the people with malaria and whether they received treatment, for up to five people per household (round 1) or all with malaria (round 2). By combining this with the number of members in the household, we can generate estimates of morbidity as the number of individuals in a district who suffered from malaria in the specified period divided by total individuals in the district. The two rounds provide somewhat different measures of morbidity, but because they are internally consistent across India, they can each be used to validate the NVBDCP indices. Our DLHS 1 sample included 3.2 million individuals, with a mean ± SD of 6,380 ± 1,680 individuals per district. Our DLHS 2 sample covered 3.5 million individuals, with 5,860 ± 760 individuals per district. Thus, even at relatively low malaria levels, our district morbidity estimates should be fairly robust.
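The morbidity estimate described above is a simple ratio, sketched here with a hypothetical household record layout (`district`, `members`, `malaria_cases`). In round 1 the case count per household would be at most five because of the survey design.

```python
def district_morbidity(households):
    """Fraction of surveyed individuals per district reported to have had malaria.

    households: iterable of dicts with "district", "members" (household size),
        and "malaria_cases" (members reported ill in the recall period).
        These field names are assumptions made for illustration.
    """
    cases = {}
    people = {}
    for household in households:
        district = household["district"]
        cases[district] = cases.get(district, 0) + household["malaria_cases"]
        people[district] = people.get(district, 0) + household["members"]
    # Districts with no surveyed individuals are skipped to avoid dividing by zero.
    return {d: cases[d] / people[d] for d in people if people[d] > 0}
```

Because the two rounds use different recall periods (three months versus two weeks), values from this function are comparable across districts within a round but not between rounds.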
For each of the above PCAs, we expected the first PCA axis, which explains the most variation, to be the one of interest. Thus, after assessing the variance that the first PCA axis explained and confirming that this was much higher than for any of the other PCA axes, we took the component loadings for this axis from each of the 51 PCAs and used them to calculate PCA scores for the whole dataset. This resulted in 51 new variables, potential indices generated from the subsets of the data. For example, the loadings for the first axis calculated with only the 1995-1996 district-years were used to create a variable from the whole dataset, including all district-years. Comparison between this and similar indices created from the PCAs run on other years allowed assessment of whether the index had changed over time or was stable. We assessed this by calculating the correlations among these 51 axes and with the seven original variables. This was also used to choose a best axis for the Summary Index of Malaria Surveillance based on strength of correlations with the original variables and comprehensiveness. For external validation, we calculated the correlation between MDS-recorded deaths and this axis in the years 2001-2003, and between DLHS-recorded morbidity and this axis in the years 1998-1999 and again for 2002-2003.
For the 51 PCAs we ran, the variance explained by the first axis was between 49% and 89% (Additional file 2). As predicted, all measures loaded in the same direction on the first axis of each PCA. Even on mutually exclusive portions of the dataset (e.g., Additional file 2, columns 5-9), the variable loadings were nearly identical, suggesting substantial stability of the relationships. The correlations of the indices over these subsets were more variable (see Additional file 1). Additional axes had minimal variance explained and inconsistent loadings, so we retained only the first axis in all analyses. This means that there was no second axis needed to explain relative abundance of falciparum and vivax; in other words, falciparum-to-vivax ratio tracks overall malaria levels, as also seen from correlations of the base indices (%Pf with API: r = 0.51, p < 0.0001; %Pf with SPR: r = 0.49, p < 0.0001).
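The stability check described in this paragraph (fit the first axis on a subset, score the whole dataset with those loadings, then correlate the resulting indices) can be sketched in plain Python. Everything here is an assumption made for illustration: the function names, the use of power iteration in place of a statistics package, and the synthetic three-measure dataset, which stands in for the real surveillance measures.

```python
import math
import random


def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_axis(rows, iterations=500):
    """Return (loadings, share of variance explained) for the first PCA axis.

    Power iteration on the correlation matrix; the share is the leading
    eigenvalue divided by the number of variables.
    """
    z = standardize(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)]
            for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return v, eigenvalue / p


def scores(rows, loadings):
    """Score every row of the full dataset using loadings fitted elsewhere."""
    return [sum(x * w for x, w in zip(row, loadings))
            for row in standardize(rows)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))


def synthetic_measures(n=200, seed=0):
    """Hypothetical data: three noisy measures driven by one latent level."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    return [[t + rng.gauss(0, 0.5) for _ in range(3)] for t in latent]


if __name__ == "__main__":
    data = synthetic_measures()
    loadings_a, share_a = first_axis(data[:100])   # first subset
    loadings_b, share_b = first_axis(data[100:])   # disjoint second subset
    r = pearson(scores(data, loadings_a), scores(data, loadings_b))
    print(loadings_a, share_a, loadings_b, share_b, r)
```

A high correlation between the two score vectors is the kind of evidence the text uses to argue that loadings from one subset can be applied to the rest of the data.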
SPR, SFR, and MR include a minor correction factor of 0.001 because they are frequently zero, and log(0) = -∞; see Additional file 1. Each of the five indices is then log or square-root transformed for normality. They are each turned into standard normal random variables by subtracting the mean and dividing by the standard deviation, then multiplied by the appropriate axis loadings from PCA B1. The additional adjustments are for the purpose of scaling and are described in detail below. We provide a spreadsheet online that can be used to calculate the SIMS either from raw data or from the existing indices (Additional file 3).
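The sequence of steps in this paragraph (offset, transform, standardize, weight by loadings, rescale) can be sketched as below. None of the numbers come from the paper: the per-index means, standard deviations, loadings, and choice of log versus square root would have to be taken from the authors' spreadsheet (Additional file 3), and the min-max rescaling with clamping is only one assumed way to reach a 0-100 range.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class IndexSpec:
    """Per-index parameters; all values used with this class are placeholders."""
    offset: float                        # e.g. 0.001 for indices that are often zero
    transform: Callable[[float], float]  # math.log or math.sqrt
    mean: float                          # mean of the transformed index
    sd: float                            # standard deviation of the transformed index
    loading: float                       # first-axis loading for this index


def raw_score(values: dict[str, float], specs: dict[str, IndexSpec]) -> float:
    """Sum of loading-weighted z-scores of the transformed indices."""
    total = 0.0
    for name, spec in specs.items():
        transformed = spec.transform(values[name] + spec.offset)
        total += spec.loading * (transformed - spec.mean) / spec.sd
    return total


def rescale_0_100(score: float, lowest: float, highest: float) -> float:
    """Assumed rescaling: linear map of [lowest, highest] to [0, 100], clamped."""
    scaled = 100 * (score - lowest) / (highest - lowest)
    return max(0.0, min(100.0, scaled))
```

With real parameters, `rescale_0_100(raw_score(values, specs), lowest, highest)` would give a value on the 0-100 scale on which higher numbers indicate more malaria.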
We have shown that principal components analysis can be used to generate a robust index of malaria incidence, the SIMS, based on summary measures of blood data collected by district and year throughout India. This index will provide a simpler way to quantify and interpret temporal and geographic variation in malaria in India because multiple measures need not be considered simultaneously. In some cases, the individual measures will still be more appropriate - for example, to compare relative trends of falciparum and vivax malaria. However, in most cases, a single, more comprehensive measure will be preferable. Even for prediction of mortality according to the MDS, the SIMS fares as well as the falciparum-specific measures.
It is possible to generate a summary measure (valid or not) from almost any set of variables, so a rigorous standard must be used to validate any such summary and to clarify its interpretation. Here, there are multiple strands of evidence suggesting that the SIMS is valid and has a clear interpretation:
(1) All of our individual variables load onto the SIMS in the same direction, as would be predicted.
(2) The SIMS explains a substantial portion of the overall variation, 58%.
(3) The results were essentially unchanged after omitting observations with zeroes from the dataset.
(4) Results were almost exactly identical even when the analysis was run on mutually exclusive subsets of the data from different years or geographic regions.
(5) Results were quite similar even when using different combinations of the indicator variables.
(6) The SIMS correlates as well as or better than the individual measures with external measures of malaria mortality and morbidity.
It is unusual that summary measures generated from mutually exclusive subsets of the data would correlate so well with each other, and this result has some implications. First, SIMS is likely to be stable over time and space. The component loadings generated in this study should be applicable to new data generated in the future, without the necessity to run new PCA for the new dataset and without the potential for conflicting results. Second, the measurement error appears to be relatively consistent over time. If measurement error varied, the loadings would likely become unstable, too.
Third, it seems that malaria trends, at least as measured by the NVBDCP, are a real phenomenon that can be described adequately with a single dimension. More specifically, there appears to be a stable pattern of stages of severity in malaria transmission, as evidenced by the ability of one axis to describe both species ratio and overall abundance. At the lowest levels, malaria is essentially absent. Then vivax comes in at low levels. Only when vivax reaches moderate to high levels does falciparum appear, and at the highest levels, falciparum is usually much more common than vivax. This could be an example of ecological niche partitioning between these species, with one favored by conditions of low, stable transmission and the other favored by high transmission (i.e., high mosquito densities and bite rates) and mortality rates [17, 18]. If this is the case, it might bode poorly for the hope that falciparum will eventually evolve lower virulence, as that niche is already occupied by a competitor species [19–21]. However, there are many other potential explanations for this pattern. It is possible that the species occupy different niches without competing, or that there are sampling biases against detecting vivax in high falciparum areas, for example.
The main caution in the use of the SIMS going forward is that the NVBDCP has been improving its data collection methodology, including use of rapid diagnostic tests and computerized data entry (GPS Dhillon, personal communication). This is unquestionably a laudable development, but it may inadvertently cause problems with comparability of data over time and space, and it is possible that future values of the SIMS will not be fully comparable to past ones. However, past variation in sampling accuracy may also have been large, and the SIMS may be successfully detecting signal among the noise in ways that will be unaffected by these changes in data collection methodology. For example, NVBDCP surveillance data rely only on the public health sector, and private facilities and persons who do not seek care are not covered; thus, past surveillance coverage has presumably varied with time and space. Despite this, the SIMS has so far proven stable. Whether this continues to be the case will have to be validated in the future. If the future SIMS proves incomparable to past SIMS, existing surveillance measures such as API should be even harder to compare, because changes in surveillance methodology would affect all indices, and the SIMS has some ability to buffer these changes by extracting signal from noise. Even if past and future SIMS cannot be directly compared, it is likely that the SIMS as shown here would be a stable measure for comparison within any dataset collected with consistent methodology.
Relationship between the SIMS and existing indices
The SIMS described here should help improve the clarity of malaria surveillance in India and perhaps elsewhere. It is designed for easy use and interpretation by people with no statistical background. In particular, when looking at maps and graphical representations of malaria distribution, it will help to have a single measure rather than several of uncertain interpretation. The strong support we get for a stable measure suggests that it would be worthwhile to pursue similar indices in other settings where different measures of malaria have been collected. Within India, this measure may serve as an analytical tool for researchers assessing the progress of control and eradication programs, and also perhaps as a clear benchmark for holding local authorities accountable for progress. Internationally, it seems likely that similar principles will facilitate the ability to generate a single standard index for malaria transmission intensity.
We have demonstrated that principal components analysis can be used to construct a single measure, the SIMS, that summarizes most relevant variation in malaria surveillance measures across time and space. It can be interpreted as a relative measure of transmission intensity. Species abundance tracks overall levels, meaning that a separate measure is not needed. The SIMS is robust over time and space - alternate versions calculated from subsets of the data did not differ noticeably. We have provided a spreadsheet calculator, ensuring that even field workers with no mathematical background can accurately use the measure. We expect that the SIMS will simplify and improve malaria surveillance in India, and that similar measures should be applicable in other settings as well.
We thank GPS Dhillon, GS Sonal, and participants at the June 8, 2009, malaria working group meeting in New Delhi, India, for helpful comments. External funding is from the Fogarty International Centre of the US National Institutes of Health (grant R01 TW05991-01), Canadian Institute of Health Research (CIHR; IEG-53506), International Research Development Centre (Grant 102172), and Li Ka Shing Knowledge Institute and Keenan Research Centre at St. Michael's Hospital, University of Toronto (CGHR support). PJ is supported by the Canada Research Chair program.
1. Choudhury DS: Malaria in India: Past, present and future. Indian Journal of Pediatrics 1985, 52: 243-248. 10.1007/BF02754849
2. Singh J: Malaria and Its Control in India. Perspectives in Public Health 1952, 72: 515-525.
3. Zurbrigg S: Re-thinking the "human factor" in malaria mortality: the case of Punjab, 1868-1940. Parassitologia 1994, 36: 121-135.
4. Akhtar R, Learmonth A, Keynes M: The resurgence of Malaria in India 1965-76. GeoJournal 1977, 1: 69-80. 10.1007/BF00704965
5. Kumar A, Valecha N, Jain T, Dash AP: Burden of Malaria in India: Retrospective and Prospective View. American Journal of Tropical Medicine and Hygiene 2007, 77: 69-78.
6. Sample registration system, statistical report: 2004. New Delhi, India: Registrar-General of India; 2005.
7. Jha P, Kumar R, Vasa P, Dhingra N, Thiruchelvam D, Moineddin R: Low male-to-female sex ratio of children born in India: national survey of 1.1 million households. The Lancet 2006, 367: 211-218. 10.1016/S0140-6736(06)67930-0
8. Gajalakshmi V, Peto R, Kanaka S, Balasubramanian S: Verbal autopsy of 48 000 adult deaths attributable to medical causes in Chennai (formerly Madras), India. BMC Public Health 2002, 2: 7. 10.1186/1471-2458-2-7
9. Gajalakshmi V, Peto R, Kanaka TS, Jha P: Smoking and mortality from tuberculosis and other diseases in India: retrospective study of 43,000 adult male deaths and 35,000 controls. The Lancet 2003, 362: 507-515. 10.1016/S0140-6736(03)14109-8
10. Jha P, Gajalakshmi V, Gupta PC, Kumar R, Mony P, Dhingra N, Peto R: Prospective Study of One Million Deaths in India: Rationale, Design, and Validation Results. PLoS Medicine 2006, 3: e18. 10.1371/journal.pmed.0030018
11. Joshi R, Cardona M, Iyengar S, Sukumar A, Raju CR, Raju KR, Raju K, Reddy KS, Lopez A, Neal B: Chronic diseases now a leading cause of death in rural India--mortality data from the Andhra Pradesh Rural Health Initiative. Int J Epidemiol 2006, 35: 1522-1529. 10.1093/ije/dyl168
12. Kumar R, Thakur J, Rao B, Singh M, Bhatia S: Validity of verbal autopsy in determining causes of adult deaths. Indian Journal of Public Health 2006, 50: 90-94.
13. International Institute for Population Sciences (IIPS): Reproductive and Child Health Project, Rapid Household Survey (Phase I & II). Mumbai: IIPS; 1999.
14. International Institute for Population Sciences (IIPS): District Level Household Survey (DLHS-2), 2002-04: India. Mumbai: IIPS; 2006.
15. National Family Health Survey (NFHS-3), 2005-06: India Volume I. (International Institute for Population Sciences (IIPS) and Macro International, ed.) Mumbai: IIPS; 2007.
16. Rossiter JR: Reminder: a horse is a horse. International Journal of Research in Marketing 2005, 22: 23-25. 10.1016/j.ijresmar.2004.11.001
17. Chase JM, Leibold MA: Ecological Niches: Linking Classical and Contemporary Approaches. Chicago: University of Chicago Press; 2003.
18. Peterson AT: Ecologic Niche Modeling and Spatial Patterns of Disease Transmission. Emerging Infectious Diseases 2006, 12: 1822-1826.
19. Abderrhaman I, Jean-Claude K, Gauthier S, Jean-Jules T: Global Analysis of New Malaria Intrahost Models with a Competitive Exclusion Principle. SIAM Journal on Applied Mathematics 2006, 67: 260-278. 10.1137/050643271
20. de Roode JC, Culleton R, Cheesman SJ, Carter R, Read AF: Host heterogeneity is a determinant of competitive exclusion or coexistence in genetically diverse malaria infections. Proceedings of the Royal Society B: Biological Sciences 2004, 271: 1073-1080. 10.1098/rspb.2004.2695
21. Ewald PW: Evolution of virulence. Infectious Disease Clinics of North America 2004, 18: 1-15. 10.1016/S0891-5520(03)00099-0
22. Ceesay SJ, Casals-Pascual C, Erskine J, Anya SE, Duah NO, Fulford AJC, Sesay SSS, Abubakar I, Dunyo S, Sey O, et al.: Changes in malaria indices between 1999 and 2007 in The Gambia: a retrospective analysis. The Lancet 2008, 372: 1545-1554. 10.1016/S0140-6736(08)61654-2
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.