The impact of individual-level heterogeneity on estimated infectious disease burden: a simulation study
© The Author(s). 2016
Received: 28 October 2015
Accepted: 2 December 2016
Published: 8 December 2016
Disease burden is not evenly distributed within a population; this uneven distribution can be due to individual heterogeneity in progression rates between disease stages. Composite measures of disease burden that are based on disease progression models, such as the disability-adjusted life year (DALY), are widely used to quantify the current and future burden of infectious diseases. Our goal was to investigate to what extent ignoring the presence of heterogeneity could bias DALY computation.
Simulations using individual-based models for hypothetical infectious diseases with short and long natural histories were run assuming either “population-averaged” progression probabilities between disease stages, or progression probabilities that were influenced by an a priori defined individual-level frailty (i.e., heterogeneity in disease risk) distribution, and DALYs were calculated.
Under the assumption of heterogeneity in transition rates and increasing frailty with age, the short natural history disease model predicted 14% fewer DALYs compared with the homogenous population assumption. Simulations of a long natural history disease indicated that assuming homogeneity in transition rates when heterogeneity was present could overestimate total DALYs, in the present case by 4% (95% quantile interval: 1–8%).
The consequences of ignoring population heterogeneity should be considered when defining transition parameters for natural history models and when interpreting the resulting disease burden estimates.
Keywords: Infectious diseases; Heterogeneity; Disability-adjusted life years; Markov model
Disease burden, whether computed for infectious or for chronic diseases, is not evenly distributed within a population, or even among members of a particular stratum of the afflicted population. Relatively few afflicted individuals carry a disproportionate amount of the burden. This fact is obscured by the “population-averaged” approach to calculating and reporting standard epidemiological indicators, such as incidence, as well as composite measures of disease burden, such as disability-adjusted life years (DALYs). This individual-level heterogeneity in disease risk, often referred to as “frailty” [1–3], represents variation beyond that explained by known and measurable risk factors; such variation may be attributed to genetic, epigenetic, environmental, and/or stochastic factors. Unmeasured variation, when labeled as “randomness in degree of susceptibility,” has long been recognized as important for interpreting historical patterns in mortality and for improving the fit of demographic models, as well as for explaining age-dependent patterns of incidence for diseases such as testicular cancer. Infectious diseases such as HIV, hepatitis C, and tuberculosis are also relevant candidates for such an analysis approach. However, although individual heterogeneity has been discussed in the context of health economic cost-effectiveness models [6, 7], and variability in transition rates has been modeled by specifying a distribution function fitted to clinical data, its impact has yet to be explicitly addressed in current disease burden estimation exercises.
The question to be addressed in the present paper is: Does ignoring individual-level heterogeneity in the rate of progressing from acute infection to more severe disease stages result in biased estimates of disease burden when using a disease progression pathway modeling approach to compute DALYs? Unobserved, unmeasured individual heterogeneity cannot be captured by covariates. Therefore, neither adjustment nor stratification are analysis options, and analytical or simulation methods are required to quantify the expected effects of ignoring unmeasured heterogeneity. The fundamental issue at stake concerns the impact of ignoring individual heterogeneity when ranking diseases according to their disease burden, which is a useful form of presentation for public health policymakers. If two diseases differ widely in terms of degree of individual heterogeneity – for instance if variation in individual-level susceptibility to rhinovirus infection differed greatly from susceptibility to Campylobacter – then their relative ranking may change substantially if individual heterogeneity is taken into account when computing DALYs. If such a ranking informs the prioritization of public health services, then appropriate computation of the absolute disease burden is vital. Thus, a first step towards understanding the effect of unmeasured heterogeneity on measures such as the DALY is to employ computational simulations to compare the expected disease burden in scenarios with and without such heterogeneity.
A potentially serious concern for the computation and interpretation of disease burden estimates relates to individual heterogeneity in rates of disease progression. Especially for chronic diseases, persons observed to be in the same disease stage may represent a wide range of individual disease progression rates, with consequences for the evaluation of interventions. “Population-averaged” progression rates are typically employed in the disease progression pathway models that form the basis for pathogen-based disease burden estimation [10–12]. However, a given patient population may plausibly contain relatively few fast-progressors and many more slow-progressors; the oft-used population-averaged transition probability obscures the potential skewness in the rate distribution, and disregards the extent of any variability as well as potential correlations between the transition probabilities of successive health states. Although heterogeneity in infectivity or susceptibility to infection is also plausible, the current simulations do not consider this further, as transmission of infection is not modeled in the present study.
Below, we present simple natural history models (“outcome trees”) for two fictitious infectious diseases, “X1” and “X2,” and report the impact on estimated disease burden (in DALYs) when a priori assumed distributions of individual heterogeneity are incorporated into the model. The two hypothetical diseases are broadly representative of infectious diseases with short (e.g., Q fever) and long (e.g., hepatitis C virus infection) natural histories, but are not intended to correspond to specific diseases. Rather than implementing (possibly quite complex) disease progression pathways of actual infectious diseases, we simulate disease burden in simplified natural history models to facilitate interpretation of the results.
Our primary objective is to compare, for each of the two disease models, the disease burden for an infected cohort in which individual heterogeneity in progression probabilities between disease stages is present (heterogeneity variants), to the disease burden if this heterogeneity is ignored (no-heterogeneity variants). As a secondary objective, we investigate the impact of ignoring heterogeneity in disease progression rates on the disease burden averted due to a simulated public health intervention, namely high-coverage age-targeted vaccination.
In the current study, we employ the term “frailty” more broadly than used in the statistical literature, where frailty refers to an unobserved random factor that modifies an individual’s hazard function. We use frailty to indicate an individual’s position within a population distribution of disease progression rates (specified as a Gamma distribution); individuals with higher frailty values progress more quickly than individuals with lower frailty values.
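As an illustration of this definition, the following Python sketch draws individual frailty values from a Gamma distribution scaled to have a mean of 1. The shape value, the seed, and the function name `sample_frailties` are illustrative assumptions, not the parameters or code used in the simulations reported here.

```python
import random

def sample_frailties(n, shape=2.0, seed=1):
    """Draw n frailty values from a Gamma distribution with mean 1.

    shape=2.0 is an assumed, illustrative value; setting the scale to
    1/shape fixes the mean at shape * scale = 1.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n)]

frailties = sample_frailties(5000)
mean_frailty = sum(frailties) / len(frailties)
# With a right-skewed distribution, most individuals lie below the mean
# (many slow-progressors, few fast-progressors).
share_below_mean = sum(f < mean_frailty for f in frailties) / len(frailties)
```

Fixing the mean at 1 keeps the average progression rate comparable with a population-averaged model, so that only the spread and skew of the distribution differ.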
In disease model X1, acutely infected individuals develop chronic infection with an age-independent transition probability of 20% (for simulation purposes, this transition is assumed to effectively occur immediately). The transition probability for the risk of death following chronic infection is dependent on age, with case-fatality ratios of 5, 1, 2, and 15% specified for the <15, 15–44, 45–64, and 65+ years age-groups, respectively.
Disease model X2 simulates a disease with a long natural history. As for disease model X1, acutely infected individuals develop chronic infection with an age-independent probability of 20% (assumed to effectively occur immediately). Average progression from the chronic infection health outcome to severe sequela is assumed to be slow, with a transition probability of 2% per year. The annual probability of death following development of this sequela was set to 4%. Both of these transition probabilities are specified as age-independent. In this simulation, we needed to track individuals over time, and to simulate ageing of the acutely infected cohort. All individuals were assumed to die after reaching their 86th birthday, if they did not reach the death stage before this time.
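The population-averaged progression described for model X2 can be sketched as a simple expected-value cohort calculation. The transition probabilities, the cohort size of 5000, and the age limit of 86 years come from the text; the single age at infection (30 years), the annual cycle, and the rule that an individual moves at most one stage per year are simplifying assumptions made only for this sketch.

```python
def expected_cohort_x2(n_acute=5000, age_at_infection=30,
                       p_chronic=0.20, p_sequela=0.02, p_death=0.04,
                       max_age=86):
    """Expected (population-averaged) stage occupancy per year of age."""
    chronic = n_acute * p_chronic   # acute -> chronic treated as immediate
    sequela = 0.0
    deaths = 0.0
    history = []
    for age in range(age_at_infection, max_age):
        # Transitions use start-of-year occupancy (one stage per year).
        new_deaths = sequela * p_death
        new_sequela = chronic * p_sequela
        chronic -= new_sequela
        sequela += new_sequela - new_deaths
        deaths += new_deaths
        history.append((age, chronic, sequela, deaths))
    return history

history = expected_cohort_x2()
# Occupancy at the end of the last simulated year (age 85); those still in
# the chronic or sequela stage are assumed to die at their 86th birthday.
final_age, chronic_left, sequela_left, disease_deaths = history[-1]
```

Because every individual shares the same annual probabilities, this calculation needs no individual tracking; the heterogeneity variants require an individual-based approach instead.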
To represent individual heterogeneity in the probability of transitioning from acute to chronic infection, from chronic infection to the severe sequela disease stage (model X2 only), and from severe sequela to death (model X2 only), we first assigned frailty values to each individual by random sampling from an age-independent frailty distribution. These frailty values are considered to be assigned at birth, and therefore did not change through an individual’s lifetime (see below). As a result, the more frail individuals were modeled to have higher transition probabilities for the relevant transitions, and the less frail to have lower transition probabilities. For model X1 only, individual heterogeneity in the progression from acute to chronic infection was assumed to be age-related, with mean frailty increasing with age. Consequently, chronic infection tends to develop more often in older than in younger individuals.
Simulating disease progression and computing disease burden
We simulated disease progression in both disease models separately using an individual-based modeling approach, whereby each infected case was followed throughout disease progression, and the burden associated with each health outcome (and the sum over all outcomes) was computed using standard pathogen-based DALY methodology. By using an individual-based modeling approach, we are thus able to account for the correlation in transition probabilities, which would be lost when using a population-averaged approach. In both disease models, all individuals are assumed to start in the acute infection disease stage. In disease model X2, identically sized cohorts of incident cases (n = 5000) entered the model each simulation year.
In the no-heterogeneity simulations, the expected YLD, YLL, and DALYs were computed from the expected number of cases progressing through an outcome tree defined by the transition probabilities and Dutch male life expectancies for the year 2000, and given assumed disability weights and durations (Fig. 1). Disease stage duration was truncated if the simulated individual reached their 86th birthday while in that disease stage (relevant for model X2 only). YLD and YLL measures were summed over all relevant health outcomes, with the DALY measure defined as the simple sum of YLD and YLL.
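The arithmetic of this expected-value calculation can be sketched as follows for a single age group of model X1. The 20% chronic infection probability and the 1% case-fatality ratio (15–44 years) are taken from the text; the number of acute cases, the disability weights, the durations, and the remaining life expectancy are hypothetical placeholders, since the values from Fig. 1 and the Dutch life table are not reproduced here.

```python
def yld(cases, disability_weight, duration_years):
    """Years lived with disability for one health outcome."""
    return cases * disability_weight * duration_years

def yll(deaths, remaining_life_expectancy):
    """Years of life lost, given life expectancy at the age of death."""
    return deaths * remaining_life_expectancy

acute_cases = 1000                    # hypothetical cohort size
chronic_cases = acute_cases * 0.20    # 20% acute -> chronic (from the text)
deaths = chronic_cases * 0.01         # 1% case-fatality, 15-44 years (from the text)

total_yld = (yld(acute_cases, 0.10, 0.05)       # assumed weight and duration
             + yld(chronic_cases, 0.20, 1.0))   # assumed weight and duration
total_yll = yll(deaths, 45.0)                   # assumed remaining life expectancy
daly = total_yld + total_yll                    # DALY = YLD + YLL
```

Repeating the calculation for each age group and summing over health outcomes gives the total expected burden for the no-heterogeneity variant.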
In the heterogeneity simulations, the central idea implemented was that the infected individuals who are most likely to transition to a subsequent disease stage, such as a complication or death, are those with the highest frailty. For these simulations, we first randomly sampled from the pre-defined frailty distributions (see below) and assigned frailty values to each individual. For disease model X1, the number of cases transitioning from acute to chronic infection was constrained to equal the expected cases (N) determined using the no-heterogeneity variant of the same model (to permit comparability between heterogeneity and no-heterogeneity variants). For disease model X2, the number of cases transitioning from a given health outcome to the subsequent health outcome in each simulation year was also constrained to equal the expected cases (N) based on the “population-averaged” transition probability. Stochastic sampling methods were used to determine which individuals transitioned from each health outcome. Specifically, N individuals were sampled without replacement, with the probability of being selected weighted according to each individual’s assigned frailty value. This procedure was repeated 1000 times, and the median and the 2.5th and 97.5th percentiles of the distributions of YLD, YLL, and DALYs were reported.
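A minimal Python sketch of the sampling step is given below; it is not the R code used for the simulations. It selects N individuals without replacement with frailty-proportional weights, using exponential sort keys (a standard technique that is equivalent to successive weighted draws), and summarizes 1000 replicates by the median and the 2.5th and 97.5th percentiles. The Gamma shape, the stage size of 1000, and the summarized quantity (mean frailty of those who transition, standing in for YLD, YLL, or DALYs) are assumptions made for illustration.

```python
import random
import statistics

def frailty_weighted_sample(frailties, n_events, rng):
    """Pick n_events distinct individuals, favouring high frailty.

    Each individual gets an exponential key with rate equal to frailty;
    keeping the n_events smallest keys is equivalent to successive
    weighted draws without replacement.
    """
    keys = sorted((rng.expovariate(f), i) for i, f in enumerate(frailties))
    return [i for _, i in keys[:n_events]]

def one_replicate(rng, n_chronic=1000, p_transition=0.02, shape=2.0):
    # Frailty: Gamma with mean 1 (the shape is an assumed value).
    frailties = [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_chronic)]
    # The number transitioning is fixed at the population-averaged expectation.
    n_events = round(n_chronic * p_transition)
    selected = frailty_weighted_sample(frailties, n_events, rng)
    return statistics.mean(frailties[i] for i in selected)

rng = random.Random(2016)
results = [one_replicate(rng) for _ in range(1000)]
cuts = statistics.quantiles(results, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
summary = (statistics.median(results), cuts[0], cuts[-1])
```

Constraining the number of events to N keeps the heterogeneity and no-heterogeneity variants comparable: only the identity of the individuals who transition changes, not their number.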
For disease model X2, the disease burden will be largely determined by the number of individuals who reach the death stage; the risk of death is dependent on the annual progression probabilities from the chronic infection and severe sequela stages. In sensitivity analysis, the effect of the initial choice of these parameter values on the simulated burden and on the overestimation of DALYs due to assuming population-averaged transition probabilities is explored. Additional file 1 reports the results of simultaneously varying the annual transition probabilities for the final two transitions in model X2 across a limited range. In a second sensitivity analysis involving model X2, two further frailty distributions are specified, and burden in DALYs compared with that obtained using the rightward-skewed distribution. In the first, the skewness was reversed (i.e., corresponding to a disease with few slow-progressors and many fast-progressors); in the second, a peaked symmetrical distribution was tested (i.e., corresponding to a disease with equal (low) numbers of slow- and fast-progressors).
Age-targeted vaccination scenario
We estimated the effect of a single simulated public health intervention, age-targeted vaccination with a simulated high coverage of 80%, and calculated DALYs averted. This was accomplished by retaining only 20% of the potential acute infection cases aged ≤19 years, very crudely simulating the effects of herd immunity on older age groups (see Fig. 2), and re-running the no-heterogeneity simulation. Then, the heterogeneity variant was run, to assess any change in the size of the vaccination effect. Note that a more accurate simulation of the impact of an age-targeted vaccination program would employ a dynamic modeling approach to simulate the time-dependent influence of herd immunity on successive birth cohorts entering the model.
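The crude vaccination adjustment can be sketched as a filter on the age distribution of acute cases. The 80% coverage and the age cut-off of 19 years come from the text; the flat baseline of 50 cases per year of age and the function name `apply_vaccination` are hypothetical.

```python
def apply_vaccination(cases_by_age, max_vaccinated_age=19, coverage=0.80):
    """Retain (1 - coverage) of acute cases in the targeted ages.

    Cases in older ages are left unchanged.
    """
    return {age: n * (1.0 - coverage) if age <= max_vaccinated_age else n
            for age, n in cases_by_age.items()}

# Hypothetical flat age distribution of acute cases (assumption).
baseline = {age: 50.0 for age in range(0, 86)}
vaccinated = apply_vaccination(baseline)
cases_averted = sum(baseline.values()) - sum(vaccinated.values())
```

Running the burden calculation on `baseline` and on `vaccinated`, and taking the difference, would give the DALYs averted under either model variant.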
Simulations were carried out in the R statistical programming environment, version 3.1.0.
Table 1. Results of simulations using disease models X1 and X2, comparing the estimated disease burden between no-heterogeneity and heterogeneity variants. Results indicate the total burden for individuals acutely infected in simulation year 1, with 95% quantile intervals.
Columns: Disease model [variant]; YLL (95% interval); DALY (95% interval); Overestimation of DALY (95% interval).
Rows: X1 (3 health outcomes, 4 broad age-groups specified for transition from chronic infection to death), no heterogeneity; X2 (4 health outcomes), no heterogeneity.
For disease model X2, in which a disease with a long natural history was simulated via specification of annual transition probabilities, overestimation of total disease burden by the no-heterogeneity variant was by a factor of 1.04 (95% interval: 1.01–1.08) (Table 1). This difference was driven by YLL (overestimated by a factor of 1.12), as YLD was actually larger for the heterogeneity variant. The “trade-off” between YLD and YLL is due to a greater proportion of infected persons spending more time in the chronic infection and severe sequela stages in the heterogeneity compared with the no-heterogeneity variant (leading to a higher YLD), and the corresponding fewer deaths in the former variant (leading to lower YLL).
The results of the first sensitivity analysis indicated that the values initially chosen for the progression probabilities from the chronic infection and severe sequela stages resulted in a disease burden overestimation factor at the high end of the range obtained for the parameter values investigated (Additional file 1). This factor tended to increase as either annual probability increased (leading to more mortality at a younger age), with a range of 1.01 to 1.08. In the second sensitivity analysis, leftward-skewed and peaked symmetrical frailty distributions were investigated. Both alternatives yielded greater DALYs than the rightward-skewed distribution of the main analysis, which supports our central finding from disease model X2: burden is lower when slow-progressors outnumber fast-progressors, because of the smaller number of premature deaths.
Table 2. Simulation results: estimated burden under vaccination and no-vaccination scenarios using disease model X2. Results indicate the total burden for individuals acutely infected in simulation year 1, with 95% quantile intervals.
Columns: Model variant [vaccination scenario]; YLL (95% interval); DALY (95% interval); Burden averted, DALY (%).
Rows: X2 no heterogeneity (no vaccination; vaccination <20 years); X2 heterogeneity (no vaccination; vaccination <20 years).
To what extent does individual heterogeneity in disease progression rates affect the computation of composite disease burden measures, such as the DALY? Our principal finding is the following: if the degree of individual heterogeneity that we simulated in transition probability distributions mimics the extent of unmeasured heterogeneity in the population, then ignoring this heterogeneity can result in inflated disease burden estimates. In the case of disease model X1, the simulated dependence of mean frailty on age in the heterogeneity variant is responsible for the lower disease burden compared with the no-heterogeneity variant. With a skewed frailty distribution, a minority of patients die young, with the majority living to an older age, compared with application of a population-averaged transition probability. This resulted in a smaller YLL – and a consequent 14% lower total disease burden – being estimated for the heterogeneity than for the no-heterogeneity variant.
For disease model X2, in which frailty distributions were specified as age-independent (i.e., assumed fixed at the age of acute infection), the 4% lower burden estimated for the heterogeneity variant is due entirely to the simulated heterogeneity in disease progression rates. This is because even though the most frail individuals progress the most rapidly through the disease course, and therefore have a higher probability of developing severe sequelae and dying at a younger age, on average disease progression is slower than if heterogeneity is ignored. Due to the skewness of the frailty distribution, only a minority of patients are fast progressors; for the majority of patients, disease progression is slow, and the severe disease stages, if experienced during their lifetime, are reached at a later age.
The lower estimated burden for the heterogeneity variant is therefore due to fewer members of an acutely infected cohort reaching the age at which severe sequela or death due to the disease can occur (and thus resulting in a lower YLL); however, individuals in this variant tended to spend longer in chronic infection and severe sequela stages compared with the no-heterogeneity model, which resulted in a higher YLD. Despite this YLD/YLL “trade-off,” there is an overall reduced burden in the heterogeneity variant, most apparent in the younger age-groups (5- to 39-year-olds) (Fig. 6) due to their lower risk of dying from the disease before reaching their life expectancy.
It might be argued that the X1 simulations only demonstrate that availability of age-dependent transition probabilities in place of a single age-independent transition probability is vital, if the incident case population covers a wide age range and the risk of developing a complication or dying is greater for older than for younger patients. Because in disease model X1 the frailest patients are the most likely to transition, assuming increasing mean frailty with age effectively translates to a statistical preference for older patients transitioning to chronic infection before younger patients. Disease model X2, which explicitly simulates aging of an acutely infected cohort simultaneously with progression through the various disease stages, illustrates that the assumption of age-dependent mean frailty is unnecessary for longer natural history diseases.
For disease model X1, ignoring age-dependent heterogeneity leads to overestimation of disease burden; but is it plausible that mean frailty would increase with age? Although there are health states for which the young are at the greatest risk, frailty in general may be roughly monotonic with age. Cumulative occasions of ill health from birth (the “insult accumulation” model) would lead to an individual’s frailty, and so his/her susceptibility to disease progression and/or death, also increasing with age. Also, declining mortality rates or increasing life expectancy over a period of time would give rise to an age-related frailty effect. Finally, in the case of infectious diseases, immunosenescence (age-associated decline in immune function) could contribute to an increasing susceptibility to development of complications and death.
For diseases with a long natural history, as exemplified by disease model X2, comparison of simulations assuming age-independent heterogeneity with the no-heterogeneity variant suggested that ignoring a plausible degree of individual heterogeneity in disease progression when computing DALYs would lead to a 4% overestimation of the total expected burden among a cohort of acutely infected persons. Although the magnitude of DALY overestimation is small, mortality burden (YLL) was overestimated by a factor of 1.12. If the prioritization of public health resources is informed by a ranking of diseases according to overall burden or mortality burden, differential overestimation (i.e., individual heterogeneity affecting burden estimates for a subset of the ranked diseases) may have important consequences. In addition, if such a disease model is used to project the impact of a prevention initiative such as age-targeted vaccination, heterogeneity could influence the size and/or direction of the intervention effect, and therefore investigation of the impact of homogeneity assumptions is important for decision-making. However, in our X2 simulation incorporating individual heterogeneity, the vaccination effect size (on DALYs) was virtually identical to the projected effect size for the no-heterogeneity variant.
Application of the concepts investigated in the current paper to epidemiological studies in which disease burden is estimated is relatively unexplored. Estimation of the extent of unmeasured heterogeneity can in principle be done by fitting a statistical model to a longitudinal dataset that records disease state transition times among a cohort of infected individuals, but certain assumptions are required, and interpretation must be made with caution. If a variance parameter can be validly estimated for a given transition, then the calculation of DALYs, for instance, could incorporate this variability, by specifying a relevant distribution in place of a single “population-averaged” transition probability.
In conclusion, the current findings corroborate what has been reported regarding the influence of heterogeneity in Markov models for cost-effectiveness [6, 7]: ignoring heterogeneity can produce either optimistic or pessimistic cost-effectiveness ratios, with consequent impact on the use of such ratios for the planning of interventions. The heterogeneity issue could apply to every transition in a Markov model used for disease burden calculation; therefore, when selecting parameters for this type of model and interpreting the resulting burden estimates, the analyst should consider the consequences of assuming that the population within each health outcome is homogenous with regard to transition rates. If this homogeneity assumption cannot be made, an individual-based modeling approach is the most appropriate solution.
SM conceived the study, carried out the simulations, and drafted the manuscript. BD advised on simulation and DALY methods, interpreted the results, and revised the manuscript. JW advised on statistical methodology, interpreted the results, and revised the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;16:439–54.
2. Aalen OO. Effects of frailty in survival analysis. Stat Methods Med Res. 1994;3:227–43.
3. Aalen OO, Valberg M, Grotmol T, Tretli S. Understanding variation in disease risk: the elusive concept of frailty. Int J Epidemiol. 2014;44:1408–21.
4. Alter G, Riley JC. Frailty, sickness, and death: models of morbidity and mortality in historical populations. Popul Stud (Camb). 1989;43:25–45.
5. Moger TA, Aalen OO, Halvorsen TO, Storm HH, Tretli S. Frailty modelling of testicular cancer incidence using Scandinavian data. Biostatistics. 2004;5:1–14.
6. Kuntz KM, Goldie SJ. Assessing the sensitivity of decision-analytic results to unobserved markers of risk: defining the effects of heterogeneity bias. Med Decis Making. 2002;22:218–27.
7. Zaric GS. The impact of ignoring population heterogeneity when Markov models are used in cost-effectiveness analysis. Med Decis Making. 2003;23:379–96.
8. Havelaar AH, Van Duynhoven YT, Nauta MJ, Bouwknegt M, Heuvelink AE, De Wit GA, Nieuwenhuizen MG, van de Kar NC. Disease burden in The Netherlands due to infections with Shiga toxin-producing Escherichia coli O157. Epidemiol Infect. 2004;132:467–84.
9. Ioannidis JP, Cappelleri JC, Schmid CH, Lau J. Impact of epidemic and individual heterogeneity on the population distribution of disease progression rates. An example from patient populations in trials of human immunodeficiency virus infection. Am J Epidemiol. 1996;144:1074–85.
10. Mangen MJ, Plass D, Havelaar AH, Gibbons CL, Cassini A, Muhlberger N, van Lier A, Haagsma JA, Brooke RJ, Lai T, et al. The pathogen- and incidence-based DALY approach: an appropriate [corrected] methodology for estimating the burden of infectious diseases. PLoS One. 2013;8:e79740.
11. Plass D, Mangen MJ, Kraemer A, Pinheiro P, Gilsdorf A, Krause G, Gibbons CL, van Lier A, McDonald SA, Brooke RJ, et al. The disease burden of hepatitis B, influenza, measles and salmonellosis in Germany: first results of the burden of communicable diseases in Europe study. Epidemiol Infect. 2014;142:2024–35.
12. van Lier EA, Havelaar AH, Nanda A. The burden of infectious diseases in Europe: a pilot study. Euro Surveill. 2007;12:E3–4.
13. Statistics Netherlands (CBS). Life expectancy; sex and age, from 1950 [Levensverwachting; geslacht en leeftijd, vanaf 1950]. Voorburg, NL: CBS; 2011. http://statline.cbs.nl/.
14. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2014.
15. Verbrugge LM. Longer life but worsening health? Trends in health and mortality of middle-aged and older persons. Milbank Mem Fund Q Health Soc. 1984;62:475–519.
16. Ginaldi L, Loreto MF, Corsi MP, Modesti M, De Martinis M. Immunosenescence and infectious diseases. Microbes Infect. 2001;3:851–7.