- Open Access
- Open Peer Review
Validation of a new predictive risk model: measuring the impact of the major modifiable risks of death for patients and populations
Population Health Metrics volume 13, Article number: 27 (2015)
Modifiable risks account for a large fraction of disease and death, but clinicians and patients lack tools to identify high risk populations or compare the possible benefit of different interventions.
We used data on the distribution of exposure to 12 major behavioral and biometric risk factors inthe US population, mortality rates by cause, and estimates of the proportional hazards of risk factor exposure from published systematic reviews to develop a risk prediction model that estimates an adult’s 10 year mortality risk compared to a population with optimum risk factors. We compared predicted risk to observed mortality in 8,241 respondents in NHANES 1988-1994 and NHANES 1999-2004 with linked mortality data up to the end of 2006.
Predicted risk showed good discrimination with an area under the receiver operating characteristic (ROC) curve of 0.84 (standard error 0.01) for women and 0.84 (SE 0.01) for men. Across deciles of predicted risk, mortality was accurately predicted in men ((Χ2 statistic = 12.3 for men, p=0.196) but slightly overpredicted in the highest decile among women (Χ2 statistic = 22.8, p=0.002). Mortality risk was highly concentrated; for example, among those age 30-44 years, 5.1 % (95 % CI 4.1 % - 6.0 %) of the male and 5.9 % (95 % CI 4.8 % - 6.9 %) of the female population accounted for 25 % of the risk of death.
The risk model accurately predicted mortality in a representative sample of the US population and could be used to help inform patient and provider decision-making, identify high risk groups, and monitor the impact of efforts to improve population health.
The aim of medicine is to reduce the burden of disease . This aim can be achieved by taking actions to promote health and prevent health problems or treat diseases and disabilities after they impose their burden on patients and populations. The evidence suggests a relatively small number of modifiable risks account for a large fraction of the burden of chronic diseases and premature death in the United States as well as the developed world [2–4]. Poor health due to modifiable risks and the costs of treating the resulting disease and injury threaten the affordability of health care. Efforts at prevention or disease modification require not only accurate information on modifiable risks but also the availability of valid, reliable, practical, and actionable measures of these modifiable risks so that those at risk can be identified and interventions appropriately targeted [5–8].
Substantial progress has been made on both fronts in recent years. The Global Burden of Disease initiative has completed systematic reviews identifying and quantifying the modifiable risks of death, disease, and disability in developing and developed countries [9, 10]. Many useful health risk measures have also been developed. Most, however, focus on specific diseases [11, 12] or families of related diseases, such as the widely used Framingham cardiovascular risk index,  or on patients in specific care settings who may be at risk for rapid deterioration, such as the APACHE score for intensive care patients  or risk indices for frail elderly patients who are hospitalized and may be at risk for decubitus ulcers . Interest in measures of general health risks is also substantial, and many employers and some health systems have adopted health risk appraisals (HRAs) to help their health promotion and disease prevention initiatives. Existing HRAs, however, are based on risk models that have not been validated and published in the literature, or have “black box” scoring algorithms that are not open to scrutiny [16, 17].
To address these limitations, we developed a new, non-proprietary, health risk model based on the most recently available systematic reviews of the modifiable risks of death in order to predict all-cause mortality for adults in the United States. In this report, we describe the validation of this model in a sample of US adults. The findings suggest that the risk prediction model could help individuals and clinicians by allowing them to identify and compare potential clinical and behavioral interventions, while allowing those responsible for defined populations (such as primary care practices, accountable care organizations, health plans, and employers) not only to identify those individuals at greatest risk but also to track changes in health risks over time.
The risk model computes an individual’s total risk of mortality over the next 10 years based on exposure to 12 major risk factors (Table 1) for adults aged 30 years or older. These risk factors were included based on reviews of the scientific literature and represent a parsimonious set of the most substantial, modifiable risk factors that contribute to the probability of dying . All risk factors selected had to be (a) actionable: subject to modification by clinical or behavioral interventions, (b) substantial: contribute at least 0.20 years to mortality risk, and (c) evidence-based: supported by recent meta-analyses . We envisioned that the full survey instrument (provided in Additional file 1) could be completed in multiple settings, ranging from clinical visits to online surveys.
Risk score development and calculation
Overall mortality risk
An individual’s risk of mortality is determined by first calculating, for each cause of death separately, the individual’s overall relative risk of mortality compared to no exposure to the 12 risk factors. We used previously published systematic reviews to determine the relative risk per unit increase in exposure to the 12 risk factors by age and sex (Additional file 2). A multiplicative risk model was used to calculate the relative risk of mortality by cause across the 12 risk factors. The model takes into account risk factor correlation (individuals having higher/lower exposure to multiple risk factors due to common socioeconomic or behavioral determinants), and risk mediation (part of the risk associated with factors such as obesity may be mediated through other risk factors such as blood pressure). An individual’s relative risk of mortality by cause is multiplied by the annual background mortality risk by cause to estimate an individual’s overall mortality risk.
Avoidable mortality risk
An individual’s avoidable mortality risk, i.e., the mortality risk that could be avoided by reducing exposure to the 12 risk factors to their optimum level, is calculated as the individual’s overall mortality risk less the individual’s background mortality risk based on age and sex. The background mortality risk is an estimate of the risk of mortality over the next 10 years for an individual of the same age and sex who is not exposed to any of the 12 risk factors. We use the currently observed age- and sex-specific background mortality risk to predict an individual’s future background risk of mortality following standard life-table methodology; [18, 19] that is, a woman currently aged 55 is exposed to the background mortality risk of 55-year-old females for the next year, and the background mortality risk of 56-year-old females in the subsequent year, and so on. An individual’s relative risk of mortality by cause is assumed to be constant over all future periods. An individual’s overall risk of mortality from all causes over the next 10 years and their remaining life expectancy are calculated using the standard competing risk model .
Background mortality risk by cause
To determine the background mortality risk by cause for the current period we combined information on (a) the current distribution of exposure to the 12 risks by age and sex using data from the National Health and Nutrition Examination Survey (NHANES, 2003–2010) and the Behavioral Risk Factor Surveillance System (BRFSS, 2006–2008); (b) age-, sex-, and cause-specific mortality rates in 2010 from the Global Burden of Disease Study 2010;  and (c) relative risks by age, sex, and cause associated with exposure to the 12 risk factors from systematic reviews. The mortality rates have been adjusted for errors in cause of death assignment using previously described methods .
Currently observed age-, sex-, and cause-specific mortality rates represent the rate at which individuals of that age and sex group will die from a cause in a given year. These rates reflect the current exposure to risk factors in the population and their hazardous effects on mortality. The fraction of mortality that is due to current exposure to the 12 risk factors can therefore be determined by calculating a population attributable fraction (PAF) by age, sex, and cause; this was done by calculating the overall relative risk due to the 12 risk factors for each respondent in NHANES (2003–2010), taking into account risk factor correlation and mediation . The PAF for each age, sex, and cause is calculated as the sample weighted sum of the excess risk, i.e., the relative risk minus one, divided by the sample weighted sum of the relative risk across all NHANES respondents. To address selection bias we imputed missing risk factor values using multiple imputations and took the average across the 10 imputations. The background mortality rate by cause is calculated as one minus the PAF multiplied by the current mortality rate. Mortality rates were converted to annual probabilities of dying using the standard life table calculation [18, 19].
Risk score validation
We performed an out-of-sample validation test using established methods and NHANES linked mortality data through December 31, 2006, for respondents interviewed between 1988 and 1994 and between 1999 and 2004. These data were not used in the construction of the risk score. For each individual in the cohort we calculated the predicted risk of mortality over the available follow-up time period up to 10 years. The validation assessed (a) discrimination: the ability of the risk model to distinguish between those who die during the follow-up period and those who survive by calculating the area under the curve (AUC) for the receiver operating characteristic curve;  and (b) calibration: the ability of the risk model to predict the observed level of risk across deciles of the population using the Hosmer-Lemeshow Χ2 statistic. The validation was performed for men and women and by age group.
Impact on life expectancy and distribution of risk
We examined life expectancy and distribution of risk for the NHANES 2003–2010 cohort, a sample that is representative of the US population. We calculated the increase in life expectancy that would result from reducing exposure to the optimum distribution for each individual risk factor as well as the 12 risk factors jointly as previously described . We also used the risk model to estimate the 10-year total and avoidable risk of death for each respondent from NHANES 2003–2010 and present results on the concentration of risk.
All analyses were conducted in Stata 11 (Stata Corporation, Texas).
Accuracy of the mortality prediction: Risk model validation
Additional file 1 summarizes the characteristics of the 8,241 NHANES respondents included in the validation dataset. By the end of 2006, 696 deaths (419 men and 277 women) occurred in this cohort. The risk model was able to discriminate well between individuals who died and those who survived, with an area under the curve (AUC) of 0.84 (SE = 0.01) for women and 0.84 (SE = 0.01) for men (Fig. 2) for deaths from any cause. The risk model also accurately predicted the risk of death across deciles (Fig. 3) among men (Χ2 = 12.3, p = .196). Risk was slightly overestimated in the highest risk decile among women (Χ2 = 22.8, p = .002). These results indicate that the risk model is sufficiently accurate for use as a predictor of mortality risk.
What matters most: Impact of specific risk factors on US and individual mortality risk
Table 2 shows the estimated effect on life expectancy of shifting risk exposures to their optimum distribution (i.e., no excess risk) within the NHANES 2003–2010 cohort, a proxy for the US population. The combined impact of the 12 risk factors is substantial; life expectancy would increase by more than nine years among men and more than eight years among women. For the US adult population as a whole, the model indicates that tobacco smoking, high blood pressure, and excess body weight are the modifiable risk factors with the largest effect on current adult life expectancy. The model also enables different risks to be identified, quantified, and compared for counseling individual patients.
Identifying high-risk patients: Distribution of risk in a population
The avoidable risk of death is heavily concentrated in a relatively small fraction of the population, particularly at younger ages (Fig. 4). Among males aged 30 to 44 years, 5.1 % (95 % CI 4.1–6.0 %) of males account for 25 % of the avoidable risk of death in that age group. This is similar in females aged 30 to 44 years, with 5.9 % (95 % CI 4.8–6.9 %) of females accounting for 25 % of the avoidable risk of death. In general, as the population ages, this fraction tends to increase. For example, among males aged 70 to 79, 13.9 % (95 % CI 11.1–16.6 %) of males account for 25 % of avoidable mortality risk in that age group, and among females aged 70 to 79 years, 11.7 % (95 % CI 9.2–14.1 %) of females account for 25 % of avoidable mortality risk in that age group.
While the need for accurate and actionable information on modifiable health risks is well recognized, the risk models currently available have important limitations, largely because they either focus on a specific, important cause of death  or because they are based on proprietary algorithms that may not be based on the most up-to-date evidence and have not been validated in the general US population. Individuals, clinicians, and others therefore currently lack a broadly available, evidence-based tool that would support more accurate identification of risks, assessment and comparison of the potential impact for individuals of different risk-modification strategies, or identification of high-risk subgroups of the population. The risk model described and validated here attempts to overcome these limitations.
We developed a new risk model based on 12 major behavioral and biometric risks to health that predict an individual’s probability of dying over the next 10 years compared to a population with an optimum distribution of risk factors. The model is based on current scientific evidence on risks to health and contemporaneous, high-quality data on the distribution of risk exposures and mortality rates in the US population. The model has excellent discrimination in a sample of the US population . The excellent and widely used Framingham Index [24–26] typically has areas under the curve of between 0.75 and 0.80 for predicting cardiovascular endpoints only . This newly developed model has areas under the curve of greater than 0.80 for both men and women for the more difficult task of predicting mortality from any cause and is constructed using the best available evidence on the major modifiable causes of mortality.
The risk model offers promise as a tool to support individual decision-making. A recent systematic review demonstrates the benefits of providing cardiovascular risk information to individuals for discussion with their families and physicians . The results suggest that if individuals understand the magnitude of their risk, they are more likely to adopt or maintain healthy behaviors. The review also found that information on overall risk is likely to have a greater impact when it is paired directly with education or counseling. As illustrated in Additional file 3, this risk model can facilitate counseling and decision-making by providing a systematic way for patients and clinicians to compare how different behavioral or clinical interventions would likely influence risk. For example, the importance of smoking cessation for most smokers becomes obvious. The risk model can show patients how they might avoid medication use by increasing physical activity or reducing weight, or conversely, the lost benefits of failing to take prescribed medications. The risk model also takes into account the effect of behavioral modification on other biological risk factors included in the risk score, e.g., the effect of reductions in body weight on blood pressure. Presenting accurate, holistic, and balanced information about the risks that patients face, in conjunction with counseling about the importance of different options for change, could help align decision-making with patients’ preferences – an important national aim . Implementing the model in a way that provides an attractive and accessible tool for the general population, e.g., through Internet or phone-based applications, would also provide a way for individuals to self-assess their risk and encourage contacts with healthcare providers to decrease risk.
Broad adoption of the risk model could offer other important benefits. First, clinical practices, health systems, and workplace health programs that obtain completed surveys from their populations can accurately stratify people according to their level of risk and develop epidemiologically informed programs to reduce risks. In the highest-risk subpopulation, individualized multiple-risk-factor interventions, such as case management and health coaching, could be a wise investment . Second, the measure could contribute to clinical and public health research by providing a validated composite endpoint for clinical trials of multiple-risk-factor intervention programs. Third, the measure offers a potential improvement over current quality indicators that focus largely on intermediate outcomes (e.g., levels of blood sugar, blood pressure, and cholesterol) that have little intuitive meaning to patients and encourage well-recognized hazards among providers. For example, if the proportion of diabetics with well-controlled HbA1c is used as a quality measure, some may be encouraged to label those with pre-diabetes as diabetic (inflating the denominator), and treatment efforts may be focused on those with mild disease, in whom achieving target levels is easier but less important. Performance measured on the basis of improvement in predicted risk could give greater credit to meaningful progress for those at greatest risk, even if specified targets were not achieved. In addition, the risk measure would in all likelihood be more parsimonious (i.e., one broad measure of risk status versus many narrow measures) . Finally, if broadly adopted within specific geographic regions and mapped in ways that preserve confidentiality, the measure could provide a basis for collaboration among providers and community stakeholders on initiatives to improve population health.
At the same time, the risk model has important limitations. We made judgments about which risks to include. For example, we did not include depression, believing that screening and intervention (especially for those at risk of suicide) has a different and more pressing time horizon, and we judged trans fats to be akin to an environmental risk that is difficult for an individual to control or modify. The risk model is based on the average American, and, although differential risk exposure explains a large fraction of geographic, racial, and socioeconomic factors, there are likely to be residual differences in the underlying risk of death between these groups. Although the proportional hazards of risk exposure are largely generalizable across different populations,  several risk factors are based on self-report, and these responses may not be accurate or comparable across populations. While all included risks have strong evidence as predictors of mortality, the strength of evidence for risk factor modification on reduction in mortality differs across risks, with a greater body of evidence available for risk factors such as high blood pressure. Related to this, some of the evidence for these risk factors is based on observational studies, which are prone to potential confounding. Although we split the development and validation datasets, our use of the same broad data source (NHANES) for the analysis may have caused us to overestimate the predictive validity. The risk score appears to be mainly driven by cardio-metabolic factors and has not yet been validated either for specific clinical populations (e.g., cancer, arthritis, dementia) or for different racial, ethnic, or socioeconomic groups. The risk score also does not include a prediction of morbidity, which is an important consideration particularly among older-aged individuals, nor does it presently include a quantification of the potential adverse effects, for example of medication, to reduce exposure to risk factors. It will be important for patients and clinicians to understand these limitations, both about the relative magnitude of specific risks and the potential benefits of risk factor modification and to conduct further validation studies and to update the model as new evidence accumulates. Finally, while the focus of the use of the risk score presented in this paper is on individual-level modification, this should also be balanced against population-wide approaches for reducing risk exposure.
Some of these limitations can be addressed or mitigated by implementing the risk model in diverse practices and populations and by linking respondents’ original and subsequent risk factor scores to data on survival. This would facilitate continued improvement and validation of the model. This will be increasingly possible as electronic health record-enabled environments are adopted for large patient populations .
The need to balance downstream treatment of diseases with upstream prevention is well recognized [6, 17, 29]. Legislative and other upstream actions are now a cornerstone of risk reduction and in some countries they play a key role in reducing non-communicable diseases. Patients, clinicians, and employers are faced with a wide array of commercial health risk appraisal tools that aim to catalyze prevention and health promotion. Incorporating a standardized, validated, freely available, transparent, and continuously refined method to measure, summarize, and track the most important modifiable risks of death within these tools would offer benefits to patients, clinicians, and policymakers.
Committee on Quality of Health Care in America - Institute of Medicine. Crossing the Quality Chasm. A New Health System for the 21st Century. Washington, DC: National Academies Press; 2001.
Danaei G, Ding EL, Mozaffarian D, Taylor B, Rehm J, Murray CJ, et al. The preventable causes of death in the United States: comparative risk assessment of dietary, lifestyle, and metabolic risk factors. PLoS Med. 2009;6(4):e1000058. PubMed Pubmed Central PMCID: PMC2667673. Epub 2009/04/29. eng.
Danaei G, Rimm EB, Oza S, Kulkarni SC, Murray CJ, Ezzati M. The promise of prevention: the effects of four preventable risk factors on national life expectancy and life expectancy disparities by race and county in the United States. PLoS Med. 2010;7(3):e1000248. PubMed Pubmed Central PMCID: PMC2843596. Epub 2010/03/31. eng.
Thorpe KE, Florence CS, Howard DH, Joski P. The impact of obesity on rising medical spending. Health Affairs (Project Hope). 2004;Suppl Web Exclusives:W4–480-6. PubMed Epub 2004/10/22. eng.
Anderson KM, Odell PM, Wilson PW, Kannel WB. Cardiovascular disease risk profiles. Am Heart J. 1991;121(1 Pt 2):293–8. PubMed Epub 1991/01/01. eng.
Fineberg HV. The paradox of disease prevention: celebrated in principle, resisted in practice. JAMA. 2013;310(1):85–90. PubMed Epub 2013/07/04. eng.
Real Age I. Real Age [cited 2013 July 16]. Available from http://www.sharecare.com/static/realage-test.
Yourman LC, Lee SJ, Schonberg MA, Widera EW, Smith AK. Prognostic indices for older adults: a systematic review. JAMA. 2012;307(2):182–92. PubMed Epub 2012/01/12. eng.
Lim SS, Vos T, Flaxman AD, Danaei G, Shibuya K, Adair-Rohani H, et al. A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2224–60. PubMed Epub 2012/12/19. eng.
Murray CJ, Lopez AD. Alternative projections of mortality and disability by cause 1990–2020: Global Burden of Disease Study. Lancet. 1997;349(9064):1498–504. PubMed Epub 1997/05/24. eng.
Akhoundi FH, Ghorbani A, Soltani A, Meysamie A. Favorable functional outcomes in acute ischemic stroke patients with subclinical hypothyroidism. Neurology. 2011;77(4):349–54. PubMed Epub 2011/07/01. eng.
Fillinger M. Who should we operate on and how do we decide: predicting rupture and survival in patients with aortic aneurysm. Semin Vasc Surg. 2007;20(2):121–7. PubMed Epub 2007/06/21. eng.
D’Agostino RB, Russell MW, Huse DM, Ellison RC, Silbershatz H, Wilson PW, et al. Primary and subsequent coronary risk appraisal: new results from the Framingham study. Am Heart J. 2000;139(2 Pt 1):272–81. PubMed Epub 2000/01/29. eng.
Goel A, Pinckney RG, Littenberg B. APACHE II predicts long-term survival in COPD patients admitted to a general medical ward. J Gen Intern Med. 2003;18(10):824–30. PubMed Pubmed Central PMCID: PMC1494923. Epub 2003/10/03. eng.
Schoonhoven L, Grobbee DE, Donders AR, Algra A, Grypdonck MH, Bousema MT, et al. Prediction of pressure ulcer development in hospitalized patients: a tool for risk assessment. Qual Saf Health Care. 2006;15(1):65–70. PubMed Pubmed Central PMCID: PMC2563999. Epub 2006/02/04. eng.
Loeppke R, Taitel M, Haufle V, Parry T, Kessler RC, Jinnett K. Health and productivity as a business strategy: a multiemployer study. J Occup Environ Med. 2009;51(4):411–28. PubMed Epub 2009/04/03. eng.
O’Donnell MP. Health promotion in the workplace: CengageBrain.com; 2001.
Preston SH, Heuveline P, Guillot M. Demography: Measuring and modeling population processes. Pop Dev Rev. 2001;27:365.
Coale A, Guo G. Revised regional model life tables at very low levels of mortality. Population Index. 1989 Winter;55(4):613-43. PubMed Epub 1989/01/01. eng.
Ezzati M, Hoorn SV, Rodgers A, Lopez AD, Mathers CD, Murray CJ. Estimates of global and regional potential health gains from reducing multiple major risk factors. Lancet. 2003;362(9380):271–80. PubMed Epub 2003/08/02. eng.
Lozano R, Naghavi M, Foreman K, Lim S, Shibuya K, Aboyans V, et al. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2095–128. PubMed Epub 2012/12/19. eng.
Naghavi M, Makela S, Foreman K, O’Brien J, Pourmalek F, Lozano R. Algorithms for enhancing public health utility of national causes-of-death data. Popul Health Metrics. 2010;8:9. PubMed Pubmed Central PMCID: PMC2873308. Epub 2010/05/13. eng.
Hosmer DW, Lemeshow S. Applied logistic regression: Wiley-Interscience. 2000.
Framingham Heart Study. Framingham Heart Study Bibliography 1960s [cited 2013 July 16]. Available from: (http://www.framinghamheartstudy.org/.
Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes 3rd J. Factors of risk in the development of coronary heart disease--six year follow-up experience. The Framingham Study. Ann Intern Med. 1961;55:33–50. PubMed Epub 1961/07/01. eng.
Truett J, Cornfield J, Kannel W. A multivariate analysis of the risk of coronary heart disease in Framingham. J Chronic Dis. 1967;20(7):511–24. PubMed Epub 1967/07/01. eng.
Lloyd-Jones DM. Cardiovascular risk prediction: basic concepts, current status, and future directions. Circulation. 2010;121(15):1768–77. PubMed Epub 2010/04/21. eng.
Agency for Healthcare Research and Quality, National Quality Strategy. Working For Quality [cited 2013 July 16]. Available from: http://www.ahrq.gov/.
Ma J, Berra K, Haskell WL, Klieman L, Hyde S, Smith MW, et al. Case management to reduce risk of cardiovascular disease in a county health care system. Arch Intern Med. 2009;169(21):1988–95. PubMed Pubmed Central PMCID: PMC3000904. Epub 2009/11/26. eng.
Meyer GS, Nelson EC, Pryor DB, James B, Swensen SJ, Kaplan GS, et al. More quality measures versus measuring what matters: a call for balance and parsimony. BMJ Qual Saf. 2012;21(11):964–8. PubMed Pubmed Central PMCID: PMC3594932. Epub 2012/08/16. eng.
Yusuf S, Hawken S, Ounpuu S, Dans T, Avezum A, Lanas F, et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case–control study. Lancet. 2004;364(9438):937–52. PubMed Epub 2004/09/15. eng.
Blumenthal D, Tavenner M. The “meaningful use” regulation for electronic health records. N Engl J Med. 2010;363(6):501–4. PubMed Epub 2010/07/22. eng.
We thank Kelsey Pierce for programmatic assistance and Goodarz Danaei, Majid Ezzati, and Andrew Kartunen for technical insights.
The authors declare that they have no competing interests.
ESF, CJLM, ECN and SSL conceived of the study and designed the risk index. EC performed the data analysis. AHM and CG contributed to the design of the risk index. SSL wrote the first draft of the paper. All authors contributed to revisions of the paper.
Web Table 1. Characteristics of the validation cohort (NHANES 1988–1994 and 1999–2004) (DOCX 21 kb)
Web Appendix B: Technical information on risk score development and derivation. (DOCX 49 kb)
Web Appendix C: Example of risk calculator applied to a particular person: male age 35, smoker, uncontrolled blood pressure, limited physical activity. (DOCX 246 kb)