Estimating summary measures of health: a structured workbook approach
© Flanagan et al; licensee BioMed Central Ltd. 2005
Received: 20 September 2004
Accepted: 11 May 2005
Published: 11 May 2005
Summary measures of health that combine mortality and morbidity into a single indicator are being estimated in the Canadian context for approximately 200 diseases and conditions. To manage the large amount of data and calculations for this many diseases, we have developed a structured workbook system with easy to use tools. We expect this system will be attractive to researchers from other countries or regions of Canada who are interested in estimating the health-adjusted life years (HALYs) lost to premature mortality and year-equivalents lost to reduced functioning, as well as population attributable fractions (PAFs) associated with risk factors. This paper describes the workbook system using cancers as an example, and includes the entire system as a free, downloadable package.
The workbook system was developed in Excel and runs on a personal computer. It is a database system that stores data on population structure, mortality, incidence, distributions of cases entering a multitude of health states, durations of time spent in health states, preference scores that weight for severity, life table estimates of life expectancies, and risk factor prevalence and relative risks. The tools are Excel files with embedded macro programs. The main tool generates workbooks that estimate HALY, one per disease, by copying data from the database into a pre-defined template. Other tools summarize the HALY results across diseases for easy analysis.
The downloadable zip file contains the database files initialized with Canadian data for cancers, the tools, templates and workbooks that estimate PAF and a user guide. The workbooks that estimate HALY are generated from the system at a rate of approximately one minute per disease. The resulting workbooks are self-contained and can be used directly to explore the details of a particular disease. Results can be discounted at different rates through simple parameter modification.
The structured workbook approach offers researchers an efficient, easy to use, and easy to understand set of tools for estimating HALY and PAF summary measures for their country or region of interest.
Over the past century, advances in public health and population health have dramatically increased life expectancy. Canadians now live longer, but during these added years, they may be affected by disease or chronic conditions. For this reason, indicators used to monitor changes in population health and guide policy decisions need to include how health conditions affect the day-to-day functioning of Canadians over their lifetime.
Summary measures of health that include both mortality and morbidity are being estimated for Canada. Building on prior burden of disease studies by the World Health Organization and Australia that estimated disability-adjusted life years (DALY), the Canadian study will estimate the health-adjusted life years (HALY) lost to premature mortality and reduced functioning for approximately 200 diseases. HALY is computationally identical to DALY; however, it reflects a shift in terminology away from disability towards the broader term health, following recommendations originating from the International Network on Health Expectancy.
Estimating summary measures of health requires a wide variety of data including: population counts; incidence and mortality rates; life expectancies; cause-specific and observed survival; distributions, durations, and preference scores across a multitude of health states; and risk factor data to estimate population attributable fractions (PAF). Disaggregating by age group and sex further explodes the quantity of data. To efficiently manage such a large amount of information, we developed a database system, with a set of easy-to-use tools to automatically generate the summary measure estimates.
The main tool generates workbooks, one per disease, that calculate HALY, by importing the data from the database into a generic template. This makes it easy to update the database and quickly regenerate the results. The template is highly structured, which makes the generated workbooks easy to understand and use. There are also tools that summarize the HALY results across diseases for easy analysis. Parameters, such as the rate at which to discount future events, the population of study, and the reference life table, can be specified in the tools to evaluate different scenarios. Furthermore, the generated workbooks are self-contained and can be used directly for specific analysis of a disease. Finally, the tools were built generically to incorporate any number of diseases. Overall, we expect this workbook system will be attractive to other researchers, since it streamlines the process of estimating HALY and PAF, and at the same time provides an organized framework to document the work.
[Table: Cancer sites by ICD-9 code. Entries include gall bladder cancer; bone and connective tissue cancer; non-melanoma skin cancer; and all other cancers (all sites between 140–208 not listed above).]
The tools (identified by ovals in Figure 1) contain embedded Visual Basic macros that perform three principal functions: generate workbooks to estimate HALY for each of the 26 cancer sites; summarize the HALY results across cancer sites for comparative analysis; and extract HALY results to be attributed to risk factors using population attributable fractions. The tools are called "Builder", "Summary" and "Extract", respectively. Two additional tools, discussed below but not shown in Figure 1, are the "Master" and "UpdatePAF" tools. Each tool contains a command button that launches the macro and each has a set of options to control the macro's actions.
Each tool uses a template. A template is simply a pre-defined structure that contains formulae and place-holders for data. For instance, the template for the HALY workbooks contains formulae and formatting to calculate the HALY, but it does not contain data. The Builder tool copies the data from the database into the template for a selected disease.
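The template idea can be illustrated outside Excel. The following is only a sketch in Python, not the Builder macro itself: the dictionary layout, the names `TEMPLATE`, `DATABASE` and `build_workbook`, and all numeric values are hypothetical and chosen for illustration.

```python
import copy

# Hypothetical template: formulae and empty place-holders, but no data.
TEMPLATE = {
    "disease": None,
    "inputs": {"incidence_rates": None, "mortality_rates": None},
    "formulas": ["HALY = YLL + YERF"],
}

# Hypothetical database: one table per data type, keyed by disease.
# The rates are made-up placeholders, not values from the real database.
DATABASE = {
    "incidence_rates": {"cancer A": [10.0, 20.0], "cancer B": [5.0, 15.0]},
    "mortality_rates": {"cancer A": [2.0, 8.0], "cancer B": [1.0, 6.0]},
}


def build_workbook(template, database, disease):
    """Copy the data for one disease into a fresh copy of the template."""
    workbook = copy.deepcopy(template)  # leave the template itself untouched
    workbook["disease"] = disease
    for data_type in workbook["inputs"]:
        workbook["inputs"][data_type] = list(database[data_type][disease])
    return workbook


# One generated workbook per disease, as the Builder tool does.
workbooks = {d: build_workbook(TEMPLATE, DATABASE, d)
             for d in DATABASE["incidence_rates"]}
```

Because each workbook is built from a copy, the database can be updated and the workbooks regenerated without editing the template.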
Some of the data in the generated workbooks are linked to the source files (shown as dashed arrow lines in Figure 1) using a feature of Excel called "links". This means that the data are stored externally to the workbook, but are shown and used in the workbook. The advantage of this approach is that it allows users to quickly change the source of data to easily update the workbook. For instance, the workbooks that estimate HALY can link to any one of the three reference life tables included (or users can create their own life table) to automatically update results. There is complete flexibility in the tool to choose which files to maintain as links. By default, only the population and life table database files are maintained as links. The rest of the data are simply copied from the database to minimize complexity.
The database is a collection of 17 files organized by the type of data and includes: mortality rates; incidence rates; population counts; life expectancy estimates; stage distributions; observed and cause-specific survival; case-fatality estimates; duration and distribution of common cancer health states (diagnosis, treatment, remission, palliative and terminal care); preference scores used to weight for the severity of each health state; utilities that describe the starting health state of the population; risk factor prevalence and relative risk of disease from risk exposure. In addition, three sets of life expectancies have been included in the database: a Canadian multi-cohort life table (2001), a Canadian period life table (1995–1997) and a model life table used by the World Health Organization. The workbooks that estimate HALY link to the multi-cohort life table by default. In general, the database files have been structured by age group, sex, disease, stage, and health state.
To illustrate the workbook system, the database has been populated with Canadian data (or data representative of Canada) for 26 cancer sites and five related risk factors. Cancers were classified by ICD-9 code because of data availability at the time of study. Updating to ICD-10 would not require any change in the structure of the workbook system, only in the data entered into it. The data sources are identified in each of the database files and repeated in the generated workbooks. It is beyond the scope of this paper to discuss the methods used to arrive at this set of data (details available upon request from the authors).
Workbooks to estimate HALY
The workbooks contain embedded formulae for calculating the summary measures. HALY is a summary measure that includes both the impact of mortality and morbidity in a single indicator. The mortality component measures the years of life lost due to premature mortality (YLL); the morbidity component quantifies the year equivalents of reduced functioning from living with the disease (YERF). YERF is analogous to years of life lived with disability (YLD) used by the World Health Organization and their collaborators in their burden of disease study; thus, HALY = YLL+YERF is synonymous with DALY = YLL+YLD.
For each cancer site, the HALY, YLL and YERF are estimated by age group and sex according to the following formulae:
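$$\mathrm{HALY}_{a,s} = \mathrm{YLL}_{a,s} + \mathrm{YERF}_{a,s} \qquad \text{(Eq1)}$$
$$\mathrm{YLL}_{a,s} = M_{a,s} \times L_{a,s} \qquad \text{(Eq2)}$$
$$\mathrm{YERF}_{a,s} = \sum_{g} \sum_{e} \left[ I_{a,s,g,e} \times D_{a,s,g,e} \times W_{g,e} \right] \qquad \text{(Eq3)}$$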
where a represents the age group, s represents the sex, g represents the stage at diagnosis, and e represents the state of progression of the cancer.
The YLLs are calculated from the number of cases that die from the cancer (M) and the estimated years of remaining life at the age of death (L). The latter is estimated from survival in the general population, by age and sex, and comes from the life table. The death counts are calculated from the mortality rates and the population counts.
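As a minimal sketch of Eq2 for a single age-sex group, the calculation can be written in a few lines of Python. The function name and the input values below are hypothetical; they are not taken from the database files.

```python
def years_of_life_lost(mortality_rate_per_100k, population, life_expectancy):
    """YLL for one age-sex group (Eq2): deaths M times remaining life L."""
    deaths = mortality_rate_per_100k / 100_000 * population  # M, from rate and count
    return deaths * life_expectancy                          # M * L


# Hypothetical inputs, for illustration only.
yll = years_of_life_lost(mortality_rate_per_100k=50.0,
                         population=1_000_000,
                         life_expectancy=30.0)
print(round(yll))
```

With these made-up inputs there are about 500 deaths, each losing 30 years of remaining life expectancy.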
The YERFs are calculated by health state and stage at diagnosis. They are estimated from the number of cases entering the health state (I), the duration in the state (D) and the weight for severity of the health state (W). The number of cases entering the health state is derived from the cancer incidence rates, the population counts, the stage distribution and the estimated proportion that experience the health state. For example, the number of women aged 50–54 that receive radiotherapy for cure of localized breast cancer is the product of the number of women in this age group (1,060,244 in Canada in 2001), the incidence rate (229 per 100,000), the estimated proportion that are diagnosed with localized disease (63.6%) and the proportion of these that receive radiotherapy for cure (43.0%), which amounts to 663 cases (these numbers can be found in the breast cancer workbook). The duration of the health state is a direct input parameter, except for the remission/on-going care states, which are calculated as the residual of the overall survival duration less the duration spent in the diagnostic, treatment and palliative/terminal phases.
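The breast cancer example above can be reproduced as follows. This is only a sketch: the four inputs are the rounded figures quoted in the text, while the duration and severity weight are hypothetical values added to show one term of Eq3. With the rounded inputs the product comes to roughly 664, within one case of the 663 reported, presumably because the workbook holds unrounded rates.

```python
# Inputs quoted in the text (rounded).
population = 1_060_244        # women aged 50-54, Canada, 2001
incidence_per_100k = 229      # breast cancer incidence rate
share_localized = 0.636       # proportion diagnosed with localized disease
share_radiotherapy = 0.430    # proportion of those receiving radiotherapy for cure

# Number of cases entering the health state (I in Eq3).
cases = (population * incidence_per_100k / 100_000
         * share_localized * share_radiotherapy)

# Hypothetical duration (years) and severity weight, for illustration only.
duration_years = 0.1   # D
weight = 0.2           # W

# Contribution of this stage and health state to YERF: I * D * W.
yerf_contribution = cases * duration_years * weight
print(round(cases), round(yerf_contribution, 2))
```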
The weight for severity of the health state is expressed in terms of preference scores (u), as W = 1-u. This assumes full health prior to entering the health state. However, the workbooks allow the population to start in partial health (u1) and persist co-morbidly with the cancer state (u2). The co-morbidity rule for combining preference scores, u1 and u2, of two conditions, was defined as:
$$u_{1,2} = (1 - k) \times \min(u_1, u_2) + k \times (u_1 \times u_2)$$
The value of the comorbidity coefficient k was estimated at 0.34, based on a best-fit analysis of Health Utility Index scores for conditions reported in the Canadian Community Health Survey, 2000–01 (CCHS) (details available from the authors). Since we are interested in the reduced functioning relative to the initial health state (u1), the weight for severity of the cancer health state is given by W = u1 - u1,2.
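The co-morbidity rule and the resulting severity weight can be sketched in Python as below. The function names and the example preference scores (0.9 and 0.7) are hypothetical; only k = 0.34 comes from the text.

```python
K = 0.34  # comorbidity coefficient reported in the text


def combined_utility(u1, u2, k=K):
    """Combine two preference scores: (1 - k) * min(u1, u2) + k * u1 * u2."""
    return (1 - k) * min(u1, u2) + k * (u1 * u2)


def severity_weight(u1, u2, k=K):
    """Weight for severity relative to the starting health state: W = u1 - u12."""
    return u1 - combined_utility(u1, u2, k)


# Hypothetical scores: population starts at 0.9, cancer health state scores 0.7.
w_partial = severity_weight(0.9, 0.7)

# Starting in full health (u1 = 1) recovers the simpler form W = 1 - u.
w_full = severity_weight(1.0, 0.7)
print(round(w_partial, 4), round(w_full, 4))
```

Note that the weight is smaller when the population starts in partial health, since some of the reduced functioning is already present before the cancer state.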
The workbooks include a parameter for discounting the durations of health states that occur at some time T after diagnosis. When a discount rate, r > 0, is specified, the YLL and YERF are estimated according to the modified functional forms:
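$$\mathrm{YLL}_{a,s} = M_{a,s} \times \frac{1 - e^{-r L_{a,s}}}{r} \qquad \text{(Eq4)}$$
$$\mathrm{YERF}_{a,s} = \sum_{g} \sum_{e} \left[ I_{a,s,g,e} \times \frac{\left(1 - e^{-r D_{a,s,g,e}}\right) e^{-r T_{a,s,g,e}}}{r} \times W_{g,e} \right] \qquad \text{(Eq5)}$$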
Although the timing and order of treatment, which determines the value of T, may vary from case to case in practice, we assume that treatments occur separately in time and in the following order: diagnosis; surgery; chemotherapy or hormonal therapy; radiotherapy; remission; palliative care; terminal care; and death. The palliative and terminal phases only apply to cases dying of the cancer. The duration preceding them is estimated from the cause-specific survival duration.
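The discounting in Eq4 and Eq5 shares one building block: the discounted length of a period that lasts D years and starts T years after diagnosis. The sketch below is an illustration under stated assumptions, not the workbook formulae themselves; the function names and the phase durations are hypothetical, and the start times T are taken as the running sum of the preceding durations, following the fixed order described above.

```python
import math


def discounted_duration(duration, rate, delay=0.0):
    """Discounted length of a period of `duration` years starting after `delay` years.

    Implements (1 - exp(-r*D)) * exp(-r*T) / r; with r = 0 nothing is discounted.
    """
    if rate == 0:
        return duration
    return (1 - math.exp(-rate * duration)) * math.exp(-rate * delay) / rate


def onset_times(durations_in_order):
    """Start time T of each phase, assuming phases follow one another in order."""
    onsets, elapsed = [], 0.0
    for d in durations_in_order:
        onsets.append(elapsed)
        elapsed += d
    return onsets


# Eq4 uses the same factor with no delay: YLL = M * discounted_duration(L, r).
yll_factor = discounted_duration(30.0, 0.03)

# Hypothetical phase durations in years: diagnosis, surgery, radiotherapy, remission.
durations = [0.1, 0.2, 0.15, 5.0]
starts = onset_times(durations)
discounted = [discounted_duration(d, 0.03, t) for d, t in zip(durations, starts)]
```

Each entry of `discounted` would then be multiplied by the number of cases and the severity weight of the corresponding health state, as in Eq5.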
For ease of use, all data elements and parameters that can be modified in the workbooks are identified as green-filled cells. Blue-filled cells are used to highlight labels and violet-filled cells highlight the summary measures.
Workbooks to estimate PAF
The population attributable fraction (PAF) is an estimate of the proportion of disease in the general population that is due to a particular risk factor. For the study of cancers, workbooks have been developed to estimate the population attributable fraction for five risk factors: alcohol, obesity, lack of fruit and vegetable consumption, physical inactivity, and smoking.
Given the lag time between the exposure to tobacco and the incidence of cancer, and given that the prevalence of smoking has been declining, using current prevalence of smoking will likely produce an underestimation of the population attributable fraction of smoking. In order to quantify this potential bias, we developed three workbooks to estimate the impact of smoking: the first uses current (2001) prevalence of smoking, a second uses prevalence reported in 1991 and the third is an indirect method developed by Peto and Lopez.
For a given risk factor, the PAF is estimated by age group (a), sex (s) and cancer (c) according to the formula:
$$\mathrm{PAF}_{a,s,c} = \sum_{i} \left[ \frac{Pe_{a,s,i} \times (RR_{a,s,i,c} - 1)}{1 + Pe_{a,s,i} \times (RR_{a,s,i,c} - 1)} \right] \qquad \text{(Eq6)}$$
where Pe is the proportion of the population exposed to the risk factor, RR is the relative risk of developing or dying of cancer due to the exposure, and index i represents the risk category. For instance, the risk categories for obesity are underweight, normal weight, overweight and obese (based on BMI values).
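Eq6 can be sketched in Python for one age group, sex and cancer. The function name and the prevalence and relative-risk values are hypothetical and serve only to show the arithmetic; the sum is taken category by category, exactly as Eq6 is written.

```python
def population_attributable_fraction(prevalence, relative_risk):
    """Eq6: sum over risk categories i of Pe*(RR-1) / (1 + Pe*(RR-1))."""
    return sum(p * (rr - 1) / (1 + p * (rr - 1))
               for p, rr in zip(prevalence, relative_risk))


# Hypothetical risk categories: reference, moderate exposure, high exposure.
prevalence = [0.55, 0.30, 0.15]     # Pe by category (made-up values)
relative_risk = [1.0, 1.2, 1.5]     # RR by category (made-up values)

paf = population_attributable_fraction(prevalence, relative_risk)
print(round(paf, 4))
```

A category with RR = 1, such as the reference category here, contributes nothing to the sum.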
To obtain a more global view of the impact of a risk factor, we produced summary estimates showing the proportion of the total number of cancer deaths, HALY, YLL and YERF attributable to each risk factor by applying the PAF estimates of equation 6 to each of these outcomes. For instance, the impact on deaths for a particular risk factor is given by the formula:
$$\mathrm{PAFDeaths}_{s} = \frac{\sum_{c} \sum_{a} \mathrm{PAF}_{a,s,c} \times \mathrm{DEATHS}_{a,s,c}}{\sum_{c} \sum_{a} \mathrm{DEATHS}_{a,s,c}} \qquad \text{(Eq7)}$$
The outcomes are first extracted from the HALY workbooks for a specific discount rate, life table and population choice. They are stored in a separate file and maintained as a link to each of the PAF workbooks. This allows the summary PAF estimates to be easily updated for different parameter choices.
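Eq7 is a death-weighted average of the PAF estimates. A minimal sketch for one sex follows; the function name, the dictionary keys and all values are hypothetical. The same function applies to HALY, YLL or YERF by passing those outcomes in place of deaths.

```python
def attributable_share(paf_by_group, outcome_by_group):
    """Eq7: outcome-weighted average of PAF over cancers (c) and age groups (a)."""
    total = sum(outcome_by_group.values())
    attributable = sum(paf_by_group[key] * value
                       for key, value in outcome_by_group.items())
    return attributable / total


# Hypothetical inputs for one sex, keyed by (cancer, age group).
paf_by_group = {("cancer A", "60-64"): 0.8, ("cancer B", "60-64"): 0.1}
deaths_by_group = {("cancer A", "60-64"): 100, ("cancer B", "60-64"): 100}

share_of_deaths = attributable_share(paf_by_group, deaths_by_group)
print(round(share_of_deaths, 2))
```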
The tools, database files, templates and workbooks that estimate PAF are all available for download in this article's companion zip file. The workbooks to estimate the HALY need to be generated from the Builder tool after download. After the HALY workbooks have been built for all cancers, the Summary and Extract tools can be used to summarize the HALY results for specific parameter choices, and the UpdatePAF tool can be used to update the file links in the PAF workbooks. A higher level tool, the Master tool, has been included to automatically execute these four tasks with the push of one button.
The database is currently populated with cancer data for Canada to illustrate usage, but can be easily adapted for other diseases and updated with data for other countries or regions. To update the database, simply open the database file(s) in Excel and replace the data using standard editing techniques. When adding other diseases, the structure of the database files may be changed to accommodate the number and naming of stages and health states (refer to the user guide for more details).
Here is a brief description of each of the components of the system. More details can be found in the user guide.
The workbooks that estimate population attributable fractions link to the file generated by the Extract tool. The UpdatePAF tool was created to facilitate the update of this link across all eight of these workbooks. The name of the file to be linked is specified in the tool. As with the other tools, UpdatePAF is run automatically by the Master tool.
Workbooks to estimate HALY
The Instructions worksheet offers basic guidance on using the workbooks. It identifies the cancer by name and ICD-9 code. The choice of the reference population and the discount rate are specified in this sheet and applied in all subsequent worksheets. The population counts are displayed in the HALY sheet.
The Algorithm worksheet contains the distribution of treatment and remission associated with the cancer, the preference scores for each of the cancer's health states, and utilities that describe the starting health state of the population. It also contains the comorbidity coefficient as a parameter that can be changed. Changing the values here automatically updates the YERF estimates.
The HALY worksheet calculates the health-adjusted life years lost as the sum of the YLL and YERF values. The population counts, chosen in the Instructions sheet, are displayed here. The mortality rates are input to the YLL worksheet. Population counts and life expectancy estimates are linked data elements used in the calculation of the YLL.
The YERF worksheet calculates the total YERF values by summing across stages. The incidence rates for the cancer are found in this sheet. They are combined with the population counts to generate incidence counts, which are then distributed by stage.
The YERF local, YERF regional, and YERF distant worksheets calculate the year-equivalents of reduced functioning by health state for each stage, respectively. The stage distribution and the various durations (cause-specific survival, observed survival, duration of treatments, and duration of palliative and terminal care) can be modified. The distribution of treatment and the preference scores are more easily modified in the Algorithm sheet.
The Sources worksheet lists every data element used in the workbook, its source and in which worksheet it is found.
The Notes worksheet highlights anything exceptional or noteworthy about the cancer.
Workbooks to estimate PAF
The workbook that calculates the PAF associated with smoking by the indirect method includes two additional worksheets with data on the number of lung cancer deaths in a reference population (American Cancer Society, CPS-II, 1984–1988) and in Canada. They are used to estimate the hypothetical proportion that would have to have been exposed to smoking to account for the lung cancer mortality observed in 2001.
HALY Summary Workbook
PAF Summary Workbook
The structured approach of the workbook system provides researchers and policy makers with an easy to use and easy to understand tool for estimating HALY and PAF summary measures. Developed for use on personal computers using Excel, it is widely accessible to all levels of researchers. The database can easily be updated with data for other regions or countries and the entire set of results quickly regenerated. Parameter choices in the tools and in the resulting workbooks offer great flexibility to create alternative scenarios. Counterfactuals, used to evaluate the impact of basic health policy interventions, can be created by modifying any of the data elements. For instance, the population attributable fraction associated with obesity could be re-evaluated by reducing the prevalence of obese individuals by an amount that might be achieved by an intervention strategy.
The workbook system has a number of limitations. First, we have not causally linked disease incidence to mortality. Instead, the mortality in 2001 is taken as a proxy for the mortality that would result from the incidence observed in 2001. However, a system that linked incidence to mortality would be more realistic, especially when looking at interventions that reduce incidence. Similarly, survival is not causally linked to treatment in the workbook model. Scenarios that alter treatment patterns would not impact survival time.
A second limitation is that the model of cancer progression does not include the treatment of local or distant recurrence. This means we have not incorporated the weight for severity of these conditions, which would occur during the period labeled remission/on-going care. The preference scores associated with distant cancer are lower than for other stages, so we would expect some underestimation of the HALY by the omission of distant recurrences. This is not expected to have much impact on the ranking of cancers.
Third, the workbook model assumes that cancer treatment follows a fixed sequential order: surgery, chemotherapy, radiotherapy. While the implications for individuals may be extremely important, from a population perspective, and more practically, from the perspective of data availability and model complexity, simplifying assumptions are required. Since the durations of these treatments are relatively short compared to the observed survival of cancer patients, we would expect them to have little impact on the overall outcomes and we would expect their order of occurrence to have even less impact. As a crude sensitivity analysis, we changed the order of occurrence of chemotherapy and radiotherapy in the workbook system, which led to a negligible impact on the estimate of morbidity: overall YERF changed by 0.00025%.
Fourth, clustering of risk factors cannot be easily modeled in the workbooks. This means that the proportion of cancer attributable to one risk factor may also be attributed to another risk factor, even though the two risk factors collectively contribute to the cancer. For instance, alcohol and smoking may be clustered risk factors with respect to death from laryngeal cancer, which may explain why our estimate of the total proportion of laryngeal cancer deaths attributable to risk factors, summed across the risk factors, exceeds 100%. In general, we expect we have overestimated the population attributable fractions across all cancers.
Finally, the data have been obtained at levels of disaggregation to represent the heterogeneity of the cancer population. However, in the case of the calculated duration of remission, it has not been sufficient to avoid logical inconsistencies. We found that in older age groups, it was possible to generate negative durations of remission, because the observed survival of people in this age group was less than the duration spent in diagnostic, treatment and terminal phases. The input data could be refined to avoid this, but as a rare occurrence with small impact, we opted to check for negative durations and set them to zero when they occur. This is done automatically by the embedded formulae and requires no intervention by the user.
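The check described above amounts to flooring the residual duration at zero. A sketch in Python, with a hypothetical function name and made-up durations in years:

```python
def remission_duration(observed_survival, diagnosis, treatment, palliative_terminal):
    """Residual time in remission/on-going care, set to zero if negative."""
    residual = observed_survival - (diagnosis + treatment + palliative_terminal)
    return max(0.0, residual)


# Hypothetical durations, for illustration only.
typical = remission_duration(observed_survival=5.0, diagnosis=0.1,
                             treatment=0.5, palliative_terminal=0.4)
older_group = remission_duration(observed_survival=0.8, diagnosis=0.1,
                                 treatment=0.5, palliative_terminal=0.4)
print(typical, older_group)
```

In the second call the other phases add up to more than the observed survival, so the remission duration is set to zero rather than allowed to go negative.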
These limitations can be overcome through more advanced modeling techniques, such as microsimulation modeling. The Population Health Model, a continuous time, competing risk microsimulation model developed at Statistics Canada, is being adapted to implement all of the functionality of these workbooks. Our experience with both types of modeling suggests that it is not necessarily more difficult to develop the microsimulation model, although it is often considered less transparent. The benefit of developing both models is that the workbooks provide a benchmark against which the microsimulation results can be compared. Of course, we expect small differences, not only from random noise introduced by the stochastic nature of the microsimulation model, but also because it avoids the limitations outlined above.
The workbook system presented here focused on cancers. However, it was developed more generally, so that it can produce workbooks for other chronic diseases or injuries, once the data have been assembled. The main criterion is that the disease(s) can be decomposed into a series of health states from diagnosis to death. The number of health states is virtually unlimited and can include disease progression and sequelae, and the diseases can be staged at diagnosis or not. By changing a few labels in the database files to reflect health state names, an entire new system of HALY workbooks can be generated (refer to user guide for detailed steps). The underlying macro code has been built with this generalizability in mind.
As with any generalized system, exceptions may arise that do not fit within its structured framework. As more diseases are studied in the Canadian study, the workbook structure will be modified or expanded to address any exceptions that arise. This may take the form of minor modifications to the current structure, a separate structure to accommodate multiple disease exceptions that fall into a common framework, or a series of ad-hoc, stand-alone workbooks for unique exceptions. We expect that most diseases will fit within the structure described here. Future releases of the workbook system will be made through the Public Health Agency of Canada's website.
The structured workbook approach offers researchers an efficient, easy to use, and easy to understand set of tools for estimating HALY and PAF summary measures for their country or region of interest. The estimation of summary measures for cancers presented here highlights the functionality of the system; however, the tool is easily expanded to other diseases. The workbooks are transparent in their calculations, but are limited in their ability to model the impact of clustered risk factors and competing risks of disease. These limitations can be overcome by more advanced modeling techniques such as microsimulation.
This work is part of the Population Health Impact of Disease in Canada (PHI) research program, a collaboration of Statistics Canada, Public Health Agency of Canada, and researchers from McGill University, the University of Ottawa, the University of Manitoba, the Institute for Clinical Evaluative Sciences (ICES) and the Montérégie Regional Board of Health and Social Services. The PHI is funded by Statistics Canada and Public Health Agency of Canada. The authors acknowledge Kathy White for her editorial input to this manuscript, Hélène Roberge and Serge Tanguay for their contribution to the assembly of the data, and Dr Bill Evans for review of the cancer model and for providing the cancer treatment algorithm.
- Population Health Impact of Disease in Canada [http://www.phac-aspc.gc.ca/phi-isp/index.html]
- Murray CJ, Lopez AD: Global health statistics. Global Burden of Disease and Injury Series. Volume 2. Harvard: Harvard School of Public Health; 1996.
- Mathers C, Vos T, Stevenson C: The burden of disease and injury in Australia. AIHW cat. no. PHE 17. Canberra: AIHW; 1999.
- Mathers CD, Robine J-M, Wilkins R: Health expectancy indicators: recommendations for terminology. In Advances in health expectancies. Edited by: Mathers C, McCallum J, Robine J-M. Canberra: Australian Institute of Health and Welfare; 1994:34-41.
- Coale A, Guo G: Revised regional model life tables at very low levels of mortality. Popul Index Winter 1989, 55(4):613-43.
- Furlong WJ, Feeny DH, Torrance GW, Barr RD: The Health Utilities Index (HUI) System for assessing health-related quality of life in clinical studies. Ann Med 2001, 33(5):375-84.
- Health Statistics Division, Statistics Canada: Canadian Community Health Survey 2000–01. Statistics Canada survey 3226.
- Barendregt JJ, Bonneux L, Van der Maas PJ: DALYs: the age-weights on balance. Bull World Health Organ 1996, 74(4):439-43.
- Barendregt JJ: Disability-adjusted life years (DALYs) and disability-adjusted life expectancy (DALE). In Determining life expectancies. Edited by: Robine JM, Jagger C, Mathers D. Chichester (UK): Wiley; 2003:247-261.
- Peto R, Lopez AD, Boreham J, Thun M, Heath C Jr: Mortality from tobacco in developed countries: indirect estimation from national vital statistics. Lancet 1992, 339:1268-78. doi:10.1016/0140-6736(92)91600-D
- Jekel JF, Katz DL, Elmore JG: Epidemiology, Biostatistics, and Preventive Medicine. Second edition. Philadelphia (PA): WB Saunders Company; 2001.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.