Quantifying temporal trends of age-standardized rates with odds
Population Health Metrics volume 16, Article number: 18 (2018)
Abstract
Background
To quantify temporal trends in age-standardized rates of disease, the convention is to fit a linear regression model to log-transformed rates because the slope term provides the estimated annual percentage change. However, such log-transformation is not always appropriate.
Methods
We propose an alternative method using the rank-ordered logit (ROL) model that is indifferent to log-transformation. This method quantifies the temporal trend using odds, a quantity commonly used in epidemiology, and the log-odds corresponds to the scaled slope parameter estimate from linear regression. The ROL method can be implemented by using the commands for proportional hazards regression in any standard statistical package. We apply the ROL method to estimate temporal trends in age-standardized cancer rates worldwide using the cancer incidence data from the Cancer Incidence in Five Continents plus (CI5plus) database for the period 1953 to 2007 and compare the estimates to their scaled counterparts obtained from linear regression with and without log-transformation.
Results
We found a strong concordance in the direction and significance of the temporal trends in cancer incidence estimated by all three approaches, and illustrated how the estimate from the ROL model provides a measure that is comparable to a scaled slope parameter estimated from linear regression.
Conclusions
Our method offers an alternative approach for quantifying temporal trends in incidence or mortality rates in a population that is invariant to transformation, and whose estimate of trend agrees with the scaled slope from a linear regression model.
Background
Monitoring incidence and mortality rates in a population allows stakeholders in health care to track the burden of disease. Changes in the population rates over time can help to assess the effectiveness of interventions in public health or health care and also inform projections for future health services. Recent years have seen extensive work in analyses of the global burden of disease, with published estimates of global, regional, and national incidence and prevalence rates of several hundred diseases for a majority of countries, both age-standardized [1] and for specific sex and age groups [2]. Established methods of assessing trends include age-period-cohort models [3, 4] and the estimated annual percentage change [5]. The estimated annual percentage change (EAPC) has been in use for many years by cancer registries to quantify changes in cancer rates over time and to project future rates [3, 5, 6, 7]. Conceptually, EAPC represents the average percentage change in the age-standardized rate (ASR) per year. It is usually computed by estimating the slope of a linear regression (LR) model fitted to the log-transformed ASR. Under this framework, for every one-year increase in calendar time, the ASR is assumed to change by a constant percentage of the previous year’s rate. However, the LR model can also be used to model the ASR without log-transformation, and the slope term will then correspond to the change in ASR for each calendar year [8]. There is no simple relationship between the slopes from these two models, and data analysts need to assess whether the increase is linear or exponential when deciding whether the untransformed or log-transformed ASR is more appropriate.
In epidemiology, the odds ratio is a commonly used measure of association between a binary outcome and an exposure. In this paper, we propose to use odds to quantify time trends in annual ASRs to eliminate the need to consider whether transformation of ASR is necessary when testing for a temporal trend. This approach involves modeling the ranked ASR values across calendar years using the rank-ordered logit (ROL) regression model to obtain the relevant estimates [9]. We illustrate the method by applying it to data from the Cancer Incidence in Five Continents plus (CI5plus) database, and comparing the estimates we obtain to the scaled estimates from the usual LR models, where the scale parameter is estimated from the standard deviation of the error terms.
Material and methods
The usual approach used to compute EAPC in incidence rates assumes the log-transformed ASR is linearly related to time, and a LR model is fitted to the log-transformed ASR, \( {y}_i \), with calendar year, \( {x}_i \), as the (continuous) independent variable:
\( {y}_i={\beta}_0+{\beta}_1{x}_i+{\varepsilon}_i \) (1)
where the subscript i represents the i-th year (i = 1, 2, …, n) and the error terms, \( {\varepsilon}_i \), are assumed to be independent and normally distributed with mean 0 and variance \( {\sigma}_i^2 \) [5, 6]. If the error terms have equal variance (i.e., \( {\sigma}_i^2={\sigma}^2 \)), then simple unweighted least squares provides an estimate of the slope term, β1. As incidence is represented as a count, the assumption of equal variances may not be reasonable, especially for rare diseases, and weighted least squares may be more appropriate, where the weight for \( {y}_i \) is \( {w}_i=\frac{1}{{\sigma_i}^2} \) (see Supplementary materials and methods for details). In practice, when fitting such models to sparse data, there is a need to account for age strata with no events, as the log of zero is undefined.
When a LR of the ASR (i.e., no log-transformation) is used to estimate trend [8], the parameter β1 in Eq. (1) provides an estimate of the annual increment in the incidence rate. On fitting a LR model to log-transformed rates, the EAPC is given by the following transformation of the coefficient (β1):
\( \mathrm{EAPC}=100\times \left(\exp \left\{{\beta}_1\right\}-1\right) \) (2)
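As a rough illustration of this computation, the sketch below fits a (optionally weighted) least-squares line to ln(ASR) and converts the slope with the usual relation EAPC = 100 × (exp{β1} − 1). The function names and the example rates are hypothetical and chosen only for illustration; they are not the authors' code.

```python
import math

def weighted_slope(x, y, w=None):
    """Intercept and slope of a (weighted) least-squares line y = b0 + b1*x."""
    if w is None:
        w = [1.0] * len(x)  # equal weights reduce to ordinary least squares
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def eapc(years, asr, w=None):
    """EAPC from the slope of ln(ASR) on calendar year."""
    _, b1 = weighted_slope(years, [math.log(r) for r in asr], w)
    return 100.0 * (math.exp(b1) - 1.0)

# Hypothetical ASRs (per 100,000) that grow by exactly 2% per year.
years = list(range(2000, 2010))
asr = [50.0 * 1.02 ** (yr - 2000) for yr in years]
print(round(eapc(years, asr), 6))
```

Passing `w` as a list of inverse variances gives the weighted fit; how those variances are obtained is described in the Supplementary material and is not reproduced here.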
The rank-ordered logit model
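The ROL model was originally developed in marketing research for modelling an individual’s preferences for n products [9]. The model is linear as in Eq. (1), but the error terms are assumed to be extreme value type 1 (EVT1) distributed with location μ = 0 and scale λ = 1 (i.e., standard EVT1 distributed). Under these assumptions, β1 can be estimated from the ranked observations based on:
\( L\left({\beta}_1\right)=\prod \limits_{i=1}^{n-1}\frac{\exp \left\{{\beta}_1{x}_{(i)}\right\}}{\sum \limits_{k=i}^n\exp \left\{{\beta}_1{x}_{(k)}\right\}} \) (3)
where the subscript (i) denotes the observation with the i-th highest outcome, so that \( {y}_{(1)}>{y}_{(2)}>\dots >{y}_{(n)} \); the intercept β0 cancels from each ratio.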
In marketing research applications, the β1 parameter indicates the association between a feature of the products and the individual’s preference: for example, if a decrease in the price of a product is associated with an increase in its preference, then exp{β1} represents the odds of a higher rank (or preference) when the price decreases by one unit. When the error term assumption is fulfilled, the estimate of β1 also has the usual linear interpretation as in Eq. (1). Note that Eq. (3) is the familiar partial likelihood of a Cox-regression model [10,11,12]. Hence, the ROL model can be implemented using standard statistical software by using the commands provided for Cox regression analysis.
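To see where the odds interpretation of exp{β1} comes from, it may help to consider just two items, labelled a and b here purely for illustration. Under the standard EVT1 error assumption, the difference of the two error terms follows a standard logistic distribution, which gives:

\[ \Pr \left({y}_a>{y}_b\right)=\frac{\exp \left\{{\beta}_1{x}_a\right\}}{\exp \left\{{\beta}_1{x}_a\right\}+\exp \left\{{\beta}_1{x}_b\right\}},\qquad \frac{\Pr \left({y}_a>{y}_b\right)}{\Pr \left({y}_b>{y}_a\right)}=\exp \left\{{\beta}_1\left({x}_a-{x}_b\right)\right\} \]

When the covariate values differ by one unit (\( {x}_a-{x}_b=1 \)), the odds that item a is ranked above item b are therefore exp{β1}.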
In applying ROL models to time trend analysis of incidence rates, the ASR (i.e., y) is used to rank the calendar years. Thus the calendar year is the explanatory variable (i.e., x) and the ROL model provides an estimate of the association between calendar year and the magnitude (or rank) of the ASR. Since the ROL is indifferent to any transformation of the outcome that preserves the ordering, the odds of the subsequent calendar year having a higher value (or rank) than the current year is exp{β1}, regardless of whether or not the ASR is log transformed.
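The following is a minimal sketch of such a fit for a single covariate, written in plain Python rather than with a packaged Cox regression routine. It maximizes the rank-ordered logit (Cox partial) likelihood by Newton-Raphson, ordering the years from highest to lowest ASR. The function name, the example rates, and the simplifying assumptions (no tied ASRs, a ranking that is not perfectly monotone in calendar year) are illustrative choices, not taken from the authors' Supplementary code.

```python
import math

def rol_slope(x, y, tol=1e-10, max_iter=50):
    """Slope of a one-covariate rank-ordered logit model, fitted by Newton-Raphson.

    Only the ordering of y enters the likelihood. Assumes no ties in y and a
    ranking that is not perfectly monotone in x (otherwise the estimate diverges).
    """
    xbar = sum(x) / len(x)
    # Order observations from highest to lowest y; centre x for numerical
    # stability (centring does not change the slope).
    xs = [xi - xbar for _, xi in sorted(zip(y, x), reverse=True)]
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(len(xs) - 1):
            rest = xs[i:]  # observations not yet ranked at step i
            w = [math.exp(beta * xk) for xk in rest]
            sw = sum(w)
            m1 = sum(wk * xk for wk, xk in zip(w, rest)) / sw
            m2 = sum(wk * xk * xk for wk, xk in zip(w, rest)) / sw
            score += xs[i] - m1      # first derivative of the log-likelihood
            info += m2 - m1 * m1     # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical ASRs (per 100,000) with an upward but non-monotone trend.
years = list(range(1998, 2008))
asr = [10.0, 11.2, 10.8, 12.5, 12.1, 13.9, 13.2, 15.0, 14.1, 16.3]

b_raw = rol_slope(years, asr)
b_log = rol_slope(years, [math.log(r) for r in asr])
print(b_raw, b_log, math.exp(b_raw))  # the two slopes coincide; exp gives the odds
```

Because the log transformation preserves the ordering of the ASRs, both calls rank the years identically and return the same estimate. In practice a Cox regression routine that handles ties would be preferable to this hand-rolled optimizer.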
The scale parameter, λ, for the slope term, β1, from the linear regression model
The ROL model specifically assumes standard EVT1 distributed error terms, so the variance of the error terms equals π²/6. In contrast, the variance of the error terms in the LR model is not specified a priori but estimated from the data. Because of this, the β1 estimates from the two regression models are not comparable. We can overcome this by scaling the outcome in the LR model (and thus scaling β1).
For a linear model such as that in Eq. (1), if the error terms are independently and identically distributed with an EVT1 distribution with μ = 0 and λ > 0, the variance of the error terms (and consequently the variance of the outcome) is given by:
\( \operatorname{Var}\left({\varepsilon}_i\right)={\pi}^2{\uplambda}^2/6 \) (4)
Hence, we can estimate a scale-like parameter, λ, from the error terms obtained from the usual LR (assuming these are independent and normally distributed with mean 0 and standard deviation σ > 0) by equating the variance expression in Eq. (4) with the estimate of σ² from the LR model and solving for λ, i.e., \( \uplambda =\sqrt{6}\sigma /\pi \).
Scaling the outcome variable y from Eq. (1) by λ gives \( {y}_i^{\ast }={y}_i/\uplambda ={\beta}_0^{\ast }+{\beta}_1^{\ast }{x}_i+{\varepsilon}_i^{\ast } \) where \( {\varepsilon}_i^{\ast }={\varepsilon}_i/\uplambda \) mimics the standard EVT1 distribution assumption of the error terms in the ROL in Eq. (3). Hence, the scaled slope parameter \( {\beta}_1^{\ast }={\beta}_1/\uplambda \) from Eq. (1) corresponds to the slope parameter in the ROL model in Eq. (3), and the proposed scaled slope from LR has a similar interpretation to the log-odds in Eq. (3). This is a heuristic argument for scaling the slope from a simple (unweighted) LR, where the error variance is represented by a single parameter, σ. Extending this to weighted LR requires a single value to represent the variation of the error terms. For simplicity, we propose using the mean of the standard deviations in the different calendar years (i.e., \( \sigma =\sum \limits_{i=1}^n{\sigma}_i/n \)) to represent the overall underlying variation over the time-period of study.
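The scaling for the unweighted case can be sketched as follows. The helper names and data are hypothetical, and the residual standard deviation is computed with an n − 2 denominator, which is an assumption made here; the text only asks for the estimate of σ from the LR model.

```python
import math

def ols(x, y):
    """Unweighted least-squares slope and residual standard deviation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / (n - 2))  # assumption: n - 2 degrees of freedom
    return b1, sigma

def scaled_slope(x, y):
    """Slope divided by lambda = sqrt(6) * sigma / pi."""
    b1, sigma = ols(x, y)
    lam = math.sqrt(6.0) * sigma / math.pi
    return b1 / lam

# Hypothetical ASRs (per 100,000).
years = list(range(1998, 2008))
asr = [10.0, 11.2, 10.8, 12.5, 12.1, 13.9, 13.2, 15.0, 14.1, 16.3]

print(scaled_slope(years, asr))                         # untransformed rates
print(scaled_slope(years, [math.log(r) for r in asr]))  # log-transformed rates
```

One consequence of the scaling is that the scaled slope does not depend on the units of the rate: multiplying every ASR by a constant changes β1 and σ by the same factor, leaving their ratio unchanged.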
Application to cancer data
The CI5plus database has annual incidence rates for 27 cancer sites in 118 populations from 1953 to 2007, with calendar periods of coverage varying between populations. With the exception of cancers of the breast, cervix uteri, corpus uteri, and ovary and other uterine adnexa in females, and cancers of the prostate and testis in males, cancers at all sites are reported separately for males and females. Yearly incident cancer cases, \( {c}_{ij} \), and population denominators, \( {n}_{ij} \), aggregated by five-year age groups, provide incidence rates suitable for time trend analysis, where i and j denote the i-th calendar year and j-th age group. We harmonized all incidence rates and denominators using 16 age groups (0–4, 5–9, …, 70–74, 75+), and used the Segi world standard population, \( {s}_j \), to compute the ASRs [13]. We replaced any ASR of zero with half the value of the smallest non-zero ASR in the database for the cancer site(s) being analyzed. From the 27 cancer sites (six of them sex-specific) in 118 populations, we had a total of 5900 trends for analysis. In addition to site- and sex-specific cancers, we also considered all sites excluding non-melanoma skin cancer.
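A minimal sketch of direct age standardization and of the zero-replacement rule is given below. The weights are the Segi world standard as commonly tabulated, with the 75–79, 80–84, and 85+ weights summed into a single 75+ group; this collapsing and the weights themselves are assumptions that should be checked against [13]. The function names and the example counts are hypothetical.

```python
# Segi world standard weights for 16 age groups (0-4, 5-9, ..., 70-74, 75+).
# Assumption: values as commonly tabulated; 75+ is the sum of the oldest groups.
SEGI_16 = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000,
           6000, 6000, 5000, 4000, 4000, 3000, 2000, 2000]

def asr_per_100k(cases, population, std=SEGI_16):
    """Directly standardized rate: standard-weighted mean of age-specific rates."""
    weighted = sum(s * c / n for s, c, n in zip(std, cases, population))
    return 1e5 * weighted / sum(std)

def replace_zeros(asrs):
    """Replace zero ASRs with half the smallest non-zero ASR in the list."""
    floor = min(a for a in asrs if a > 0) / 2.0
    return [a if a > 0 else floor for a in asrs]

# Hypothetical year: 20 cases per 100,000 persons in every age group.
cases = [20] * 16
population = [100000] * 16
print(asr_per_100k(cases, population))
print(replace_zeros([0.0, 4.0, 2.0]))
```

When every age group has the same rate, the ASR equals that rate regardless of the weights, which gives a quick sanity check on the implementation. Here `replace_zeros` operates on whatever list it is given; in the analysis described above, that list would be the ASRs for the cancer site(s) being analyzed.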
We applied the three approaches outlined in the previous section to these worldwide cancer rates. The first approach was the LR of the log-transformed rates (LR-ln), the second approach was the LR of the untransformed rates (LR-un), and the third approach was our proposed ROL regression model on the ranked rates. The LR models were fit using weighted least squares. The estimates of trend obtained from the three approaches and the corresponding scaled-estimates for LR-ln and LR-un were compared. We inspected the concordance in sign with respect to p-values for the scaled and unscaled estimates. Additionally, we reported the results from analysis of the trends stratified by sex to demonstrate the consistency with published work and to highlight important trends. To corroborate the contrasting trends that have been reported for breast cancer in Singapore and Sweden [14, 15], we conducted a specific analysis that compared the incidence rates to illustrate the ROL model’s indifference to transformation and to demonstrate the comparability of the estimates obtained.
All analyses were performed with the statistical package R, version 3.1.2 [16] and the commands are provided in the Supplementary material, together with the commands for implementation in other widely-used statistical software packages (SAS, Stata, SPSS).
Results
Figure 1 presents the results from the application of the three approaches to the CI5plus database. The scatterplots in the left column of Fig. 1 provide a pairwise comparison of the estimates of the slope, β1, from (a) the LR of log-transformed and untransformed ASRs, (b) the log-transformed ASRs and the ROL, and (c) the untransformed ASRs and the ROL. As expected, these plots did not exhibit a clear relationship between the estimates, although there was a high concordance in the signs of the estimates across the three approaches, with 5325 out of 5900 combinations (90.3%) having the same sign across all three approaches. With regard to inference concerning the direction of temporal trends, the p-values corresponding to these concordant scenarios were lower than those from scenarios where the signs were discordant (see Fig. 2). Examining the scaled estimates, \( {\beta}_1^{\ast } \), from the linear regression of untransformed and log-transformed rates and comparing them to each other (Fig. 1 (d)) and to the β1 estimate from the ROL analysis (Fig. 1 (e) and (f)), we see that the scatterplots exhibit a pronounced linear relationship along the line of identity (i.e., the grey diagonal line corresponding to y = x).
On inspection of the divergent points in Fig. 1 (d), (e), and (f), we found several of these were for prostate cancer where the introduction of screening resulted in the familiar “screening effect” feature in the incidence profile so that it is not reasonable to consider a linear fit. One unexpected disagreement was for thyroid cancer in New York, whose incidence curve had an apparent screening effect in 2000–2005, and we found that indeed thyroid cancer screening had been offered in New York after the events of 9/11 [17, 18]. For disagreements not due to screening, we found that where the estimates from LR models of untransformed and log-transformed rates disagree, the ROL estimate tends to agree well with the most appropriate LR estimate. These and other divergent points from Fig. 1 are presented in detail in Additional file 1: Figures S1 and S2.
The numerical results from the ROL and LR of the log-transformed rates of sex-specific rates are presented in Table 1, where we do not report results for any cancers where 25% or more of the yearly ASRs were less than 3 per 100,000: cancers of the eye, bone, testis, gallbladder, Hodgkin’s lymphoma and multiple myeloma. The remaining cancers were sorted by the concordance in the significance between the two approaches across the 118 populations. For cancers that affect both genders, the average concordance was used. For “All sites but non-melanoma skin,” the overall concordance between LR-ln and ROL was 84.7% among the 118 populations for both males and females, with an increasing trend in at least 75% of the 118 populations as indicated by the interquartile range excluding an odds of 1 in the ROL and excluding an EAPC value of 0 in the LR-ln analyses respectively. Among the nine cancer sites with more than 70% concordance in significant findings among the 118 populations, two of the three sex-specific cancers (prostate and breast) had an increasing trend in the majority of the populations (≥ 75%) while cancer of the cervix had a decreasing trend in the majority of the populations. For the six cancers affecting both sexes, there was evidence in a majority of populations of an increasing trend for both males and females in cancer of the thyroid and kidney, non-Hodgkin’s lymphoma and non-melanoma skin cancer and a decreasing trend in both sexes for stomach cancer. For lung cancer, there was evidence of an increasing trend in females and decreasing trend in males.
For cancer sites with lower concordance in significant findings between LR-ln and ROL, the evidence of an increasing or decreasing trend among the 118 populations is weaker. Only liver cancer in men and uterine cancer in women had an increasing trend of reasonable magnitude (median odds 1.14 and 1.10 respectively). For most of the rarer cancers, the odds estimates from the different populations were close to 1 and the EAPC close to 0.
Figure 3 displays the untransformed and log-transformed ASR of female breast cancer incidence in Singapore and Sweden, suggesting that a linear trend was reasonable for both the untransformed (a) and log-transformed (b) data in both populations. In Table 2, we report the estimates from the LR analysis of both the untransformed and log-transformed rates. The scaled-slope estimates from both analyses were close to the estimates from the ROL analysis in both populations, with slightly better agreement for untransformed rates in the Swedish data. All analyses indicated an increasing trend in breast cancer incidence in both Singapore and Sweden, with a steeper trend in Singapore than in Sweden, consistent with Fig. 3 and with previously published work [14, 15].
Discussion
We have described an alternative approach to quantifying temporal trends that is comparable to current practice but with some important advantages. In contrast to many published disease trends, which are estimated with specialized models and software [1, 2], our approach uses simple commands available in any standard statistical package and implements a familiar model (Cox proportional hazards regression) to yield an estimate of trend using a measure (the odds) that is familiar in epidemiology. We have provided detailed instructions in the Supplementary material for implementation in several commonly used statistical software packages. The method uses the ROL model, which is commonly used in marketing research but is not a mainstream analytical tool in traditional epidemiology. The usefulness of the model in assessing trends is that it is indifferent to transformations of the age-standardized rates, so there is no need to assess whether the untransformed or log-transformed rates are more appropriate before proceeding with estimation. This can simplify comparisons across populations where the decision to transform differs.
We applied the method to investigate evidence of temporal trends in site-specific cancer incidence rates in the 118 populations represented in the CI5plus database and compared our results to those from the usual regression models. We found strong concordance in the signs of the estimates and the significance of temporal trends across the three approaches: linear regression (LR) analysis of the untransformed (LR-un) or transformed rates (LR-ln), and ROL. In particular, we found the scaled slopes from the weighted LR analyses to be highly correlated with, and similar to, the β1 estimates from the ROL model. Unlike the weighted LR whose weights require age-specific population counts and incident cases, our method can be implemented with only annual ASR data. To compare our estimates to those that could be obtained from LR of such data, we conducted a sensitivity analysis using unweighted least squares and obtained very similar results (see Additional file 1: Table S1 and Figure S3) and a high concordance (93.7%: 5526 out of 5900 combinations) in the signs of the estimates across all three approaches (see Additional file 1: Figure S4).
Our analysis demonstrated an increasing trend in many cancers for both men and women, consistent with what has been reported previously [14]. Exceptions, which have also been noted previously, were stomach cancer which had a decreasing trend in both sexes [19], and lung cancer which had an increasing trend in women but decreasing trend in men in a majority of the populations [20]. This lung cancer pattern has been recently observed in many countries and has been attributed to increased smoking among women [21]. The decrease in stomach cancer is harder to explain, but may be due in part to increased exposure to antibiotics [22]. We also found evidence of a decreasing trend in cervical cancer, which has been observed in many populations and been attributed to population-based screening programs [14, 23].
Our comparative analysis of trends can offer additional insights into the health situation within or between specific populations. Our analysis of worldwide cancer incidence rates highlighted a number of interesting features, including the effects of population screening programs (e.g., for prostate cancer), unexpected screening as in New York after the events of 9/11, and the lung cancer profile in Russia (Additional file 1: Figure S2(h)) due to the lack of progress in tobacco control [24].
Conclusions
The consistency of our estimates from ROL with those from least squares provides empirical evidence that temporal trends in cancer incidence can be represented by odds. The method, which can be seamlessly implemented in standard software, provides a transformation-free alternative that facilitates comparison of trends across different populations in the incidence or mortality rates for any disease or the prevalence rates of known risk factors [25]. For trends that are routinely assessed and reported using regression models, using transformed or untransformed rates, simply including an estimate of the error variance with the reported slope would allow population estimates to be compared with estimates from ROL and all estimates to be combined in meta-analyses, simplifying communication and comparison across populations.
Abbreviations
- ASR: Age-standardized rate
- EAPC: Estimated annual percentage change
- LR: Linear regression
- LR-ln: Linear regression of log-transformed rates
- LR-un: Linear regression of untransformed rates
- ROL: Rank-ordered logit
References
1. GBD 2016 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2017;390:1211–59.
2. GBD 2016 Causes of Death Collaborators. Global, regional, and national age-sex specific mortality for 264 causes of death, 1980–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2017;390:1151–210.
3. Rosenberg PS, Anderson WF. Age-period-cohort models in cancer surveillance research: ready for prime time? Cancer Epidemiol Biomark Prev. 2011;20:1263–8.
4. Masters RK, Tilstra AM, Simon DH. Explaining recent mortality trends among younger and middle-aged white Americans. Int J Epidemiol. 2018;47:81–8.
5. Fay MP, Tiwari RC, Feuer EJ, Zou ZH. Estimating average annual percent change for disease rates without assuming constant change. Biometrics. 2006;62:847–54.
6. NORDCAN: Glossary of statistical terms. http://www-dep.iarc.fr/nordcan/English/glossary.htm. Accessed April 14 2018.
7. Rahib L, Smith BD, Aizenberg R, Rosenzweig AB, Fleshman JM, Matrisian LM. Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res. 2014;74:2913–21.
8. Jensen OM, Parkin DM, MacLennan R, Muir CS, Skeet RG. Cancer registration: principles and methods. IARC Sci Publ. 1991;95:1–288.
9. Beggs S, Cardell S, Hausman J. Assessing the potential demand for electric cars. J Econ. 1981;17:1–19.
10. Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. New York: Springer; 2000.
11. Allison PD, Christakis NA. Logit-models for sets of ranked items. Sociol Methodol. 1994;24:199–228.
12. Tan CS, Støer NC, Chen Y, Andersson M, Ning Y, Wee HL, Khoo EYH, Tai ES, Kao SL, Reilly M. A stratification approach using logit-based models for confounder adjustment in the study of continuous outcomes. Stat Methods Med Res. 2017; Accepted.
13. Segi M. Cancer mortality for selected sites in 24 countries (1950–1957). Sendai: Tohoku University School of Medicine; 1960.
14. Jemal A, Center MM, DeSantis C, Ward EM. Global patterns of cancer incidence and mortality rates and trends. Cancer Epidemiol Biomark Prev. 2010;19:1893–907.
15. Chia KS, Reilly M, Tan CS, Lee J, Pawitan Y, Adami HO, Hall P, Mow B. Profound changes in breast cancer incidence may reflect changes into a westernized lifestyle: a comparative population-based study in Singapore and Sweden. Int J Cancer. 2005;113:302–6.
16. R Core Team: R: A language and environment for statistical computing. 2013.
17. Boffetta P, Zeig-Owens R, Wallenstein S, Li J, Brackbill R, Cone J, Farfel M, Holden W, Lucchini R, Webber MP, et al. Cancer in world trade center responders: findings from multiple cohorts and options for future study. Am J Ind Med. 2016;59:96–105.
18. Li J, Brackbill RM, Liao TS, Qiao B, Cone JE, Farfel MR, Hadler JL, Kahn AR, Konty KJ, Stayner LT, Stellman SD. Ten-year cancer incidence in rescue/recovery workers and civilians exposed to the September 11, 2001 terrorist attacks on the world trade center. Am J Ind Med. 2016;59:709–21.
19. Bertuccio P, Chatenoud L, Levi F, Praud D, Ferlay J, Negri E, Malvezzi M, La Vecchia C. Recent patterns in gastric cancer: a global overview. Int J Cancer. 2009;125:666–73.
20. Lortet-Tieulent J, Soerjomataram I, Ferlay J, Rutherford M, Weiderpass E, Bray F. International trends in lung cancer incidence by histological subtype: adenocarcinoma stabilizing in men but still increasing in women. Lung Cancer. 2014;84:13–22.
21. Jemal A, Thun MJ, Ries LAG, Howe HL, Weir HK, Center MM, Ward E, Wu XC, Eheman C, Anderson R, et al. Annual Report to the Nation on the Status of Cancer, 1975–2005, featuring trends in lung cancer, tobacco use, and tobacco control. J Natl Cancer Inst. 2008;100:1672–94.
22. Parkin DM. The global health burden of infection-associated cancers in the year 2002. Int J Cancer. 2006;118:3030–44.
23. Mathew A, George PS. Trends in incidence and mortality rates of squamous cell carcinoma and adenocarcinoma of cervix--worldwide. Asian Pac J Cancer Prev. 2009;10:645–50.
24. Holmes D. Smoking in Russia: will old habits die hard? Lancet. 2011;378:973–4.
25. GBD 2016 Risk Factors Collaborators. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2017;390:1345–422.
Acknowledgements
Not applicable.
Funding
This work was supported by the Centre for Health Services and Policy Research SBRO14/NS01G from the National University Health Systems Pte Ltd., National University of Singapore Start-up Grant (WBS: R-608-000-059-133), and the grant (contract 16 0497) from the Swedish Cancer Society (Cancerfonden).
Availability of data and materials
The dataset supporting the conclusions of this article is available at: http://ci5.iarc.fr/CI5plus/Default.aspx
Author information
Contributions
CST conceptualized the project, performed the data analysis, and drafted the manuscript. NS and MR contributed to development of the project, interpreted the findings, and revised the manuscript. YN and YC managed and processed the data, participated in the data analysis, and revised the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The study was exempted from full Institutional Review Board review by the National University of Singapore Institutional Review Board because it involved use of existing data that is publicly available.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Supplementary materials. (DOC 5245 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Tan, C.S., Støer, N., Ning, Y. et al. Quantifying temporal trends of age-standardized rates with odds. Popul Health Metrics 16, 18 (2018). https://doi.org/10.1186/s12963-018-0173-5
DOI: https://doi.org/10.1186/s12963-018-0173-5