 Research
 Open Access
 Published:
Evaluation of four gammabased methods for calculating confidence intervals for ageadjusted mortality rates when data are sparse
Population Health Metrics volume 20, Article number: 13 (2022)
Abstract
Background
Equaltailed confidence intervals that maintain nominal coverage (0.95 or greater probability that a 95% confidence interval covers the true value) are useful in intervalbased statistical reliability standards, because they remain conservative. For ageadjusted death rates, while the Fay–Feuer gamma method remains the gold standard, modifications have been proposed to streamline implementation and/or obtain more efficient intervals (shorter intervals that retain nominal coverage).
Methods
This paper evaluates three such modifications for use in intervalbased statistical reliability standards, the Anderson–Rosenberg, Tiwari, and Fay–Kim intervals, when data are sparse and sample sizebased standards alone are overly coarse. Initial simulations were anchored around small populations (P = 2400 or 1200), the median crude allcause US mortality rate in 2010–2019 (833.8 per 100,000), and the corresponding agespecific probabilities of death. To allow for greater variation in the ageadjustment weights and agespecific probabilities, a second set of simulations draws those at random, while holding the mean number of deaths at 20 or 10. Finally, countylevel mortality data by race/ethnicity from four causes are selected to capture even greater variation: all causes, external causes, congenital malformations, and Alzheimer disease.
Results
The three modifications had comparable performance when the number of deaths was large relative to the denominator and the age distribution was as in the standard population. However, for sparse countylevel data by race/ethnicity for rarer causes of death, and for which the age distribution differed sharply from the standard population, coverage probability in all but the Fay–Feuer method sometimes fell below 0.95. More efficient intervals than the Fay–Feuer interval were identified under specific circumstances. When the coefficient of variation of the ageadjustment weights was below 0.5, the Anderson–Rosenberg and Tiwari intervals appeared to be more efficient, whereas when it was above 0.5, the Fay–Kim interval appeared to be more efficient.
Conclusions
As national and international agencies reassess prevailing data presentation standards to release ageadjusted estimates for smaller areas or population subgroups than previously presented, the Fay–Feuer interval can be used to develop intervalbased statistical reliability standards with appropriate thresholds that are generally applicable. For data that meet certain statistical conditions, more efficient intervals could be considered.
Background
The number of deaths reported for any given age group and time period can be assumed to follow a Poisson distribution, which leads to exact confidence intervals (CIs) for agespecific mortality rates [1, 2]. Further, because the sum of independent Poisson random variables is Poissondistributed, the crude mortality rate also has an exact CI. However, no exact CI is known for ageadjusted mortality rates (i.e., directly standardized rates), because those are based on a weighted sum of Poisson random variables [3].
Various methods have been proposed to calculate approximate CIs for directly standardized rates, and recent simulation studies have continued to compare those methods based on metrics such as coverage probability and expected width; see [4,5,6] for three such simulation studies. To date, only the gammabased method of Fay and Feuer [7] has been shown empirically to guarantee nominal coverage (e.g., 0.95 or higher probability that a 95% CI covers the true rate) in all simulation and realworld settings considered, though it often results in overly wide CIs. Tiwari et al. [8] developed a modification of the Fay–Feuer method to address the need for more efficient intervals (i.e., shorter intervals that retain nominal coverage) to accompany the estimates of ageadjusted rates that are published by the National Cancer Institute (NCI) at the US National Institutes of Health [9]. However, in some cases, CIs based on the Tiwari method can fail to retain nominal coverage [4]. Fay and Kim [10] proposed a midp modification to the Fay–Feuer CI, which does not guarantee nominal coverage but achieves it in many situations while remaining narrower than the Fay–Feuer or Tiwari CIs.
The Division of Vital Statistics at the US National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC), had developed a gammabased approximation to the Fay–Feuer method for the ageadjusted mortality rates that it published; see technical notes in Anderson and Rosenberg [11] as well as the methods section, below, for a description. Whereas the US Cancer Statistics (which include cancer registry data from NCI’s Surveillance, Epidemiology, and End Results Program as well as from CDC’s National Program of Cancer Registries) currently use the Fay–Feuer and Tiwari CIs [12], NCHS publications (e.g., National Vital Statistics Reports) and CDC WONDER use the Anderson–Rosenberg method when the number of events is less than 100; for 100 events or more, the normal CI is used [11, 13]. By design, the Anderson–Rosenberg method was simpler than the Fay–Feuer method to implement at NCHS as well as in the 57 state and local vital registration jurisdictions [14] because it could use pretabulated standard values for the upper and lower CI limits and allowed the user to more easily replicate calculations from published data.
With the wider availability of computing resources, the simplicity of the Anderson–Rosenberg method can no longer be the standalone rationale for its continued adoption in federal, state, or local agencies. Additionally, over the past 9 years, NCHS has been in the process of critically evaluating the hitherto prevailing statistical standards for the presentation of estimates in NCHS publications with an eye toward releasing statistically reliable estimates for sparse data (e.g., smaller geographical areas or population subgroups) that would have previously been suppressed due to sample size alone or other statistical considerations. Current statistical reliability standards for proportions at NCHS include samplesize based requirements (minimal sample or effective sample size) and intervalbased criteria (thresholds for maximal length and relative width of “exact” confidence intervals) [15]. Similar criteria are under discussion for rates [16].
As of the writing of this manuscript, the prevailing NCHS standard for vital rates was sample sizebased. Estimates would be suppressed or flagged as statistically unreliable if they were based on less than 20 events [17]. The intervalbased thresholds discussed in [16] had not been adopted. For ageadjusted rates that are based on 20 or more events, and when the underlying atrisk population is large, the aforementioned gammabased methods result in comparable CIs, all with at least nominal coverage, though, as will be seen below, the Anderson–Rosenberg CIs tend to be narrower (i.e., more efficient) than those from the original Fay–Feuer method and the Tiwari modification and can sometimes also be narrower than the Fay–Kim midp CIs. If the sample size threshold for presentation of rates was lowered from 20 to just 10 events, consistent with the minimum threshold required for disclosure protection of subnational vitals data at NCHS [13] and elsewhere [18], but also with recent findings in [6] about the sufficient stability of estimates that are based on 10 events or more, then additional data presentation criteria could be required. If intervalbased thresholds were to be used, then it would be necessary for the continued use of the Anderson–Rosenberg method to formally evaluate it in comparison with the other three gammabased methods, specifically in terms of coverage probability and expected width, because, like the Tiwari and the Fay–Kim methods, it may also result in CIs that fail to maintain nominal coverage.
This paper conducts such a comparative evaluation, which, to our knowledge, had not previously been conducted. When data are sparse, our aim is to better understand the conditions that lead to the coverage probability of those a priori conservative CIs to fall below the desired level (for example, 0.95) or to CIs that are overly wide and less useful to assess precision. Our ultimate goal is to inform a CIbased statistical reliability threshold to use in conjunction with a sample sizebased threshold of 10, say, as basis for the decision to suppress or present official estimates. Many other CI methods appear in the literature, and we do not aim to study them all here. We focus instead on the relative performance of the four aforementioned gammabased methods because they are most relevant when a conservative approach to the assessment of statistical reliability of ageadjusted rates is desired.
Methods
With n age groups, let D_{i} denote the number of deaths for group i. The D_{i} are assumed to be independent Poisson random variables, and the agespecific rates R_{i} are defined as the ratios D_{i}/P_{i}, with means \({\mathbb{E}}\)(R_{i}) = λ_{i} and variances \({\mathbb{V}}\)(R_{i}) = λ_{i}/p_{i}.
Let π_{i} denote the size of group i in the reference population; see "Technical Appendix". Let the w_{i} denote the relative proportions for group i in the reference population: w_{i} = π_{i}/∑π_{j}. The ageadjusted death rate R′ is defined as
Given the parameters λ_{i} and denominators P_{i} = p_{i}, the ageadjusted rate R′ has mean \({\mathbb{E}}\)(R′) = λ′ = ∑w_{i} λ_{i} and variance \({\mathbb{V}}\)(R′) = ∑w_{i}^{2} λ_{i}/p_{i}.
Fay–Feuer interval
As explained in "Technical Appendix", Fay and Feuer [7] conjecture that tail probabilities for the ageadjusted rate R′ can be approximated by those of a gammadistributed random variable Z with \({\mathbb{E}}\)(Z) = y and \({\mathbb{V}}\)(Z) = v, i.e., with α = y^{2}/v and β = v/y, where y = ∑(w_{i}/p_{i}) x_{i} and v = ∑(w_{i}/p_{i})^{2} x_{i}:
As a result, the lower limit L(y) of an equaltailed 100(1 − a) percent CI for the parameter λ′ can be resolved approximately from the lower tail probability of a gamma distribution with parameters α = y^{2}/v and β = v/y, with the convention that L(0) = 0.
For the upper bound, the observed number of deaths x_{j} within group j is incremented by 1, resulting in the addition of the quantity w_{j}/p_{j} to the ageadjusted rate y = ∑(w_{i}/p_{i}) x_{i}. Because such a unit increment could be realized in any of the n groups,
where κ_{0} = max{w_{j}/p_{j}}. Thus, an upper CI limit U(y) can be resolved from the upper tail probability of a gamma distribution with shape parameter α = y′^{2}/v′ and scale parameter β = v′/y′ where y′ = y + κ_{0} and v′ = v + κ_{0}^{2}.
Fay and Feuer [7] conjecture that the approximate gamma CI thus constructed remains conservative. Although this conjecture remains unproven, findings from the many simulation studies to date continue to support it, e.g., [4,5,6].
Tiwari modification
Tiwari et al. [8] developed a modification to the Fay–Feuer method described above by distributing an average increment 1/n uniformly across the n age groups instead of a unit increment in a single age group. Thus, with κ_{1} = n ^{−1} ∑w_{i}/p_{i} and κ_{2} = n ^{−1} ∑(w_{i}/p_{i})^{2}, the gamma random variable Z′ above now has mean y′ = y + κ_{1} and variance v′ = v + κ_{2}. The Tiwari modification reduces the CI width relative to the Fay–Feuer method; see "Technical Appendix". However, the resulting CI sometimes fails to retain the nominal coverage level; see [4].
Fay–Kim modification
Fay and Kim [10] more recently developed a midp version of the Fay–Feuer CI, as detailed in "Technical Appendix". Drawing B = b from a Bernoulli distribution with Pr(B = 1) = 1/2, the midp version uses the following gamma distribution:
where y′ = y + κ_{0} and v′ = v + κ_{0}^{2} are as in the Fay–Feuer construction. Thus, the lower and upper limits are defined as the (a/2)th and (1−a/2)th quantiles of gamma_{midp}. R syntax is provided to find numerical solutions L(y) and U(y) [10].
Anderson–Rosenberg approximation
Anderson and Rosenberg [11] had introduced an approximation to the Fay–Feuer upper CI limit that alleviated the need to calculate κ_{0} = max{w_{j}/p_{j}}; see "Technical Appendix". A “standardized” gamma random variable G_{adj} is defined as Z/(v/y), where the gammadistributed Z has mean y and variance v. As a result, G_{adj} has mean and variance equal to y^{2}/v. Define x_{adj} = y^{2}/v and 1/p_{adj} = v/y. If x_{adj} was an integer, then there would exist a Poisson random variable D_{adj} with mean and variance equal to λ′ p_{adj} such that
Because y^{2}/v will generally not be integer, x_{adj} is defined as the nearest integer instead (although this is not strictly necessary), and the equality in this last equation is assumed to hold approximately. Either way, CI limits L(y) and U(y) for λ′ are derived as the (a/2)quantile of the gamma(x_{adj}, 1/p_{adj}) distribution and the (1 − a/2)quantile of the gamma(x_{adj} + 1, 1/p_{adj}), respectively.
Comparisons among the four gammabased CI methods
The Anderson–Rosenberg method can be seen not just as an approximation to, but as a modification of the Fay–Feuer CI that, like the Tiwari modification, reduces CI width. Further, a sufficient (but not necessary) condition exists that, when it holds, ensures the Anderson–Rosenberg CI is narrower than the Tiwari CI; see "Technical Appendix". Of course, as it is theoretically possible for both the Anderson–Rosenberg and the Tiwari CIs to be so narrow as to fail to retain nominal coverage, the empirical simulations, below, investigate situations where this may occur. In addition, these two CI methods are compared to the more recent Fay–Kim midp modification.
Several simulation scenarios were considered, each consisting of 500 simulations with 10,000 replicates. For each replication, the 95 percent CI limits were calculated according to the Fay–Feuer, Tiwari, Fay–Kim, and Anderson–Rosenberg methods. For each simulation, the coverage probability and expected CI width were tracked and plotted against the coefficient of variation (CV) of the weights u_{i} = w_{i}/p_{i}, as variability of the latter is known to contribute to undercoverage [7]. To account for simulation error, nominal 95% coverage was considered to have been achieved if the simulated coverage probability was ≥ 0.9449, which is the onesided 99% confidence limit for a binomial with size 10,000 and success probability 0.95.
NCHS conventionally rounds the agespecific mortality rates, expressed as rates per 100,000 population, to one decimal point prior to calculating the ageadjusted rate for dissemination. In the simulations, unrounded values, including for x_{adj} and p_{adj}, were retained for comparability with the other two gamma methods.
All simulations and data analyses were conducted in R version 4.1.2 [19].
Scenario 1
In the first set of simulations, counts were anchored to the median annual crude allcause mortality rate in the USA from 2010 to 2019, estimated at 833.8 per 100,000 population, and the corresponding median annual probabilities of death in each age group, namely 0.009, 0.001, 0.002, 0.011, 0.018, 0.028, 0.066, 0.132, 0.181, 0.239, and 0.313 for < 1 year, 1–4, 5–14, 15–24, 25–34, 35–44, 45–54, 55–64, 65–74, 75–84, and 85 years and over, respectively. An overall population size of P = 2400 was selected to target a small overall mean number of events \({\mathbb{E}}\)(D) = 20. The total number of deaths D was drawn from a Poisson distribution with mean \({\mathbb{E}}\)(D). The agespecific event counts D_{i} were generated according to a multinomial distribution with ∑D_{i} = D and cell probabilities drawn from a Dirichlet distribution with concentration parameters equal to 833.8 times the above probabilities for each group. Finally, group sizes were generated according to a multinomial with ∑P_{i} = P and cell probabilities anchored at the median annual US values for the period 2010–2019, namely (0.012, 0.050, 0.129, 0.137, 0.137, 0.127, 0.135, 0.126, 0.084, 0.043, and 0.019) for the 11 age groups listed above.
Another simulation was conducted using the same scenario 1, but with a smaller target mean \({\mathbb{E}}\)(D) = 10. Here, because counts under 10 may be suppressed for disclosure protection (e.g., state or countylevel estimates in NCHS vital statistics releases), the statistical properties of CIs that accompany presented (nonsuppressed) estimates will be impacted. Thus, a truncated Poisson distribution was used in the simulation to maintain the overall number of deaths ∑D_{i} = D ≥ 10, with the resulting true values of the crude, agespecific, and ageadjusted rates having been recalculated accordingly.
Because the year 2000 US standard population weights w_{i} were held constant and the agespecific population sizes p_{i} were generated in proportion to the overall US national age distribution, the CV for the weights u_{i} in scenario 1 remained in a relatively narrow range and was typically no larger than 0.30. To evaluate the performance of the four gamma CIs in situations where the weights u_{i} varied more widely, the settings in Fay and Feuer [7] were implemented, as described next.
Scenario 2
The second set of simulations mimicked the settings in [7] and [8], with the weights u_{i} = w_{i}/p_{i} drawn at random from the uniform distribution on the unit interval. The total number of deaths D in the population was generated from a Poisson distribution with mean \({\mathbb{E}}\)(D) = 20. The agespecific probabilities of death were drawn independently from the uniform distribution on the unit interval, and the counts D_{i} were drawn jointly from a multinomial distribution with ∑D_{i} = D. Again, to study the effect of a smaller overall mean number of events and assess the impact of disclosure protection on the statistical properties of CIs for estimates that are not suppressed, we also experimented with \({\mathbb{E}}\)(D) = 10 using a truncated Poisson distribution to maintain ∑D_{i} = D ≥ 10.
Scenario 3
Finally, the gamma CI methods were evaluated in countylevel mortality data from four causes of death, selected to capture varying age distributions: all causes; external causes of morbidity and mortality (ICD10 codes: V01–Y89); congenital malformations, deformations, and chromosomal anomalies (Q00–Q99); and Alzheimer disease and other degenerative diseases of the nervous system, not elsewhere classified (G30–G31).
Countylevel data were queried using CDC WONDER as 20year aggregate counts over the 1999–2019 period for 3147 US counties (boundary changes notwithstanding). Data were tabulated by age group (< 1 year, 1–4, 5–14, 15–24, 25–34, 35–44, 45–54, 55–64, 65–74, 75–84, and 85 years and over), race (American Indian or Alaska Native; Asian or Pacific Islander; Black or African American; and White), and Hispanic origin (Hispanic or Latino and not Hispanic or Latino).
Some counties had numerator case or population denominator counts under 10 for selected combinations of age and race and Hispanic origin, which were suppressed in CDC WONDER due to the NCHS confidentiality protection rules. Those missing cell case and/or population counts were imputed for this analysis, holding fixed the marginal counts by age and race and Hispanic origin, to obtain a complete, semisynthetic dataset to use in simulations.
To investigate the impact of high CV on CI coverage for those sparse countylevel data, each county's observed overall numerator count and ageadjusted death rate were taken as the “truth” and 10,000 replicates were generated according to a Poisson distribution for that county with the mean equal to the observed numerator count. The county’s overall population denominator was kept fixed. Agespecific numerator counts were assigned according to a multinomial distribution conditional on the crude total, with assignment probabilities for the various age groups taken proportional to the observed counts for that county. As in scenario 1, ageadjusted rates were calculated relative to the year 2000 US standard population.
The analyses for scenario 3 were restricted to data by race and Hispanic origin instead of sex or other demographic characteristics because disparities in health and mortality outcomes by race and Hispanic origin remain an important public health concern in the US [20], and because countylevel estimates by race and Hispanic origin can be based on sparse data (less than 20 deaths) and suppressed or flagged as statistically unreliable in official publications when sample size is the only criterion used to define statistical reliability.
Results
Scenario 1
The top row in Fig. 1 shows the result of the first set of simulations, with the Fay–Feuer, Tiwari, and Anderson–Rosenberg CIs retaining nominal coverage over the limited range of variability of the weights u_{i} = w_{i}/p_{i}, whereas the coverage of the Fay–Kim CIs dipped below the 0.95 threshold in some cases, although those were within the simulation error bound of 0.9449. Additionally, the strength of the Anderson–Rosenberg approach is demonstrated in CIs that were consistently narrower (i.e., more efficient) than the Tiwari and Fay–Feuer CIs. The Fay–Kim method resulted in even narrower CIs for smaller CV values, albeit at the occasional cost of coverage probability falling below 0.9449.
The bottom row in Fig. 1 shows the results of a second set of simulations conducted using the same scenario 1, but with a smaller target mean \({\mathbb{E}}\)(D) = 10 and a truncated Poisson distribution. The results here were similar to the ones in the top row of Fig. 1 and highlight the relative efficiency of the Anderson–Rosenberg CI, even in this sparser setting, compared with the Fay–Feuer and Tiwari methods for all values of CV(u_{i}) shown, and with the Fay–Kim method for larger values of CV(u_{i}).
Scenario 2
The top row in Fig. 2 shows the result of the second set of simulations, which allow for an increased variability in the weights u_{i} = w_{i}/p_{i}, with both the Tiwari and Anderson–Rosenberg methods in close agreement and resulting in narrower intervals than the Fay–Feuer method while retaining 0.95 coverage, except for a handful of instances where the weights u_{i} = w_{i}/p_{i} had CV close to 1.00. The Fay–Kim method performed relatively well when CV ≈ 1.00 compared to the Tiwari and Anderson–Rosenberg methods, increasing coverage probability (albeit with slightly wider CIs).
The bottom row in Fig. 2 shows the impact of a smaller target mean \({\mathbb{E}}\)(D) = 10 and a truncated Poisson distribution, with the results that were again similar to the ones in the top row and showed adequate coverage for all three modifications to the original Fay–Feuer method, although coverage declined as weights variability increased.
Scenario 3
Because high variability of the weights w_{i}/p_{i} is a known contributor to undercoverage [7], as shown in Fig. 2, the distribution of these weights was examined using the countylevel data, where the p_{i} are the age and race and Hispanic originspecific population denominators for each county. Boxplots are shown in Fig. 3, with a CV as high as 3.0 for some counties and race and Hispanic origin groups.
Ageadjusted mortality rates for counties where the overall count D was less than 10 would be suppressed in accordance with NCHS confidentiality protection. Thus, comparisons among the four CI methods were most informative in counties with 10 or more deaths, as shown in Table 1. For those, when the CV of the u_{i} = w_{i}/p_{i} was below 0.5, the Anderson–Rosenberg and Tiwari CIs almost always achieved nominal coverage, just like the Fay–Feuer CI, even for counties with 10–19 deaths. On the other hand, the Fay–Kim CI failed to achieve nominal coverage in cases where the other CIs did, notably for counties with 100 or more deaths. When the CV was larger than 0.5, there was a marked undercoverage for the Anderson–Rosenberg CIs, and, to a lesser extent, the Tiwari CI, in counties with 10–19 deaths but also in those with 20–99 deaths; in comparison, the Fay–Kim CI performed better in those cases, almost on par with the Fay–Feuer CI. Undercoverage of the Anderson–Rosenberg CI was more pronounced for rarer causes of death, e.g., ICD10 codes Q00–Q99, in smaller and more clustered population subgroups than the nonHispanic white population, such as the Hispanic or Latino or the nonHispanic American Indian or Alaska Native populations, where nominal coverage was achieved for only about three in four counties with D = 10–99 and CV > 0.5.
Discussion
This paper conducted a comparative evaluation of four gammabased methods for calculating CIs for ageadjusted mortality rates to inform their possible use in setting CIbased statistical reliability standards. In addition to being easier to implement because it can use pretabulated standard values for the upper and lower CI limits and allows the user to more easily replicate calculations from published data, the Anderson–Rosenberg CI appeared in simulations to be more efficient (i.e., shorter, while retaining nominal coverage) than either the Tiwari or Fay–Feuer CI in “large scale” estimates where the numerator count was large relative to the denominator population size and the age distribution followed the age distribution in the standard population. In contrast, even though the Fay–Kim method could result in even narrower CIs in those “large scale” scenarios, this was sometimes at the expense of the coverage probability falling below 0.95. However, for “small scale” estimates like countylevel data by race and Hispanic origin for less common causes of death (scenario 3), and for which the age distribution differed sharply from the age distribution in the standard population, nominal CI coverage in both the Anderson–Rosenberg and Tiwari methods was compromised when the adjustment weights u_{i} = w_{i}/p_{i} were highly variable, and the Fay–Kim method performed better in those situations, on par with the Fay–Feuer method. Nonetheless, in situations where the CV of the weights u_{i} = w_{i}/p_{i} can be assessed in advance, when the CV is low, e.g., below 0.5, the user may still decide to use either the Anderson–Rosenberg or the Tiwari CIs instead of the Fay–Feuer CI if a shorter yet conservative interval is desired. If the user is willing to trade off subnominal coverage (e.g., coverage probability below 0.95 for 95% CIs) in some instances with low CV (e.g., below 0.5) for CIs that attain nominal coverage “on average” and are generally shorter, the Fay–Kim midp CI can be a good alternative.
Conclusions
The Fay–Feuer CI can be used universally as the basis for formulating a CIbased threshold for statistical reliability of ageadjusted rates, because it maintains the nominal (e.g., 0.95 or higher) coverage probability in a large variety of studied situations. However, alternatives exist that are more efficient and perhaps more desirable under some specific circumstances. When the CV of the ageadjustment weights is below 0.5, the Anderson–Rosenberg and Tiwari CIs appear in simulations to be most efficient, whereas in cases where the CV is above 0.5, the Fay–Kim CI appears to be most efficient among the four gammabased CI methods. In situations where the CV or the underlying distribution of the ageadjustment weights are unknown, while all four gammabased methods studied in this paper appear to perform reasonably well, the Fay–Feuer method is recommended. For setting CIbased thresholds for statistical reliability, the properties of the interval can be considered, and thresholds for less efficient (wider) conservative intervals might be set higher than thresholds for more efficient (shorter) conservative intervals. However, it should be noted that such conservative CIs may have limited use in comparisons between two rates (e.g., by looking at whether there is overlap) because, as seen in simulations, they can be overly wide and will have low power to detect differences. Instead, differences in rates should be assessed using statistical significance testing or other suitable methods [21].
Availability of data and materials
The dataset analyzed during the current study is available from CDC WONDER, https://wonder.cdc.gov. The R syntax used is available from the authors on request.
Abbreviations
 CI:

Confidence interval
 NCHS:

National Center for Health Statistics
 NCI:

National Cancer Institute
 CDC:

Centers for Disease Control and Prevention
 ICD10:

International Classification of Diseases, Tenth Revision
References
Brillinger DR. The natural variability of vital rates and associated statistics. Biometrics. 1986;42:693–734.
Garwood F. Fiducial limits for the Poisson distribution. Biometrika. 1936;28:437–42.
Dobson AJ, Kuulasmaa K, Eberle E, Scherer J. Confidence intervals for weighted sums of Poisson parameters. Stat Med. 1991;10:457–62.
Ng HKT, Filardo G, Zheng G. Confidence interval estimating procedures for standardized incidence rates. Comput Stat Data Anal. 2008;52:3501–16.
Swift MB. A simulation study comparing methods for calculating confidence intervals for directly standardized rates. Comput Stat Data Anal. 2010;54:1103–8.
Morris JK, Tan J, Fryers P, Bestwick J. Evaluation of stability of directly standardized rates for sparse data using simulation methods. Popul Health Metr. 2018;16:19.
Fay MP, Feuer EJ. Confidence intervals for directly standardized rates: a method based on the gamma distribution. Stat Med. 1997;16:791–801.
Tiwari RC, Clegg LX, Zou Z. Efficient interval estimation for ageadjusted cancer rates. Stat Methods Med Res. 2006;15:547–69.
National Cancer Institute Surveillance Research Program. SEER*Stat software, version 8.3.9.2. 2021. https://seer.cancer.gov/seerstat. Accessed 31 Jan 2022.
Fay MP, Kim S. Confidence intervals for directly standardized rates using midp gamma intervals. Biom J. 2017;59:377–87.
Anderson RN, Rosenberg HM. Age standardization of death rates: implementation of the year 2000 standard. Natl Vital Stat Rep. 1998;47:3.
Centers for Disease Control and Prevention. U.S. Cancer Statistics Data Visualizations Tool. Technical notes 2020; submission diagnosis years 1999–2018. Atlanta, GA: U.S. Dept of Health and Human Services. https://www.cdc.gov/cancer/uscs/pdf/uscsdatavisualizationstooltechnicalnotes508.pdf. Accessed 31 Jan 2022.
Centers for Disease Control and Prevention. Underlying cause of death 1999–2019. CDC WONDER Technical Reference Material. Atlanta, GA: U.S. Dept of Health and Human Services. https://wonder.cdc.gov/wonder/help/ucd.html. Accessed 31 Jan 2022.
National Research Council Committee on National Statistics. The U.S. Vital Statistics System: The Role of State and Local Health Departments. In: Vital Statistics: Summary of a Workshop. Washington, DC: National Academies Press. 2009. https://www.ncbi.nlm.nih.gov/books/NBK219870/. Accessed 31 Jan 2022.
Parker JD, Talih M, Malec DJ, Beresovsky V, Carroll M, Gonzalez JF, Hamilton BE, Ingram DD, Kochanek K, McCarty F, Moriarty C, Shimizu I, Strashny A, Ward BW. National Center for Health Statistics data presentation standards for proportions. Vital Health Stat. 2017;2:175.
National Center for Health Statistics. Data presentation standards for rates. Presented at the meetings of the Board of Scientific Counselors. https://www.cdc.gov/nchs/about/bsc/bsc_meetings.htm. 10 Feb 2022 and 9–10 Jan 2020.
Murphy SL, Xu JQ, Kochanek KD, Arias E, TejadaVera B. Deaths: Final data for 2018. Natl Vital Stat Rep. 2020;69:13.
Centers for Medicare and Medicaid Services. CMS cell suppression policy, guidance portal. Washington, DC: U.S. Dept of Health and Human Services. 2020. https://www.hhs.gov/guidance/document/cmscellsuppressionpolicy. Accessed 31 Jan 2022.
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2021. https://www.Rproject.org/. Accessed 31 Jan 2022.
Office of Disease Prevention and Health Promotion. Healthy People 2030 framework. Washington, DC: U.S. Dept of Health and Human Services. 2021. https://health.gov/healthypeople/about/healthypeople2030framework. Accessed 31 Jan 2022.
Schenker N, Gentleman JF. On judging the significance of differences by examining the overlap between confidence intervals. Am Stat. 2001;55(3):182–6.
Blaker H. Confidence curves and improved exact confidence intervals for discrete distributions. Can J Stat. 2000;28:783–98.
Casella G, Berger RL. Statistical Inference. Belmont: Wadsworth; 1990.
Curtin LR, Klein RJ. Direct standardization (ageadjusted death rates). Stat Notes 6. Hyattsville, MD: National Center for Health Statistics; 1995.
Acknowledgements
Members of the NCHS workgroup on data presentation standards for rates and counts provided input at various stages of the development of this manuscript. The findings and conclusions in this paper are those of the authors and do not necessarily represent the official position of NCHS or CDC. The first author completed this work under a contract with Strategic Innovative Solutions, LLC, a CDC/NCHS contract holder.
Funding
None to declare.
Author information
Authors and Affiliations
Contributions
M.T. conceptualized the study, implemented mathematical derivations and simulations, compiled the figures and table, and led the manuscript writing and revisions. R.A. had developed the Anderson–Rosenberg and helped interpret the study findings. J.P. had led the investigation of CIbased data presentation standards for rates at NCHS, which motivated this study, and proposed the use of countylevel mortality data with the four causes of deaths selected. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Technical appendix
Technical appendix
The number of deaths D reported for a given area and time period is assumed to be Poissondistributed, with mean \({\mathbb{E}}\)(D) and variance \({\mathbb{V}}\)(D) satisfying the equality \({\mathbb{E}}\)(D) = λP = \({\mathbb{V}}\)(D), where P denotes the population denominator [1]. The agespecific or crude death rate R, defined as the ratio D/P, is usually multiplied by 100,000 and reported as a rate per 100,000 population.
Poissongamma relationship
For a positive integer x ≤ P, it can be shown [2] that there exists a gamma random variable G such that \({\mathbb{E}}\)(G) = x = \({\mathbb{V}}\)(G) and
Recall that if G is gammadistributed with shape parameter α > 0 and scale parameter β > 0, then its mean and variance are \({\mathbb{E}}\)(G) = αβ and \({\mathbb{V}}\)(G) = αβ^{2}. Conversely, the parameters are given by α = \({\mathbb{E}}\)(G)^{2}/\({\mathbb{V}}\)(G) and β = \({\mathbb{V}}\)(G)/\({\mathbb{E}}\)(G). Thus, with \({\mathbb{E}}\)(G) = x = \({\mathbb{V}}\)(G) in Eq. A1, the corresponding gamma distribution has α = x and β = 1.
For the rate R = D/P, with P = p, y = x/p, and v = x/p^{2}, Eq. A1 becomes
where Z = G/P is gammadistributed with mean y and variance v.
Gamma CI for agespecific and crude rates
When D = x is observed, the ratio y = x/p is an estimate of \({\mathbb{E}}\)(R) = λ. An equaltailed 100(1 − a) percent CI [L(y), U(y)] for the parameter λ, e.g., with a = 0.05, is obtained as a solution to the following two equations:
Eqs. A3a and A3b follow from looking upon L(y) as the largest λ for which Pr(R ≥ yλ) ≤ a/2 and U(y) as the smallest λ for which Pr(R ≤ yλ) ≤ a/2; see [22] and theorem 9.2.3.a in [23].
where Z is gammadistributed with mean y and variance v, i.e., with parameters α = y^{2}/v = x and β = v/y = 1/p. Thus, the lower CI limit L(y) is obtained as the (a/2)quantile of the gamma(x, 1/p) distribution. For y = 0 = x, L(0) = 0 by convention.
Similarly, from Eqs. A2 and A3b,
where the second equality is due to x being a positive integer, so that D/p > x/p if and only if D/p ≥ (x + 1)/p, and Z′ is a gamma random variable with mean y′ = y + 1/p and variance v′ = v + 1/p^{2}. Because y′^{2}/v′ = x + 1 and v′/y′ = 1/p, the upper CI limit U(y) is obtained as the (1 − a/2)quantile of the gamma(x + 1, 1/p) distribution.
Approximate gamma CIs for ageadjusted rates
With n age groups, let D_{i} denote the number of deaths for group i. The D_{i} are assumed to be independent Poisson random variables, and the agespecific rates R_{i} are defined as the ratios D_{i}/P_{i}, with means \({\mathbb{E}}\)(R_{i}) = λ_{i} and variances \({\mathbb{V}}\)(R_{i}) = λ_{i}/p_{i}.
Let π_{i} denote the size of group i in the reference population, e.g., the projected year 2000 US population [24]. Let w_{i} denote the relative proportions for group i in the reference population: w_{i} = π_{i}/∑π_{j}. The ageadjusted death rate R′ is defined as
Given the parameters λ_{i} and denominators P_{i} = p_{i}, the ageadjusted rate R′ has mean \({\mathbb{E}}\)(R′) = λ′ = ∑w_{i} λ_{i} and variance \({\mathbb{V}}\)(R′) = ∑w_{i}^{2} λ_{i}/p_{i}.
Fay–Feuer interval. Fay and Feuer [7] assume that Eq. A2 holds approximately for the ageadjusted rate R′, so that, for y = ∑(w_{i}/p_{i}) x_{i} and v = ∑(w_{i}/p_{i})^{2} x_{i},
where Z is gammadistributed with \({\mathbb{E}}\)(Z) = y and \({\mathbb{V}}\)(Z) = v, i.e., with α = y^{2}/v and β = v/y. As for the crude rate R, an equaltailed 100(1 − a) percent CI for λ′ solves the equations:
From Eqs. A4 and A5a, the lower limit L(y) can be resolved approximately from the lower tail probability of a gamma distribution with parameters α = y^{2}/v and β = v/y, again with the convention that L(0) = 0.
For the upper bound, note that a unit increment in the observed number of deaths x_{j} within group j results in the addition of the quantity w_{j}/p_{j} to the ageadjusted rate y = ∑(w_{i}/p_{i}) x_{i}. Because such a unit increment could be realized in any of the n groups,
where κ_{0} = max{w_{j}/p_{j}}. From Eq. A4, the righthand side in this last inequality is approximately equal to Pr[Z′ ≤ U(y)y′, v′], where Z′ is gammadistributed with mean y′ = y + κ_{0} and variance v′ = v + κ_{0}^{2}. Thus, an upper CI limit U(y) can be resolved from the upper tail probability of a gamma distribution with shape parameter α = y′^{2}/v′ and scale parameter β = v′/y′. Fay and Feuer [7] make the conjecture that the approximate gamma CI thus constructed remains conservative. Although this conjecture remains unproven, findings from the many simulation studies to date continue to support it, e.g., [4,5,6].
Tiwari modification. Tiwari et al. [8] developed a modification to the Fay–Feuer method described above by distributing an average increment 1/n uniformly across all age groups instead of a unit increment in a single age group:
Thus, with κ_{1} = n^{−1} ∑w_{i}/p_{i} and κ_{2} = n^{−1} ∑(w_{i}/p_{i})^{2}, the gamma random variable Z′ above now has mean y′ = y + κ_{1} and variance v′ = v + κ_{2}. The Tiwari modification reduces the CI width relative to the Fay–Feuer method, because
However, the resulting CI sometimes fails to retain the nominal coverage level, e.g., [4].
Fay–Kim modification. Fay and Kim [10] more recently developed a midp version of the Fay–Feuer CI. A modification of exact CIs from discrete data, midp CIs tradeoff guaranteed nominal coverage in all of the parameter space (which tends to result in overly wide CIs) for proximity to nominal coverage (and narrower CIs) for most parameter values.
For the midp interval, a solution to the following equations is sought:
Drawing B = b from a Bernoulli distribution with Pr(B = 1) = 1/2, Fay and Kim [10] define the midp version of the Fay–Feuer CI using the following gamma distribution:
where y′ = y + κ_{0} and v′ = v + κ_{0}^{2} are as in the Fay–Feuer construction, above. Thus, the lower and upper limits are defined as the (a/2)th and (1−a/2)th quantiles of gamma _{midp}. The special case y = 0 is addressed using L(0) = 0 and U(0) defined as the (1−a)th quantile of the gamma(y^{2}/v, v/y) distribution. R syntax is provided to solve for L(y) and U(y) numerically [10].
Anderson–Rosenberg approximation. Anderson and Rosenberg [11] had introduced an approximation to the Fay–Feuer upper CI limit that alleviated the need to calculate κ_{0} = max{w_{j}/p_{j}}. Instead, the Poissongamma relationship in Eq. A1 is assumed to hold for an appropriately defined Poisson random variable D_{adj} corresponding to a crude rate that would have been equal to the ageadjusted rate R′, i.e., such that R′ = D_{adj}/P_{adj}. Therefore, a “standardized” gamma random variable G_{adj} is defined as Z/(v/y), where the gammadistributed Z has mean y and variance v. As a result, G_{adj} has mean and variance equal to y^{2}/v. Define x_{adj} = y^{2}/v and 1/p_{adj} = v/y. If x_{adj} was an integer, then there would exist a Poisson random variable D_{adj} with mean and variance equal to λ′ p_{adj} such that
Because y^{2}/v will generally not be integer, x_{adj} is defined as the nearest integer instead (although this is not strictly necessary), and the equality in Eq. A6 is assumed to hold approximately. Either way, one proceeds as for the crude rate to derive CI limits L(y) and U(y) for λ′ as the (a/2)quantile of the gamma(x_{adj}, 1/p_{adj}) distribution and the (1 − a/2)quantile of the gamma(x_{adj} + 1, 1/p_{adj}), respectively.
Exact intervals. When there is a constant scalar c > 0 such that p_{i} = cπ_{i} for all i, the ageadjusted rate equals the overall crude rate, and the above CIs reduce to the exact gamma CI for λ = p^{−1} ∑λ_{i} p_{i} where p = ∑p_{i} and the total number of deaths D = ∑D_{i} follows a Poisson distribution with mean λp. In particular, when y = 0, v = 0 and x_{adj} is undefined. However, because the ageadjusted rate equals the crude rate in this case, the limits of all three approximate gamma CIs for the ageadjusted rate are defined to be those of the exact gamma CI for the crude rate, with p = ∑p_{i} and x = ∑x_{i} = 0. Thus, in this extreme case, L(0) = 0 and U(0) is the (1 − a/2)quantile of the gamma(1, 1/p) distribution.
Anderson–Rosenberg CI as a modification of the Fay–Feuer CI. The Anderson–Rosenberg construction can be seen to follow that of the Fay–Feuer CI, with a gammadistributed Z′′ that has mean y′′ = y + κ and variance v′′ = v + κ^{2}, where κ = κ_{3} = 1/p_{adj} instead of κ = κ_{0} = max{w_{j}/p_{j}}. Indeed, with 1/p_{adj} = v/y and x_{adj} = y^{2}/v,
Furthermore, 1/p_{adj} can be expressed as follows:
As a result,
and the Anderson–Rosenberg method is seen to result in incrementing the agespecific death counts from x_{i} to x_{i} + ξ_{i}, whereas in the Fay–Feuer method only the count x_{i*} for the age group i* for which w_{i*}/p_{i*} = max{w_{j}/p_{j}} is incremented—and in the Tiwari modification, the agespecific counts are incremented from x_{i} to x_{i} + ζ_{i}, where ζ_{i} = 1/n. Additionally,
since ∑ξ_{i} = 1. Thus, like the Tiwari modification, the Anderson–Rosenberg construction reduces the CI width relative to the Fay–Feuer method:
Two questions emerge from the above derivations:

(1)
Under what circumstances does the Anderson–Rosenberg method result in a shorter CI than the Fay–Feuer method that retains nominal coverage?

(2)
Since both the Anderson–Rosenberg and Tiwari methods result in narrower CIs than the Fay–Feuer method, when is one preferable to the other?
To partially answer question 2, note that the Anderson–Rosenberg CI would be narrower than the Tiwari CI if (but not only if) κ_{3} ≤ κ_{1}, as that ensures
By definition, the condition κ_{3} ≤ κ_{1} is realized when
which is equivalent to
This last condition indicates that the slope of the line from the simple regression of the weightadjusted agespecific death rates w_{i} (x_{i}/p_{i}) = (w_{i}/p_{i}) x_{i} onto the weights w_{i}/p_{i} is negative or zero. This could be verified upfront for any set of ageadjustment weights (w_{1}, …, w_{n}) and population distribution (p_{1}, …, p_{n}), and it would be sufficient to ensure that the Anderson–Rosenberg CI will be narrower than the Tiwari CI. Of course, this leaves the issue of efficiency unresolved, as it would theoretically be possible for either the Anderson–Rosenberg or the Tiwari CIs to be so narrow as to fail to retain nominal coverage. The empirical simulations investigate situations where this may occur. In addition, these two CI methods are compared to the more recent Fay–Kim midp modification.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Talih, M., Anderson, R.N. & Parker, J.D. Evaluation of four gammabased methods for calculating confidence intervals for ageadjusted mortality rates when data are sparse. Popul Health Metrics 20, 13 (2022). https://doi.org/10.1186/s12963022002881
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12963022002881
Keywords
 Direct standardization
 Confidence interval width
 Coverage probability
 Statistical reliability