An evaluation of the accuracy of small-area demographic estimates of population at risk and its effect on prevalence statistics

Demographic estimates of population at risk often underpin epidemiologic research and public health surveillance efforts. Despite their central importance to epidemiology and public health practice, little attention has been paid to evaluating the magnitude of errors associated with such estimates or the sensitivity of epidemiologic statistics to these errors. Although it is well known that accuracy in demographic estimates declines as the size of the population to be estimated decreases, demographers continue to face pressure to produce estimates for increasingly fine-grained population characteristics at ever-smaller geographic scales. Unfortunately, little guidance on the magnitude of errors that can be expected in such estimates is currently available in the literature for consideration in small-area epidemiology. This paper attempts to fill this gap by producing a Vintage 2010 set of single-year-of-age estimates for census tracts, then evaluating their accuracy and precision in light of the results of the 2010 Census. These estimates are produced and evaluated for 499 census tracts in New Mexico for single years of age from 0 to 21 and for each sex individually. The error distributions associated with these estimates are characterized statistically using non-parametric statistics including the median and the 2.5th and 97.5th percentiles. The impact of these errors is considered through simulations in which observed and estimated 2010 population counts are used as alternative denominators and simulated event counts are used to compute a realistic range of prevalence values. The implications of the results of this study for small-area epidemiologic research in cancer and environmental health are considered.


Introduction
In recent years, a growing demand for small-area demographic estimates has been observed. Much of this demand comes from epidemiologists, who utilize these estimates for small-area surveillance efforts in the areas of cancer and environmental epidemiology in particular [1][2][3][4][5]. The potential of small-area epidemiology has generated considerable excitement [1][2][3][4][5]; however, it has also created important challenges for the demographers who produce small-area estimates of population at risk as well as the epidemiologists who use them. At a fundamental level, it is well known that as the size of the population to be estimated decreases, errors in demographic estimates increase [6][7][8][9][10][11]. These errors can be surprisingly large [6][7][8][9][10][11], but at present their impact on small-area epidemiologic measures has been incompletely described, and the implication of these errors for small-area health tracking and analytic epidemiology has not received an adequate amount of attention [12][13][14][15][16][17]. This paper attempts to fill this gap by characterizing the errors associated with a set of single-year-of-age estimates made at the level of United States census tracts and analyzing the potential sensitivity of small-area crude prevalence measures to these errors.
This example is extreme both in its spatial scale (census tracts represent very small areas, often a single neighborhood) [18] and in the fine-grained age intervals to be estimated. Errors in census tract-level estimates in five-year age groupings reported in previous studies have ranged from as low as 10% [19] to as high as 80% or more [9]. It is known that single-year-of-age estimates can be relatively more volatile than those constructed in five-year age intervals [11,20]. A number of methods exist for making single-year-of-age estimates. Assuming monotonicity within five-year age intervals [21,22] and the stability of demographic processes over these short time intervals [18], demographers have historically made use of methods that break out five-year interval estimates into single years of age through pro-rating, osculatory interpolation, or the closely related procedure known as "spline-fitting" [11][20][21][22][23][24][25][26][27][28][29][30]. Pro-rating involves the allocation of the five-year data based on either historical or assumed proportions; for example, one might divide five-year estimates into single years based on the known distribution of the last census or based on an assumption of rectangularity (equal proportions of one-fifth) [11]. Osculatory interpolation, in contrast, relies upon a theory in mathematics that revolves around the unique solution of simultaneous equations using linear systems designed to minimize discrepancies between observed five-year data and the re-aggregation of single-year-of-age estimates into corresponding intervals [11][20][22][23][24][25][26][27][28][29][30]. Spline-fitting, similar to osculatory interpolation, involves the overlapping of multiple polynomials to arrive at estimates of distributions through an optimization component based on the least-squares criterion [31].
The first two procedures have been the most widely applied within applied demography; a rather long historical discussion of spline-fitting has not resulted in its general implementation by demographers working in non-academic settings (such as state government) where functionally utilized population estimates are typically made.
The purpose of this paper is not to contrast the accuracy of these methods; rather, we seek to implement commonly utilized methods to characterize the magnitude of errors associated with a typical set of estimates of population at risk likely to be utilized by small-area epidemiologists in practice. The focus, therefore, will be upon describing the range of errors that one might expect to see in such a set and analyzing how these errors might impact a set of crude prevalence estimates made at a correspondingly fine-grained spatial scale (census tracts). To accomplish this purpose, data from the 2010 US Census (Summary File 1) were extracted from the American FactFinder website [32] for all census tracts (n = 499) within the state of New Mexico. The data extracted include a gold-standard set of single-year-of-age counts and the corresponding five-year grouped data for each census tract. Our evaluation is straightforward: we compare single-year-of-age estimates made using methods of pro-rating and osculatory interpolation of five-year grouped data to observed single-year-of-age 2010 Census counts and characterize the moments of the resulting ex-post-facto error distributions using established methods within demography [6,8,10]. Next, we simulate a range of plausible event prevalences using published estimates of childhood obesity rates and use them to analyze the effects of observed errors in demographic estimates on estimates of prevalence per 1,000 person-years. The results are considered in light of practice in small-area epidemiologic surveillance, and suggestions for further research and evaluation are made.

Materials and methods
Input data and study area
New Mexico represents a diverse study area where tract-level population characteristics can vary dramatically in concordance with larger geographic trends at the county level. The state is characterized by highly urbanized and rapidly growing metropolitan areas such as the cities of Las Cruces, Rio Rancho, and Albuquerque; dynamic and steadily growing small towns such as Roswell, Alamogordo, Clovis, and Farmington (to name four); vast sections of rural areas; the presence of 22 tribal groups with long-standing historical presence in the state; numerous Colonias [3]; and an overlapping mosaic of historical Land Grant Communities linked to the Spanish Colonial Era and the period of Mexican Independence prior to New Mexico becoming a US territory in 1850 at the conclusion of the Mexican-American War. In sum, New Mexico represents a microcosm of the demography of many communities throughout the United States as well as important and distinctive populations. Each of these dynamics will be represented at the census tract level, providing substantial heterogeneity and material for analysis in the current context. Counts of age/sex-specific population in five-year intervals (0 to 4, 5 to 9, 10 to 14, 15 to 19, 20 to 24) and in single years (0 to 21) were extracted from the SF1 file from the 2010 Census. Data were extracted at the census tract level (n = 499) for the entire state of New Mexico. Data were not broken out by race/ethnicity group; only "all race" counts were used.

Pro-rating and interpolation in demography
In demography, the term "pro-rating" refers to the allocation of grouped data into more fine-grained categories, such as decomposing five-year age-grouped data into single years as in the current analysis [11,20]. In this study, pro-rating serves as a baseline: it is simpler than the methods of polynomial interpolation described below, but it also depends upon specific assumptions with little appealing mathematical theory underlying them [23,24]. Here, rectangular pro-rating is utilized, in which the assumption is made that single-year age groups within any five-year age interval are equivalent: each single year comprises one-fifth of the five-year age-grouped data [11]. As pointed out by Brass [23] and others [11,20], this method assumes that population processes (such as birth, death, and migration functions) are similar from year to year within the five-year age interval in question [21,22], i.e., that the single-year data are monotonic in relation to the five-year grouped data they produce [21,22]. This simplifying assumption is unlikely to be true, and rectangular pro-rating is generally considered a strategy to be implemented when no ancillary information on population dynamics is available at an appropriate geographic level [11,20].
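As an illustration of the rectangular assumption described above, the following minimal Python sketch splits each five-year count into five equal single-year estimates. The function name and the tract counts are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
def prorate_rectangular(five_year_counts):
    """Split each five-year count into five equal single-year estimates.

    Assumes rectangularity: every single year of age holds exactly
    one-fifth of the five-year group total.
    """
    single_year = []
    for count in five_year_counts:
        single_year.extend([count / 5.0] * 5)
    return single_year


# Hypothetical counts for ages 0-4, 5-9, and 10-14 in one tract.
grouped = [100, 120, 90]
estimates = prorate_rectangular(grouped)

# Re-aggregating the single-year estimates recovers the grouped totals.
regrouped = [sum(estimates[i:i + 5]) for i in range(0, len(estimates), 5)]
print(estimates[:5], regrouped)
```

By construction the estimates sum back to the five-year totals, so any error relative to observed single-year counts comes entirely from the equal-shares assumption.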
The use of polynomial functions to describe relationships between time-ordered inputs and function-generated outputs has a long history within mathematics [33,34]. Their use in generating intermediate and unknown values within a dataset by interpolating between known values has an equally long history in applied fields such as climatology, economics, and demography [11][20][21][22][23][24][25][26][27][28][29][30]. Though polynomial interpolation approaches have been criticized in demography as being blind to population theory [20,23,24], in practice interpolation is easy to implement as many standardized formulas have been presented that involve only "plugging-in" of demographic data grouped in five-year intervals into predefined formulae to arrive at single-year-of-age estimates [11]. Figure 1 illustrates the relationship between single-year age structure and a polynomial function used to decompose five-year grouped data.
As in Figure 1, an nth-degree polynomial of the form $y = A_0 + A_1 x + A_2 x^2 + \dots + A_n x^n$ may be fit to any curve for which some data points are known with certainty to arrive at estimates of intermediate values. We may think of the interpolating polynomial as a system of equations, represented in terms of the well-known Vandermonde matrix $V$ (representing known values of demographic data), premultiplied against a vector of coefficients $A_0$ to $A_n$, to yield interpolated values $y_i$, as in the linear equation $V\mathbf{A} = \mathbf{y}$. Once solved, the function defined to estimate the $y_i$ is known as the interpolant [18,19,21]. It is known that higher-degree interpolating polynomials may often provide poorer fit of intermediate points, suggesting that simpler polynomial interpolants utilized by demographers may, in fact, provide more accurate estimates of single years of age [11,34,35]. Exact solutions to such approximating polynomials are difficult to implement using demographic data in five-year age groups [18,19,21]; however, their approximation through differencing formulas (those that minimize differences between five-year grouped counts and estimated values thereof using a polynomial function) is well known and highly accurate in implementation [3]. An example is the Lagrange formula (from reference [11], page 683), $f(x) = f(a)\,\frac{x-b}{a-b} + f(b)\,\frac{x-a}{b-a}$, which fits a polynomial of the form presented, passing through the two points a and b (which in this case are five-year grouped age counts), by minimizing differences between estimated values from the polynomial function and these observed counts by shifting the values of the constants $A_0$, $A_1$, $A_2$, etc. [11,34]. In practice, the fitting of points $f(x)$ is accomplished by inputting values of $f(a)$ and $f(b)$ into established formulas. This example of a Lagrangian polynomial passing through two points may be generalized to as many points as desired, and various methods of interpolation in demography rely upon differing numbers of points to achieve the desired fit.
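The generalization of the Lagrange form to any number of known points can be sketched in a few lines of Python. This is a generic textbook implementation offered only as an illustration; it is not the tabulated multiplier scheme of the Karup-King, Beers, or Sprague methods, and the function name and input values are hypothetical.

```python
def lagrange_interpolate(points, x):
    """Evaluate at x the unique polynomial passing through `points`.

    `points` is a list of (x_i, y_i) pairs with distinct x_i. With two
    points the result is the straight line through them; each extra
    point raises the polynomial degree by one.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        weight = 1.0
        for j, (xj, _) in enumerate(points):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


# Two known points a = 0 and b = 5 with hypothetical values f(a) and f(b).
two_point = lagrange_interpolate([(0.0, 10.0), (5.0, 20.0)], 2.0)

# Three points taken from y = x^2; the interpolant reproduces the curve.
three_point = lagrange_interpolate([(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)], 1.5)
print(two_point, three_point)
```

The interpolant always passes exactly through the supplied points; how well it behaves between them depends on the number of points used, which is the trade-off discussed in the text.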
Osculatory interpolation is similar to the method of spline-fitting, also utilized in demography [21,22]; here we choose to focus on several methods of osculatory interpolation as better representing methods that are more typically used in practice among applied demographers. This choice does not reflect methodological preference, but better suits the purpose of this paper, which is to characterize the magnitude of errors that practicing epidemiologists and demographers might expect to see in small-area, fine-grained (with respect to age) estimates of population at risk and their impacts on measures of epidemiologic risk.
In this paper, we utilize several commonly implemented osculatory interpolation procedures including: the Karup-King [25,30], Beers 1 [11], Beers 2 [26], and Sprague methods [29]. These methods differ in the number of points taken in the interpolation, with the Karup-King taking two differences, the Beers 1 and 2 focusing on four and six differences, and the Sprague method relying upon five. In general, previous studies in other fields [34,35] have suggested that the use of fewer points might enhance local accuracy in the interpolation [11][23][24][25][26][27][28][29][30], leading to a general hypothesis that the Karup-King may tend to out-perform alternatives.

Statistical comparisons of error and model evaluation criteria
Percentage discrepancies between the single-year-of-age estimates and corresponding 2010 Census counts form the basis of the evaluation reported in this paper, in accordance with the ex-post-facto evaluation method typically utilized by demographers [6][7][8][9][10][11]. Because demographic error distributions are calculated across geographic levels with widely differing population sizes, the use of percentage error is often encouraged [8,36] and is therefore employed here. Demographic estimate error distributions are characterized by non-normality and a frequent lack of symmetry [8,35], making it difficult to make statements about the range of variation in estimation accuracy or to determine what is or is not an extreme error value [8,10]. In this study, all statistical error distributions were found to deviate from normality using the Kolmogorov-Smirnov test at the alpha = 0.05 level. A simple non-parametric solution is to make use of the median as a summary measure of error and to utilize the percentile distribution between the 2.5th percentile and 97.5th percentile [37,38] to characterize precision; this is the strategy employed in this paper. These summary measures are computed for each age/sex group, as well as across the entire range of ages within each sex. While this approach makes sense in light of the nature of the statistical error distributions employed in demography, there is a lack of consensus in the literature about what constitutes a "better" estimate among available alternatives [8,10]. The perspective taken in this paper is to evaluate how much better one might do by employing a polynomial interpolation method than by using a naive model based on simple rectangular pro-rating (assuming that one-fifth of the five-year age/sex count is within each single-year-of-age interval). This is the approach taken by Harper, Coleman, and Devine [39] as well as by Swanson and Tayman [40] in their "proportionate reduction in error" statistic.
Because this paper relies upon summary statistics based on percentages, models are evaluated in terms of (1) the improvement in percentage point error observed in each age/sex interval and (2) the percentage point range between the 2.5th and 97.5th percentiles of the error distributions. The "best" fitting model, then, is determined to be the model that results in the greatest improvement in percentage accuracy over rectangular pro-rating and the lowest range of values between the 2.5th and 97.5th percentiles of the error distribution.
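The summary measures described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's actual code: percent error is taken as signed, 100 × (estimate − count) / count; tracts with a zero count are skipped; percentiles use linear interpolation between order statistics; and improvement is measured as the reduction in the absolute median error relative to pro-rating. All function names and input values are hypothetical.

```python
def percentile(values, p):
    """Percentile p (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def percent_errors(estimates, counts):
    """Signed percent error per tract; zero counts are skipped (assumption)."""
    return [100.0 * (e - c) / c for e, c in zip(estimates, counts) if c > 0]


def summarize(errors):
    """Median error and spread between the 2.5th and 97.5th percentiles."""
    low, high = percentile(errors, 2.5), percentile(errors, 97.5)
    return {"median": percentile(errors, 50), "range": high - low}


# Hypothetical counts and estimates for one age/sex group in five tracts.
census = [20, 25, 10, 40, 16]
interpolated = [22, 24, 12, 38, 16]
prorated = [25, 20, 15, 30, 20]

interp_summary = summarize(percent_errors(interpolated, census))
prorate_summary = summarize(percent_errors(prorated, census))

# Improvement in percentage points over the pro-rating baseline.
improvement = abs(prorate_summary["median"]) - abs(interp_summary["median"])
print(interp_summary, prorate_summary, improvement)
```

Under these definitions, a method is preferred when `improvement` is large and its `range` is small, mirroring the two criteria listed in the text.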
Previous studies [9,19] of errors associated with demographic methods at the census tract level have indicated that over 10-year periods starting at the previous census, a substantial amount of error may accumulate [8,41]. Errors in these studies have ranged from as low as 10% to as high as 80% within any five-year age/sex grouping. For single-year-of-age estimates, it could be anticipated that errors would be larger than this, but isolating how much of this error would be due to the practices of pro-rating or polynomial interpolation would be difficult since errors in the five-year age/sex-grouped estimates would also affect the single-year-of-age estimates. To avoid this challenge, in this study we apply polynomial interpolation and pro-rating methods to known 2010 Census five-year counts. This practice isolates the error associated with the method by eliminating the conflation associated with using uncertain five-year age/sex-specific estimates. The errors and error distributions reported in this study are therefore due solely to the methods of pro-rating and polynomial interpolation that are the focus of the paper.

The effect of errors in small-area demographic estimates on epidemiologic statistics
Small-area epidemiology faces significant challenges in the geographic positioning of event data through the process of geocoding [42][43][44][45][46][47], which is necessary for calculating epidemiologic statistics such as incidence and prevalence. These issues should also be anticipated to be important in making inferences associated with analytic epidemiology [1][2][3][4][5][12][13][14][15][16][17][18][48][49][50], but they are beyond the scope of the current paper, which examines only the effects of small-area demographic estimation error on surveillance statistics. To assess the impacts of errors in demographic estimates, we used a simple simulation-based approach to analyze the sensitivity of small-area crude prevalence estimates within each single-year-of-age grouping. The "best-performing" set of demographic estimates for each sex is utilized as a denominator in calculating risk measures. Event counts were simulated using childhood obesity (a common event whose prevalence has been estimated to be as high as 1/5 or 200/1,000 persons) as an example. The distribution of prevalences was estimated using a Monte-Carlo simulation [51,52] in the R statistical package that assumed: (1) normality and symmetry of the prevalence distribution, (2) an average prevalence of 17.5%, and (3) a standard deviation of 2.5%. This distribution was resampled 10,000 times, with a burn-in period of 500 iterations and thinning to include only every 100th observation to avoid commonly known challenges related to autocorrelation in random number generation algorithms [51,52]. The resulting distribution of prevalence was used to estimate the 2.5th and 97.5th percentiles for use in the simulation. These points were then used to simulate case events for each census tract/sex/age grouping.
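A rough Python analogue of this simulation is sketched below. The study itself used R; this sketch is only an assumption-laden illustration that draws 10,000 independent values from a normal distribution with mean 0.175 and standard deviation 0.025 and reads off the 2.5th and 97.5th percentiles. It omits the burn-in and thinning steps, and the seed and function names are hypothetical.

```python
import random


def percentile(values, p):
    """Percentile p (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def simulate_prevalence(n_draws=10_000, mean=0.175, sd=0.025, seed=2010):
    """Draw prevalence values from a normal distribution (assumed form)."""
    rng = random.Random(seed)  # fixed, arbitrary seed for repeatability
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


draws = simulate_prevalence()
low = percentile(draws, 2.5)
high = percentile(draws, 97.5)
print(round(low, 3), round(high, 3))
```

For a normal distribution the two bounds should fall near 17.5% ± 1.96 × 2.5%, that is, roughly 12.6% and 22.4%, which is in line with the simulated percentiles reported in the Results.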
Median differences between crude prevalence estimates of risk per 1,000 person-years calculated using 2010 Census counts and those calculated using the demographic estimates of population at risk as alternative denominators were computed. Variability in the errors associated with risk per 1,000 person-years was then assessed using the 2.5th and 97.5th percentiles in light of observed non-normality and asymmetry in the distributions of these differences.
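The comparison of alternative denominators can be illustrated with a short Python sketch. It assumes, for illustration only, that simulated events are generated by applying a prevalence value to the observed census count and that the difference is taken as the estimate-based prevalence minus the census-based prevalence. The function names and tract values are hypothetical.

```python
def prevalence_per_1000(events, population):
    """Crude prevalence per 1,000 persons for a given denominator."""
    return 1000.0 * events / population


def prevalence_difference(census_count, estimate, prevalence):
    """Change in crude prevalence when the estimate replaces the census count.

    Events are simulated from the census count (assumption), so the
    census-based prevalence equals the simulated prevalence itself.
    """
    events = prevalence * census_count
    observed = prevalence_per_1000(events, census_count)
    estimated = prevalence_per_1000(events, estimate)
    return estimated - observed


# Hypothetical tract: 20 observed children of one age/sex, estimated as 25,
# evaluated at the lower simulated prevalence bound of 12.57%.
diff = prevalence_difference(census_count=20, estimate=25, prevalence=0.1257)
print(round(diff, 2))
```

An overestimated denominator lowers the computed prevalence and an underestimated one raises it, so the sign of the difference tracks the direction of the demographic error.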

Results
Errors in single-year-of-age small-area estimates of population at risk
For males, the simplest interpolation method (the Karup-King procedure) produced the smallest errors for the most age groups. For nine out of 21 age intervals, this method was found to be the most accurate available method (Table 1). Use of this method would reduce error in comparison to the rectangular pro-rating method by as much as 46.30 percentage points (age 16) or as little as 0.18 percentage points (age 6). On average, use of the Karup-King method would improve estimation accuracy by 7.82 percentage points over the rectangular pro-rating method. It is worth noting that the performance of the Karup-King method is similar to that of either the Beers 1 or Beers 2 methods, meaning that the sensitivity of epidemiologic statistics to a typical set of demographic estimates may be similar even when different methods are utilized, especially when we consider that each of these methods differs widely in the number of data points used in the interpolation. In contrast, the Sprague method provided much less reduction in error on average when compared to rectangular pro-rating (4.49 percentage points) and in a number of cases (ages 0, 4, 6, 9, 11, 14, and 21) actually provided a less accurate estimate than that observed when using simple rectangular pro-rating. Some of the increases in error are substantial when using the Sprague method, suggesting that the increased degree of this polynomial may be associated with poorer fitting of intermediate values, as noted in studies in other fields [34,35]. These results appear to hold for males in terms of the precision of the methods. While all methods showed a wider range of error percentile distributions than is desirable (the difference between the 2.5th and 97.5th percentiles frequently exceeded 100 percentage points), the range for the Karup-King method was likewise consistently the smallest (median = 131.59 percentage points, lowest = 77.74, highest = 164.44).
In 18 out of 21 cases, the Karup-King estimates were more accurate than using simple rectangular pro-rating.
Among females (Table 2), differences in estimation accuracy across the available methods were much less clear. All of the interpolation-based methods outperformed rectangular pro-rating in most cases: 16/22 with Karup-King, Beers 1, and Beers 2, and 13/22 for the Sprague method. The average reduction in error across the ages was greatest for the Beers 2 procedure, which reduced errors by over 4% on average; however, the reductions in error were within 1 to 1.5 percentage points across all of the alternatives. It is noteworthy, however, that the specific ages at which each method performed best and the magnitude of reductions at each age varied across the methods. The only estimates that appeared to significantly increase bias were those made with the Sprague interpolants (as was observed in males), which increased errors by 65 percentage points among 9-year-old females and by 39 points among 14-year-olds. Overall, the ranges of errors associated with each procedure were extremely similar, though the Beers 2 procedure again outperformed the others very marginally. For the Beers 2 procedure, the difference between the 2.5th and 97.5th percentiles ranged from a low of 99.84 percentage points (13-year-olds) to a high of 239.22 percentage points (16-year-olds).
A striking feature of the results is that demographic estimates of single-year-of-age population at risk at the census tract level appear to be similar across the different methods utilized and to contain a surprising level of inaccuracy and a very large range of values across the set. We defined the "best" set as the alternative with the greatest reduction in error over simple rectangular pro-rating and the least observed spread between the 2.5th and 97.5th percentiles of the error distribution. The best-fitting set of estimates was utilized in analyzing the sensitivity of small-area crude prevalence measures to errors in these estimates. The best-fitting set for males was the Karup-King (two differences), while the Beers 2 (six differences) was utilized for females.

Impact of errors on crude prevalence estimates
The effect of demographic estimation errors (Table 3) is relatively small at the lower end of the prevalence spectrum (2.5th percentile), on average never accounting for more than a difference of a few people in a crude prevalence estimate indexed at 1,000 person-years. Though the differences vary between the sexes in terms of the specific ages in which the larger errors are observed, similar differences in general were observed for male and female estimates. For males, median differences ranged from a high of −10 persons per 1,000 person-years to a low of effectively 0. Similarly, among females the highest observed median error was 14 persons and the lowest was also effectively 0. The observed error distributions in both sets were asymmetrical, with a very large amount of variability in the range of effects observed. This is due both to high variability and the presence of notable outliers in both sets. Among male single-year-of-age estimates, the difference between the 2.5th and 97.5th percentiles ranged from a low of 99 persons per 1,000 person-years to a high of 210 persons per 1,000 person-years. Among females, even greater variability was observed, with differences between the 2.5th and 97.5th percentiles ranging from a low of 145 to a high of 334.
At higher levels of simulated prevalence (97.5th percentile), both the median differences and the range of values between the 2.5th and 97.5th percentiles were observably larger (Table 4). While this may be accounted for by the differences in the frequency of events (the 2.5th percentile of the simulated prevalence distribution is 12.57% and the 97.5th is 22.47%, a difference of nearly 10 percentage points, or roughly 99 persons per 1,000 person-years), the observations are striking. Errors range among males from a low of effectively 0 to a high of −18 persons per 1,000 person-years. Similarly, among females errors range from a low of effectively 0 to a high of 25 persons per 1,000 person-years. In both cases, the range of differences per 1,000 person-years is nearly double that observed among the lower prevalence-based estimates. Among males, the differences between the 2.5th and 97.5th percentiles range from a low of 178 persons per 1,000 person-years to a high of 378 persons per 1,000 person-years. Among females the differences between the 2.5th and 97.5th percentiles are even larger, ranging from a low of 215 persons per 1,000 person-years to a high of 601 persons per 1,000 person-years.

Discussion
To our knowledge, this paper represents the first published documentation of the magnitude or distribution of anticipated errors in small-area demographic estimates by single years of age or their effects upon epidemiologic statistics. The observed magnitude of errors is large; in fact, in most cases the differences are large enough that it would be difficult to rule out average differences in risk between groups since their distributions are so likely to overlap. The range of observed errors is clearly problematic for making public health decisions. While it is obvious that differences of this size scaled to 1,000 person-years would garner substantial attention, even rescaling these statistics to 100 person-years (arguably more appropriate for small-area work) does not solve this issue. For example, a difference as large as 92/1,000 person-years, when rescaled, would suggest a difference in risk of 9.2 persons/100 person-years. This is almost certain to trigger action by public health officials. In this respect, these results are unsettling because they suggest that errors in demographic estimates are likely to frequently have important impacts on how we utilize epidemiologic statistics for small areas.
In this study, we simulated prevalence for a common condition (childhood obesity), but even after capturing a reasonable range of variation in event occurrence, the impact of demographic estimation errors was large enough to be of considerable concern. It may be of some comfort to imagine that in the case of rarer events (such as childhood cancer, estimated to impact perhaps 1/10,000 children), the accuracy of demographic estimates should have little impact on public health decision-making. In this circumstance, even a single case of cancer should be noteworthy and a clustering of events should be identifiable irrespective of estimates of population at risk. The results presented here, however, should caution epidemiologists and public health officials about the potential uncertainties introduced by the use of demographic estimates for population at risk, though it is worth noting that using the previous decennial census counts has been shown to introduce even greater error than using postcensal estimates [8][9][10][11][12].
This study has assumed that epidemiologic events are captured completely. In reality, estimates of census tract-level events depend upon the process of geocoding, by which events are placed on electronic maps and then re-aggregated to summarize them at the tract level [41][53][54][55]. Previous studies have suggested that geocoding rates can vary from lows of 40% or less to highs approaching 90 to 95% [56][57][58]. These results vary across rural/urban strata, and it is known that incomplete geocoding is systematic, spatially dependent, and can bias estimates of important demographic characteristics such as race and ethnicity [42][43][44][58]. Haining [45] has pointed out that such incomplete geocoding is unignorable in the statistical sense [46], and a large number of studies have attempted to fill in spatially dependent gaps in coverage through a variety of methods [6,47]. At least one study [7] has attempted to quantify the magnitude of errors introduced into small-area population estimates by incomplete geocoding. These authors suggested average errors attributable to geocoding to be approximately 9.0%, but also observed that approximately 10% of errors in total population estimates exceeded 20% and a surprising share (nearly 4%) actually exceeded 50% error. To date, no study has estimated the impact of incomplete geocoding on estimates of age/sex structure or those with single years of age, but we can expect that when postcensal estimates are used during periods between censuses, rather important errors can be anticipated in both numerator (geocoding of events) and denominator (based on geocoded demographic indicators). Temporal drift in demographic estimation accuracy should also be considered. It is largely unknown how the accuracy of demographic estimates may drift over time between censuses [8], but it is clear that accuracy decays as the estimate date moves further away from the previous census [8].
In this study, single-year-of-age estimates were made by breaking out actual 2010 census counts in five-year age/sex-specific intervals. While it is debatable whether census counts represent any sort of "gold standard" [36,39,40], it is also highly plausible that they are closer to reality than any demographic or survey-based estimate can ever be at the point in time in which the enumeration takes place. In practice, demographic estimates of population at risk for five-year age/sex intervals will display their own errors, which will in turn propagate into those made for single years of age. It is beyond the scope of this study to examine this drift and, in fact, any study aiming to do so is faced with the challenge that no estimates even approaching a gold standard exist for years between censuses. Ex-post-facto evaluations [8,9,41] suggest that errors in five-year age categories can be as high as 80% at the census tract level, and it is unknown whether these errors may offset when applied to single-year-of-age categories. For epidemiologists seeking to use demographic estimates of population at risk, postcensal drift in accuracy is a real, if immeasurable, possibility.
In spite of the potential limitations highlighted in this study, it is worth considering that alternatives may do no better and may actually be worse than using demographic estimates to capture population at risk. Previous studies have indicated that using the previous census values, for example, can produce errors that are even larger in magnitude than those observed in demographic estimates [7,8]. Not updating estimates of population at risk from the previous census is generally not advisable either and introduces an additional liability associated with not capturing changes that are important to understanding the population dynamics that ultimately produce epidemiologic risk.
In terms of single-year-of-age estimates of population at risk (such as for a typical census tract of about 1,500 persons), it is likely true that the number of persons within a specific age/sex interval will be small enough that even the errors observed here will have little effect on estimates of prevalence. On balance, we would argue that updating is preferred over use of the previous census. Furthermore, previous studies indicate that simple trend extrapolations (in which historical trends are carried forward) are similarly inaccurate to those produced using other methods [7][8][9], again recommending the use of demographic estimates for population at risk in epidemiologic statistics.
It is likely that readers of this paper will be surprised by the magnitude of error and its variability observed in this research. It is clear that errors in demographic estimates may introduce important limitations in small-area epidemiologic statistics, and this challenge has not received enough consideration in the literature. This paper should serve to spur interest in further evaluative studies and to motivate applied demographers to resume exploration of novel methods in small-area demographic estimation in search of more accurate alternatives [7][8][9][59]. Both descriptive and analytic epidemiology depend not only upon accurate estimates of risk but also upon accounting for potential bias or uncertainty in these estimates [49,50]. From this perspective, this paper suggests that a much more detailed consideration of how error is propagated into small-area epidemiologic statistics is in order. Such an analysis must include an assessment of errors, uncertainties, and bias in both geocoding (numerator) and demographic estimates (denominator), and this paper suggests some potentially useful ways to approach this challenge.