Diabetes prevalence and diagnosis in US states: analysis of health surveys

Background: Current US surveillance data provide estimates of diabetes using laboratory tests at the national level as well as self-reported data at the state level. Self-reported diabetes prevalence may be biased because respondents may not be aware of their risk status. Our objective was to estimate the prevalence of diagnosed and undiagnosed diabetes by state.

Methods: We estimated undiagnosed diabetes prevalence as a function of a set of health system and sociodemographic variables using a logistic regression in the National Health and Nutrition Examination Survey (2003-2006). We applied this relationship to identical variables from the Behavioral Risk Factor Surveillance System (2003-2007) to estimate state-level prevalence of undiagnosed diabetes by age group and sex. We assumed that those who report being diagnosed with diabetes in both surveys are truly diabetic.

Results: The prevalence of diabetes in the U.S. was 13.7% among men and 11.7% among women ≥ 30 years. Age-standardized diabetes prevalence was highest in Mississippi, West Virginia, Louisiana, Texas, South Carolina, Alabama, and Georgia (15.8 to 16.6% for men and 12.4 to 14.8% for women). Vermont, Minnesota, Montana, and Colorado had the lowest prevalence (11.0 to 12.2% for men and 7.3 to 8.4% for women). Men in all states had higher diabetes prevalence than women. The absolute prevalence of undiagnosed diabetes, as a percent of total population, was highest in New Mexico, Texas, Florida, and California (3.5 to 3.7 percentage points) and lowest in Montana, Oklahoma, Oregon, Alaska, Vermont, Utah, Washington, and Hawaii (2.1 to 3 percentage points). Among those with no established diabetes diagnosis, being obese, being Hispanic, not having insurance and being ≥ 60 years old were significantly associated with a higher risk of having undiagnosed diabetes.

Conclusion: Diabetes prevalence is highest in the Southern and Appalachian states and lowest in the Midwest and the Northeast. Better diabetes diagnosis is needed in a number of states.


Background
Diabetes mellitus is the sixth leading cause of death in the United States (U.S.), accounting for approximately 70,000 annual deaths. Age-standardized adult diabetes death rates across U.S. states ranged from approximately 2 per 10,000 people in Arizona and Florida to 4.5 to 5 per 10,000 in West Virginia and the District of Columbia (D.C.) [1]. There may be two reasons for this large variation. First, there may be variation in diabetes prevalence across states due to differences in risk factors for diabetes. For example, the prevalence of obesity in a number of Southern states is almost 60% higher than in Colorado, where obesity is lowest [2]. Second, there may be differences across states in diagnosis and treatment of diabetes or of cardiovascular risks among diabetics. Reliable information on diagnosed and undiagnosed diabetes prevalence at the state level is important because states are key administrative units for funding and implementing programs that influence diagnosis and treatment.
Currently, the only source of information on diabetes prevalence at the state level is the Behavioral Risk Factor Surveillance System (BRFSS), a state-representative telephone survey. However, the BRFSS data are based on self-reports and do not provide estimates of undiagnosed diabetes. The National Health and Nutrition Examination Survey (NHANES) uses laboratory measurements and provides estimates of diagnosed and undiagnosed diabetes, but is representative only at the national level. In this study, we combined data from NHANES and BRFSS to estimate diabetes prevalence and diagnosis at the state level. Our results provide information for state diabetes prevention and control programs, and our methods can be used for regular low-cost monitoring of diabetes at the state level.

Data Sources
NHANES uses a complex multistage stratified clustered probability design to measure health and nutrition characteristics of a nationally representative sample of the civilian non-institutionalized population aged two months and older. NHANES includes an in-person interview and a subsequent physical examination and measurement component in a mobile examination clinic (MEC) or at home for those unable to visit the MEC. We used NHANES data from 2003 to 2006. The response rates for the household interviews were 80% for 2003-2004 and 79% for 2005-2006. The corresponding response rates for the medical examination after the household interview were 95 to 96%.
Each interviewed participant was randomly assigned to either a morning or afternoon/evening MEC session. Subjects ≥ 20 years old assigned to the morning session were asked to fast for 8 to 24 hours, with the exception of those on insulin or those who were excluded for other safety reasons. The NHANES MEC and fasting sample weights account for exclusion, non-response, and inappropriate fasting time. Additional information on NHANES design and methods, including on diabetes measurement, is available elsewhere [3,4] and online at http://www.cdc.gov/nchs/nhanes.htm.
The BRFSS is an annual cross-sectional telephone health survey. Currently, the survey is conducted in all 50 states and the District of Columbia using random-digit dialing to obtain a state-representative sample of the civilian, non-institutionalized population aged 18 and over. In 2003, the response rate among eligible subjects who answered the phone was 77%. Additional information on the design is available elsewhere [5,6] and online at http://www.cdc.gov/brfss.

We included adults aged 30 and older in NHANES and BRFSS who had answered the self-reported diabetes question, which asked if they had ever been told by a health professional that they had diabetes. The response rate for this question was more than 99.8% in both surveys. We did not include younger participants because diabetes prevalence is relatively low in these ages.

Statistical Analysis
Consistent with previous analyses [4], we defined total diabetes as either having answered yes to the diabetes diagnosis question: "Other than during pregnancy, have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?" or having a fasting plasma glucose (FPG) level of ≥ 126 mg/dL. We used FPG because it is used to define diabetes by the American Diabetes Association [7].
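The case definition above can be written as a small predicate. The following is a minimal sketch for illustration, not the code used in the analysis; the function and argument names are hypothetical.

```python
FPG_THRESHOLD_MG_DL = 126  # fasting plasma glucose cutoff stated in the text


def classify_diabetes(self_reported_diagnosis, fpg_mg_dl):
    """Return 'diagnosed', 'undiagnosed', or 'none' for one participant.

    self_reported_diagnosis: True if the participant answered yes to the
        diabetes diagnosis question.
    fpg_mg_dl: measured fasting plasma glucose in mg/dL.
    """
    if self_reported_diagnosis:
        return "diagnosed"
    if fpg_mg_dl >= FPG_THRESHOLD_MG_DL:
        return "undiagnosed"
    return "none"


def has_total_diabetes(self_reported_diagnosis, fpg_mg_dl):
    """Total diabetes = self-reported diagnosis OR FPG at or above the cutoff."""
    return classify_diabetes(self_reported_diagnosis, fpg_mg_dl) != "none"
```

Under this sketch, a participant who reports a diagnosis counts as diabetic regardless of the measured FPG, which mirrors the assumption that self-reported cases are true cases.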
We used data from NHANES, which is representative at the national but not at the state level, to characterize the relationship between undiagnosed diabetes status (defined as FPG ≥ 126 mg/dL) and a set of health system, sociodemographic, and risk factor variables listed in Table 1 using a logistic regression. These variables were selected a priori based on their potential association with diabetes prevalence. We excluded education from the primary list of predictors as including it did not improve the fit of the model. In addition, 50.2% of observations in NHANES were missing either smoking or insurance status or both. We used a missing indicator to include these observations in the regression model. The regression incorporated appropriate sampling weights.

We estimated the individual-level probability of having diabetes in BRFSS 2003-2007 in two steps. First, participants who had answered "yes" to the diabetes diagnosis question were, by definition, assigned a probability of 1.0 for having diabetes. Second, the probability of having undiagnosed diabetes (i.e., FPG ≥ 126 mg/dL) for those who answered "no" to this question was estimated using the coefficients of the logistic regression fit on the NHANES dataset. Estimates of diabetes prevalence and diabetes diagnosis by age, sex, and state were obtained from the BRFSS using appropriate sample weights. The difference between total diabetes and self-reported diabetes is undiagnosed diabetes.

In separate analyses, we used linear regressions to model the relationship between FPG as a continuous variable and self-reported diabetes diagnosis, medication use, and the health system, sociodemographic, and risk factor variables in Table 1 (results for continuous FPG analysis are available from authors by request). We used STATA version 10 for all analyses (StataCorp, Texas). We present the results in two age groups: 30-59 and ≥ 60 years.
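The two-step assignment of individual probabilities can be sketched as follows. This is an illustrative sketch only: the coefficient values, covariate names, and record layout are hypothetical placeholders (the fitted coefficients are those reported in Table 1 and are not reproduced here), and the original analysis was carried out in Stata.

```python
import math

# Hypothetical coefficients for illustration only; not the fitted values.
COEFS = {"intercept": -4.0, "male": 0.5, "age_60_plus": 1.0, "obese": 0.8}


def prob_undiagnosed(person, coefs=COEFS):
    """Inverse logit of the linear predictor for one survey respondent."""
    z = coefs["intercept"]
    for name, beta in coefs.items():
        if name != "intercept":
            z += beta * person.get(name, 0)
    return 1.0 / (1.0 + math.exp(-z))


def prob_diabetes(person, coefs=COEFS):
    # Step 1: a self-reported diagnosis is treated as true diabetes.
    if person["diagnosed"]:
        return 1.0
    # Step 2: otherwise use the predicted probability of FPG >= 126 mg/dL.
    return prob_undiagnosed(person, coefs)


def weighted_prevalence(people, coefs=COEFS):
    """Survey-weighted total, diagnosed, and undiagnosed prevalence."""
    total_w = sum(p["weight"] for p in people)
    total = sum(p["weight"] * prob_diabetes(p, coefs) for p in people) / total_w
    diagnosed = sum(p["weight"] for p in people if p["diagnosed"]) / total_w
    # Undiagnosed prevalence is the difference between total and self-reported.
    return {"total": total, "diagnosed": diagnosed,
            "undiagnosed": total - diagnosed}
```

Calling `weighted_prevalence` on the respondents of one state, age group, and sex would give the corresponding stratum estimates under these assumptions.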

Results

Regression results
Among those who answered "no" to having been diagnosed with diabetes, being male and being older were associated with a higher probability of having diabetes (Table 1). The effect of age on diabetes risk was largest in those 60 to 69 years old and declined slightly in those ≥ 70 years old, consistent with the available evidence on the age association of blood glucose [8].

We evaluated the performance of the prediction model using both internal and external validations. For internal validation, we applied the regression coefficients to NHANES 2003-2006 observations (i.e., the same data used in estimating the regression model) to predict diabetes prevalence. The differences between the predicted and actual diabetes prevalence for different age, sex, and race groups were on average 0.5 percentage points and at most 8.4 percentage points. The Pearson correlation coefficient for the observed and predicted diabetes prevalence for different age, sex, and race groups was 0.98. For external validation, we applied the coefficients of regressions estimated using the 2003-2006 rounds to the same variables in pooled data from two previous rounds of NHANES (1999-2000 and 2001-2002). The observed-predicted differences for individual age, sex, and race groups were, at the extreme, slightly worse than those in the internal validation; specifically, the 60- to 69-year-old males from "other race" had a 20 percentage point discrepancy. This may, however, be because the composition of this race group changed between the two surveys. The Pearson correlation coefficient for the observed and predicted diabetes prevalence for different age, sex, and race groups was 0.93. On average, the predicted prevalence was 0.1 percentage points higher than the actual prevalence (versus 0.5 percentage points lower in the internal validation).
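The validation summaries reported above (mean and maximum observed-predicted difference and the Pearson correlation across age, sex, and race groups) can be computed along the following lines. This is a sketch with a hypothetical function name; the input lists stand for group-level prevalences in percent and would come from the survey data.

```python
import math


def validation_summary(observed, predicted):
    """Compare observed and predicted group prevalences (in percent)."""
    n = len(observed)
    # Signed difference: positive means the model over-predicts.
    diffs = [p - o for o, p in zip(observed, predicted)]
    mean_diff = sum(diffs) / n
    max_abs_diff = max(abs(d) for d in diffs)
    # Pearson correlation between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pearson_r = cov / math.sqrt(var_o * var_p)
    return {"mean_diff": mean_diff, "max_abs_diff": max_abs_diff,
            "pearson_r": pearson_r}
```

The same function applies to both the internal validation (NHANES 2003-2006) and the external validation (NHANES 1999-2002); only the observed and predicted inputs change.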

State-level prevalence of diabetes and undiagnosed diabetes
In 2003-2007, the lowest prevalence of diabetes was in the Midwest and the Northeast, including Vermont, Minnesota, Montana, and Colorado, with age-standardized prevalence ranging from 11.0% to 12.2% for men and 7.3% to 8.4% for women (Figure 1 and Table 2). Diabetes prevalence was highest in the primarily Southern and Appalachian states, including Mississippi, West Virginia, Louisiana, Texas, South Carolina, Alabama, and Georgia, where age-standardized diabetes prevalence was 15.8% to 16.6% for men and 12.4% to 14.8% for women, i.e., approximately 30% to 51% higher for men and 48% to 103% higher for women than the states with lowest prevalence. The same geographic pattern was observed when younger (30-59 years) and older (≥ 60 years) age groups were considered separately. The Spearman rank correlation coefficient of state diabetes prevalence and mean BMI was 0.53 for men and 0.76 for women [2].
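The relative differences quoted above follow from the prevalence ranges in the same paragraph: the smallest gap compares the bottom of the high range with the top of the low range, and the largest gap compares the top of the high range with the bottom of the low range. A short check, with hypothetical helper names:

```python
def percent_higher(high, low):
    """How much higher `high` is than `low`, in percent."""
    return (high / low - 1.0) * 100.0


def relative_range(high_range, low_range):
    """Smallest and largest relative gap between two prevalence ranges."""
    smallest = percent_higher(high_range[0], low_range[1])
    largest = percent_higher(high_range[1], low_range[0])
    return round(smallest), round(largest)


# Age-standardized prevalence ranges (percent) taken from the text.
men_low, men_high = (11.0, 12.2), (15.8, 16.6)
women_low, women_high = (7.3, 8.4), (12.4, 14.8)

men_gap = relative_range(men_high, men_low)        # (30, 51)
women_gap = relative_range(women_high, women_low)  # (48, 103)
```

These reproduce the 30% to 51% (men) and 48% to 103% (women) figures.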
Age-standardized diabetes prevalence was higher in men than women in all states, with the largest differences in Minnesota, Colorado, Utah, and Maine, where prevalence in men was 32% to 38% higher than among women. The smallest male-female differences were in the District of Columbia, Mississippi, West Virginia, and Louisiana, ranging from 6% to 18% (Figures 1 and 2). Men also had higher prevalence of diabetes than women in almost all states and age groups, except in the youngest ages (30 to [...] Table 3 for prevalence of undiagnosed diabetes by age, sex, race and insurance status).

* The standard error of prevalence reported here reflects the sampling variability in the predicted diabetes prevalence but does not incorporate uncertainty in the prediction model (parameter uncertainty and stochastic uncertainty) and thus is an underestimate for the true standard error of prevalence. We included the parameter and stochastic uncertainty of the modeling using a multiple imputation approach in which we imputed the prevalence of diabetes for individuals who did not report having diabetes 10 times, drawing from a multivariate Normal distribution of the coefficients and drawing randomly from the posterior binomial distribution of prevalence. We estimated the standard error of the national prevalence of diabetes using these 10 imputed values for each sex. The standard error was 0.8% for men and 0.4% for women, which is almost 10 times larger than the standard error estimated using sampling uncertainty only.
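The multiple imputation approach described in the table footnote can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure as implemented: the function names and record layout are hypothetical, the coefficient mean and covariance would come from the fitted logistic regression, and the standard error here is taken as the spread across imputations only (a fuller treatment would also combine the within-imputation sampling variance).

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_coefficients(mean, cov, rng):
    """One draw from a multivariate Normal with the given mean and covariance."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def imputed_prevalences(rows, mean, cov, n_imputations=10, seed=0):
    """rows: (weight, diagnosed, covariates); covariates start with 1 (intercept)."""
    rng = random.Random(seed)
    total_w = sum(w for w, _, _ in rows)
    results = []
    for _ in range(n_imputations):
        beta = draw_coefficients(mean, cov, rng)  # parameter uncertainty
        cases = 0.0
        for w, diagnosed, x in rows:
            if diagnosed:
                cases += w  # self-reported cases are always counted
                continue
            z = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            cases += w * (rng.random() < p)  # stochastic (Bernoulli) draw
        results.append(cases / total_w)
    return results


def between_imputation_se(values):
    """Sample standard deviation of the imputed prevalence estimates."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
```

With 10 imputations, `between_imputation_se(imputed_prevalences(...))` gives one way to summarize the combined parameter and stochastic uncertainty in the prevalence estimate.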
Men in all states had higher proportions of undiagnosed diabetes than women, with the male-female difference in undiagnosed proportion being largest in Hawaii, Mississippi, District of Columbia, West Virginia, and Idaho, where the proportion undiagnosed among men was 34.1% to 39.0% higher than among women. The male-female diagnosis disparity was smallest in Colorado, Pennsylvania, Vermont, and Minnesota (12.9% to 19.8%). When stratified on race, the proportion of cases undiagnosed was highest among Hispanics (33%), followed by whites (28%) and blacks (19%), and it was lowest in the residual group of "other races" (6%). One-third of diabetes cases were undiagnosed in participants who did not have insurance compared to one-fourth among insured Americans.

Discussion
To our knowledge, this is the first study to estimate the total prevalence of diabetes and the proportion of diabetes that is undiagnosed at the state level. The Southern and Appalachian states had the highest diabetes prevalence, with Mississippi faring the worst. The Northern plains, the Northeast and the Midwest had the lowest prevalence. Prevalence of undiagnosed diabetes also varied across states, with Southern states and California having the highest prevalence. The proportion of undiagnosed diabetes was higher in men, Hispanics, and the uninsured compared to women, whites and insured. In fact, one-half of [...]

[Table: Estimated prevalence of total diabetes by state, sex, and age group]

* The standard error of prevalence reported here reflects the sampling variability in the predicted diabetes prevalence but does not incorporate uncertainty in the prediction model (parameter uncertainty and stochastic uncertainty) and thus is an underestimate for the true standard error of prevalence. See footnote to Table 2 for an example of how the inclusion of these sources would affect standard errors.

This analysis has a number of limitations. First, although our regression models included important sociodemographic, lifestyle, and health system determinants of diabetes risk and diagnosis, there are other factors that affect diabetes, such as diet and quality of care [9,10]. For instance, we were unable to include family history of diabetes, physical activity, alcohol use and specific dietary risk factors of diabetes [11][12][13][14] in the model because BRFSS does not include a sufficiently detailed dietary questionnaire or any questions on family history of diabetes and because the questions used to measure alcohol use and physical activity are different from those used in NHANES. The effects of some such factors may be captured by the variables in our model (e.g., self-reported diabetes, BMI, smoking, insurance status, visit to a doctor, etc.). If the unexplained effects vary systematically across states, the model may underestimate cross-state variation in diabetes prevalence, making our results conservative. Second, we conducted our analysis using FPG because of its availability for the most recent rounds of NHANES and because it is used by the American Diabetes Association to define diabetes. Other definitions of diabetes, e.g., based on glucose tolerance test, may have led to slightly different estimates. Third, BRFSS response rate varies across states. This may affect the state comparisons if the determinants of non-response are associated with diabetes prevalence. The single best way to reduce uncertainty in our analysis would be the addition of a validation component to BRFSS, which includes measured blood glucose for a random sample of interviewees. Finally, because 50.2% of observations in NHANES were missing either smoking or insurance status, we used a missing indicator in our regression models to include these observations. Dropping these observations would decrease the precision of our regression coefficients but would not affect the predictions of diabetes prevalence by states materially.
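The missing-indicator approach mentioned above can be illustrated with a small encoding helper. This is a sketch of one common way to code a covariate with a missing indicator, with a hypothetical function name; it assumes a binary covariate (such as insurance status) and is not taken from the analysis code.

```python
def encode_with_missing_indicator(value):
    """Encode a binary covariate that may be missing (None).

    Returns (value_or_zero, missing_flag). Missing values are set to 0 and
    flagged, so the regression estimates a separate coefficient for the
    missing group instead of dropping the observation.
    """
    if value is None:
        return 0, 1
    return int(value), 0


# Example: insurance status for three respondents (None = not recorded).
insurance = [True, None, False]
encoded = [encode_with_missing_indicator(v) for v in insurance]
```

Both returned columns would enter the regression as predictors, which keeps every observation in the model at the cost of an extra coefficient per covariate with missing values.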
Despite uncertainties, our results currently provide the only estimates of total diabetes and undiagnosed diabetes in U.S. states, and should provide motivation, guidance, and benchmarks for designing, implementing, and evaluating diabetes prevention and control programs at the state level. Further, our methods allow states to combine the relatively low-cost BRFSS telephone survey with NHANES to regularly monitor the prevalence of diabetes and progress in diabetes diagnosis.
Increasing the coverage of lifestyle (e.g., physical activity) and pharmacological interventions for diabetes should be a priority in states with high diabetes prevalence. Some states also need to improve diagnosis, especially among men, because early diagnosis and intensive glycemic control reduce the future incidence of microvascular complications [15,16]. Further, diabetes diagnosis will facilitate interventions that lower blood pressure and cholesterol, and hence the risk of cardiovascular disease, among diabetics [17,18]. The states with the highest estimated diabetes prevalence in our analysis also have the highest levels of blood pressure and cardiovascular disease risk [19,20]. This geographical distribution of cardiovascular risks and diabetes points to the need for lifestyle and health care interventions that reduce blood pressure and other cardiovascular risks in high-diabetes states.