Data
We analyzed data from the BRFSS surveys conducted from 1995, the first year in which all 50 states participated in the BRFSS, through 2012, the most recent year in which county identifiers were publicly available. The BRFSS included four “healthy days” questions:
1. Would you say in general that your health is: excellent, very good, good, fair, or poor?
2. Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good?
3. Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?
4. During the past 30 days, for about how many days did poor physical or mental health keep you from doing your usual activities, such as self-care, work, or recreation?
Only the first question was asked by all states in 2002; consequently we excluded data on the remaining questions for this year only.
We created four binary variables from these questions: low general health (responding “fair” or “poor” to question 1); frequent physical distress (reporting 14 or more days in response to question 2); frequent mental distress (reporting 14 or more days in response to question 3); and frequent activity limitation (reporting 14 or more days in response to question 4). The 14-day cut-off used for frequent physical distress, frequent mental distress, and frequent activity limitation is in line with previous research using these questions, and is intended to identify individuals who experienced a significant health burden in the previous month [10–12, 17–19]. In addition, we extracted from the survey the county of residence, age, gender, race/ethnicity (white non-Hispanic, black non-Hispanic, native non-Hispanic, or Hispanic), education status (less than high school, high school graduate, some college, or college graduate), marital status (never married, currently married, or formerly married), and, starting in 2011, phone type (landline only, cell phone only, or dual).
Respondents with missing values on any of these variables were excluded from the analysis. There were 5,239,833 respondents in the study period. Of these, 2.2% were missing some demographic information, 3.8% were missing one or more outcome variables, and 5.1% were missing the county variable, primarily due to CDC data suppression rules; these categories are not mutually exclusive. In total, 4,698,203 respondents (89.7%) had no missing values and were included in the analysis. The survey response rate in the BRFSS varied by year and by state; in 2012, the response rate ranged from 27.7% to 60.4% among states [23].
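The outcome definitions above can be illustrated with a minimal Python sketch. The field names (`general_health`, `physical_days`, `mental_days`, `activity_days`) and the function names are hypothetical, and the sketch assumes answers have already been cleaned to a text category for question 1 and a day count from 0 to 30 for questions 2 to 4; it does not handle the raw BRFSS response codes. Missing answers are passed through as `None` so that exclusion can be applied separately.

```python
from typing import Optional

FREQUENT_DAYS_CUTOFF = 14  # cut-off stated in the text (14 or more days)


def low_general_health(answer: Optional[str]) -> Optional[bool]:
    """True when question 1 was answered 'fair' or 'poor'; None if missing."""
    if answer is None:
        return None
    return answer.strip().lower() in {"fair", "poor"}


def frequent(days: Optional[int]) -> Optional[bool]:
    """True when the reported day count meets the cut-off; None if missing."""
    if days is None:
        return None
    return days >= FREQUENT_DAYS_CUTOFF


def derive_outcomes(record: dict) -> dict:
    """Map one cleaned survey record (hypothetical field names) to the four indicators."""
    return {
        "low_general_health": low_general_health(record.get("general_health")),
        "frequent_physical_distress": frequent(record.get("physical_days")),
        "frequent_mental_distress": frequent(record.get("mental_days")),
        "frequent_activity_limitation": frequent(record.get("activity_days")),
    }
```

For example, a respondent reporting "Fair" general health, 14 physically unhealthy days, and 13 mentally unhealthy days would be flagged for low general health and frequent physical distress but not for frequent mental distress.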
Small area estimation model
We used previously described and validated small area models to estimate county-level prevalence of low general health, frequent physical distress, frequent mental distress, and frequent activity limitation [24]. These models are designed to “borrow strength” across time, space, and from external data sources (i.e., covariates) in order to increase the effective amount of information available for each county. Briefly, these models were specified as:
$$ \begin{array}{c} Y_{j,t,a,r,m,e} \sim \mathrm{Binomial}\left(p_{j,t,a,r,m,e},\ N_{j,t,a,r,m,e}\right) \\ \mathrm{logit}\left(p_{j,t,a,r,m,e}\right) = \beta_0 + \beta_{1,a} + \beta_{2,r} + \beta_{3,m} + \beta_{4,e} + \boldsymbol{\beta}_5 \cdot \boldsymbol{X}_{j,t} + u_j + w_t + d_{j,t} \end{array} $$
where $N_{j,t,a,r,m,e}$, $Y_{j,t,a,r,m,e}$, and $p_{j,t,a,r,m,e}$ are the total number of respondents; the number of respondents with low general health, frequent physical distress, frequent mental distress, or frequent activity limitation, depending on the model; and the true prevalence, respectively, in county $j$, year $t$, age group $a$, race/ethnicity group $r$, marital status group $m$, and education group $e$. The $\beta$ terms are fixed effects: $\beta_0$ is the intercept; $\beta_{1,a}$ are age group effects and are included to account for differences in self-reported health among age groups; $\beta_{2,r}$, $\beta_{3,m}$, and $\beta_{4,e}$ are race/ethnicity, marital status, and education effects, respectively, and are included to account for differences in self-reported health among each of these groups; $\boldsymbol{\beta}_5$ is a vector of coefficients on three county-level covariates that are expected to be predictive of poor self-reported health (percent of the population living in poverty, the unemployment rate, and the percent of households that are rural). The remaining terms are random effects. $u_j$ and $w_t$ are county- and year-level random effects, respectively, each of which is assumed to follow a conditional autoregressive distribution that allows for spatial ($u_j$) and temporal ($w_t$) smoothing (specifically, the distribution described by Leroux et al. [25]). $d_{j,t}$ is a county-year-level random effect with a non-separable “Type IV” interaction between space and time as described by Knorr-Held [26], but using the conditional autoregressive distribution described by Leroux et al. [25] for both the spatial and temporal dimensions. Gamma(1, 1000) priors were assigned for the precision parameters of each random effect. Normal(0, 1.5) priors were assigned for the logit-transformed autocorrelation parameter of each random effect.
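As an illustration of the smoothing structure, the Leroux conditional autoregressive distribution is commonly written with precision matrix Q = τ[ρ(D − W) + (1 − ρ)I], where W is a neighbour (adjacency) matrix, D is the diagonal matrix of neighbour counts, τ is the precision parameter, and ρ is the autocorrelation parameter. The Python sketch below builds this matrix for a chain of consecutive years. It is not the authors' TMB code: the function names are hypothetical, and treating adjacent years as the only temporal neighbours is an assumption made here for illustration.

```python
import math


def inv_logit(x: float) -> float:
    """Map a logit-scale value to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-x))


def chain_adjacency(n: int) -> list:
    """Adjacency matrix for n ordered units where each unit neighbours the next.

    Assumption for illustration: years are neighbours only when consecutive.
    """
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def leroux_precision(adjacency: list, rho: float, tau: float) -> list:
    """Return Q = tau * (rho * (D - W) + (1 - rho) * I) as a list of rows.

    adjacency must be symmetric with a zero diagonal; rho lies in [0, 1).
    """
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])  # number of neighbours of unit i
        for j in range(n):
            laplacian_ij = (degree if i == j else 0.0) - adjacency[i][j]
            identity_ij = 1.0 if i == j else 0.0
            q[i][j] = tau * (rho * laplacian_ij + (1.0 - rho) * identity_ij)
    return q
```

With ρ = 0 the matrix reduces to τI (independent effects), and as ρ approaches 1 it approaches the intrinsic conditional autoregressive structure, which is why placing the prior on the logit of ρ keeps the parameter inside its valid range. A spatial version would use the same function with a county adjacency matrix in place of `chain_adjacency`.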
Models were fit using the TMB package [27] in R version 3.2.4 [28], and 1000 draws of $p_{j,t,a,r,m,e}$ were simulated from the posterior distribution. These draws were post-stratified by race, marital status, and education using population counts from the census and American Community Survey to ensure that prevalence estimates represent the demographic composition of a county even where response rates vary among different demographic groups. Draws were then age-standardized using the 2010 census population as the standard. Point estimates were calculated from the mean of the 1000 draws and 95% uncertainty intervals (UIs) were calculated from the 2.5th and 97.5th percentiles. State- and national-level estimates were generated by population-weighting the county-level estimates.
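The draw-level workflow described above can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: the function names and input layout are hypothetical, and the percentile uses simple linear interpolation, which may differ slightly from the quantile definition used in the original analysis.

```python
def post_stratify(draws_by_stratum: dict, population_by_stratum: dict) -> list:
    """Combine stratum-level prevalence draws into population-weighted draws.

    draws_by_stratum maps each stratum to a list of K draws; the weighting is
    applied draw by draw so that uncertainty carries through to the result.
    """
    total = sum(population_by_stratum[s] for s in draws_by_stratum)
    n_draws = len(next(iter(draws_by_stratum.values())))
    combined = [0.0] * n_draws
    for stratum, draws in draws_by_stratum.items():
        weight = population_by_stratum[stratum] / total
        for k, p in enumerate(draws):
            combined[k] += weight * p
    return combined


def percentile(values: list, q: float) -> float:
    """Percentile (q in [0, 100]) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(draws: list) -> tuple:
    """Return (mean, 2.5th percentile, 97.5th percentile) of a set of draws."""
    mean = sum(draws) / len(draws)
    return mean, percentile(draws, 2.5), percentile(draws, 97.5)
```

The same weighted-average step applies, with different weights, to age-standardization (weights from the standard population) and to state- and national-level aggregation (weights from county populations). Summaries are taken only after all weighting has been applied to the individual draws.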
Separate models were fit for males and females for each of the four measures, for eight total models. Prior to 2011, the BRFSS sample did not include cell phones, raising the possibility of non-coverage bias; the correction method described by Dwyer-Lindgren et al. was applied to address this issue [29].
Comparison to risk factors, chronic conditions, and life expectancy
After modeling county-level prevalence of low general health, frequent physical distress, frequent mental distress, and frequent activity limitation, we compared these measures to existing estimates of county-level prevalence of behavioral and metabolic risk factors (smoking, obesity, and physical inactivity), and chronic conditions (hypertension and diabetes), also derived from BRFSS data [24, 29–31]. For each of these variables, we calculated the Pearson correlation coefficient with each of the four measures of poor self-reported health in the most recent year of data available (ranging from 2009 for hypertension to 2012 for diabetes).
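For reference, the Pearson correlation coefficient between two county-level series can be computed as in the short Python sketch below. The function name and inputs are illustrative; the sketch assumes the two lists are aligned so that each position refers to the same county.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length and at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a series has zero variance")
    return sxy / math.sqrt(sxx * syy)
```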
We also compared the prevalence of low general health, frequent physical distress, frequent mental distress, and frequent activity limitation with life expectancy in 2012 (Laura Dwyer-Lindgren, Amelia Bertozzi-Villa, Rebecca W Stubbs, Chloe Morozoff, Johan P Mackenbach, Frank J van Lenthe, Ali H Mokdad, and Christopher JL Murray: Inequalities in life expectancy among US counties, 1980 to 2014: Temporal trends and key drivers, forthcoming). We used loess regression, a non-parametric smoothing technique [32], to characterize the relationship between each of these four variables and life expectancy. We also examined the correlation between the change in prevalence of each of these four measures and the change in life expectancy between 1995 and 2012.
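The idea behind loess can be illustrated with a minimal Python sketch of a locally weighted linear fit with tricube weights. The span and polynomial degree used in the analysis are not stated in the text, so the values here are assumptions: the span of 0.75 matches the default of R's `loess`, while the local linear fit is chosen for brevity (R's `loess` defaults to a local quadratic). The function name is hypothetical and no robustness iterations are included.

```python
import math


def loess_point(x: list, y: list, x0: float, span: float = 0.75) -> float:
    """Estimate the smoothed value at x0 from a tricube-weighted local linear fit."""
    n = len(x)
    k = max(2, int(math.ceil(span * n)))      # number of nearest points in the window
    distances = sorted(abs(xi - x0) for xi in x)
    h = distances[min(k, n) - 1]              # bandwidth: distance to the k-th nearest point
    if h == 0:
        # All points in the window coincide with x0: return their average.
        same = [yi for xi, yi in zip(x, y) if xi == x0]
        return sum(same) / len(same)
    weights = []
    for xi in x:
        u = abs(xi - x0) / h
        weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube kernel
    sw = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / sw
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        return mean_y                          # no spread in x: fall back to weighted mean
    slope = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y)) / sxx
    return mean_y + slope * (x0 - mean_x)
```

Evaluating `loess_point` over a grid of prevalence values would trace out a smooth curve of life expectancy against prevalence without imposing a single global functional form.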