A procedure to correct proxy-reported weight in the National Health Interview Survey, 1976–2002
© Reither and Utz. 2009
Received: 18 June 2008
Accepted: 06 January 2009
Published: 06 January 2009
Skip to main content
© Reither and Utz. 2009
Received: 18 June 2008
Accepted: 06 January 2009
Published: 06 January 2009
Data from the National Health Interview Survey (NHIS) show a larger-than-expected increase in mean BMI between 1996 and 1997. Proxy-reports of height and weight were discontinued as part of the 1997 NHIS redesign, suggesting that the sharp increase between 1996 and 1997 may be artifactual.
We merged NHIS data from 1976–2002 into a single database consisting of approximately 1.7 million adults aged 18 and over. The analysis consisted of two parts: First, we estimated the magnitude of BMI differences by reporting status (i.e., self-reported versus proxy-reported height and weight). Second, we developed a procedure to correct biases in BMI introduced by reporting status.
Our analyses confirmed that proxy-reports of weight tended to be biased downward, with the degree of bias varying by race, sex, and other characteristics. We developed a correction procedure to minimize BMI underestimation associated with proxy-reporting, substantially reducing the larger-than-expected increase found in NHIS data between 1996 and 1997.
It is imperative that researchers who use reported estimates of height and weight think carefully about flaws in their data and how existing correction procedures might fail to account for them. The development of this particular correction procedure represents an important step toward improving the quality of BMI estimates in a widely used source of epidemiologic data.
In this study, we intend to show that the unusually large increase in mean BMI between 1996 and 1997 is primarily attributable to methodological changes in the NHIS. In 1997, the NHIS discontinued the practice of allowing proxy-reporting for adults [1, 2], a practice where one adult could provide survey responses for other adults in the same household. Prior to the 1997 redesign, demographic and health information for adults in each household were collected through self-response interviews, as well as proxy-responses. The NHIS permitted two types of proxy-reporting for adults; (1) complete proxy-reported data, and (2) partial self-reports, which relied on a mixture of self-reports for some questions but proxy-reports for others.
Previous research has shown that proxy-reported height is a good indicator of self-reported height, but that proxy-reported weight tends to underestimate self-reported weight . Thus, it seems probable that the elimination of proxy-reported height and weight in the 1997 NHIS caused mean BMI to increase suddenly in that year. While revised data collection procedures in the 1997 NHIS likely improved overall data quality, they also may have inadvertently contributed to a misleading impression about the pace of BMI increase in the U.S. population.
The objectives of this analysis were to explore the effect of proxy-reporting on population estimates of BMI and to develop a statistical correction which reduces the downward biases associated with proxy-reporting. Such a correction is imperative if researchers are to use NHIS data to monitor long-term changes in mean BMI and the prevalence of obesity in the U.S. population. Additionally, our study serves as a reminder about how data may be biased by routine data collection procedures. It provides an overview of potential reporting biases and offers statistical tools that may be employed to minimize such biases. These analyses are important for anyone using proxy-reported data and especially those interested in the validity of BMI measurements.
The NHIS is a repeated cross-sectional household survey of the noninstitutionalized civilian population in the U.S.  Its primary functions are to monitor the prevalence and distribution of disease and disability in the U.S. and assess patterns of health care utilization. Every week, interviewers from the U.S. Census Bureau conduct face-to-face interviews to gather information from "responsible family members" residing in randomly chosen households across the nation . Households and the individuals within households are selected via a complex, multistage sampling design that involves both clustering and stratification. On average, Census personnel complete interviews at about 94% of the households selected.
This study merged NHIS data from 1976–2002 into a single database consisting of approximately 1.7 million adults aged 18 and over. Although the NHIS includes data on the health of children and adolescents, height and weight data from persons under 18 are not available. Thus, children were excluded from our sample. Although the NHIS began in 1957, it did not begin collecting data on weight and height until 1976. This timing is fortunate since available estimates suggest that the onset of the obesity epidemic occurred sometime in the 1980s .
The overarching motivation for the 1997 NHIS redesign was to streamline the questionnaire, improve its contents, and reduce the amount of time necessary to complete interviews, which had increased to an average of two hours by the mid-1990s . Although sampling and interviewing procedures remained broadly intact, in 1997 the NHIS began to record survey responses with computer-assisted personal interviewing (CAPI) software on laptop computers, rather than the traditional paper and pencil method that had been used previously. Changes associated with the 1997 NHIS redesign have influenced the estimates for some conditions, such as asthma prevalence . The 1997 redesign also affected estimates of BMI through, as we will show, the elimination of proxy-reporting.
Body mass is measured with body mass index (BMI), calculated as weight(kg)/height(m)2. BMI is a widely used indicator of body mass because it controls for differences in weight due to height variations and has proven to be valid in population research [7, 8]. Between 1976 and 1996, 5.5 percent of NHIS respondents had missing data on BMI; these cases were excluded from this analysis. (Note that some of our analyses extend only to 1996, as proxy-reporting was discontinued in 1997).
Reporting status was divided into three categories: Self-report designates persons who answered the entire survey for themselves. Proxy-report designates persons whose data were reported by another adult member of the household. Between 1976 and 1996, 477,703 individuals (about 31% of the NHIS sample) fit this description. Partial self-report designates persons whose data were a combination of self- and proxy-reports. This means that either the participant or another adult member of the household responded to questions regarding height and weight but, unfortunately, researchers cannot adjudicate between these two possibilities. Between 1976 and 1996, 81,405 participants (about 5% of the NHIS sample) were classified as partial self-reporters. Information on reporting status was unavailable for less than 1% of respondents from 1976–1996. Estimates of BMI for respondents with missing data on reporting status were, on average, comparable to self-reporters. Therefore, we excluded cases with missing reporting status.
Because both body mass and reporting status vary by race and sex, the sample was stratified into four race-sex groups: Black Males, Black Females, Non-Black Males, and Non-Black Females. A recent study using NHIS data found that differences between proxy- and self-reported health indicators narrowed substantially when respondent characteristics were taken into consideration . Thus, our analyses controlled for basic sociodemographic variables that are known to be associated with body mass [10–14] and may also be associated with reporting status. Period of observation was grouped into four categories: 1976–84, 1985–88, 1989–92, and 1993–96. The initial category was broader than the others to capture enough partial self-reporters to produce stable parameter estimates. Age was grouped into six categories of approximately 10 years: 18–29, 30–39, 40–49, 50–59, 60–69, and 70 or older. Marital status was grouped into three categories: married with a spouse in the household, not currently married, and a category for unknown or missing data. The category "not currently married" included separated individuals and a very small proportion of married persons who indicated that their spouse was either absent or in an unknown location. Also, because the NHIS did not include "living with partner" as a response option until the 1997 redesign, we could not combine married with cohabiting individuals in our correction procedure. Working status was divided into three categories: working, not currently working, and a category for missing data. Educational status was grouped into five categories: less than a high school diploma, a high school diploma but no college experience, some college experience, a college degree or more, and a category for persons with missing data. Missing variable categories were necessary to maintain the full sample.
The analysis consisted of two parts: First, we estimated the magnitude of BMI differences by reporting status. Second, we developed a procedure to correct biases in BMI introduced by reporting status. This two-part analysis minimized the sudden increase in BMI that coincided with the 1997 NHIS redesign, resulting in more accurate trend estimates of mean BMI in the adult population from 1976–1996.
In part 1, we explored whether BMI differences by reporting status were the result of misreporting height, weight, or some combination of the two. (As discussed in more detail in the results section, analyses clearly showed that BMI differences were due to misreporting weight, not height). To verify that weight differences were caused by reporting status rather than other respondent characteristics (e.g., age and educational status), we evaluated parameter estimates in ordinary least square (OLS) regression models of weight on reporting status before and after the incorporation of a set of potential confounders. Although the parameter estimates from this part of the analysis could be used to correct differences in weight by reporting status, such an approach would assume that differences between proxy-, partial self-, and self-reports of weight were constant across sociodemographic strata.
where is the regression sum of squares in the full model consisting of all main effects and interaction terms, is the regression sum of squares in the reduced model consisting of all main effects and interaction terms except for k interaction terms, and is the mean square error in the full model.
Unstandardized coefficients in OLS regression models of weight (pounds) on reporting status and sociodemographic characteristics, NHIS 1976–1996
70 and older
Less than high school
College or more
As expected, control variables were strong and statistically significant predictors of weight, although parameter estimates varied somewhat by race and sex (Table 1). For instance, married males weighed considerably more than non-married males, but little difference was observed between married and non-married females. More importantly, weight differences attributable to reporting status persisted after introducing potential confounders. Although accounting for differences in case-mix attenuated coefficients for partial self- and proxy-reported weight among Black and non-Black females, clear biases persisted after controlling for sociodemographic differences. Interestingly, the downward bias in proxy- and partial self-reported weight among non-Black males only emerged after the introduction of control variables. The introduction of control variables also caused the degree of bias to increase among Black males.
Multiple partial F-tests for blocks of interaction terms in OLS models of weight on reporting status and sociodemographic characteristics, NHIS 1976–1996
Multiple Partial F Ratios
by Marital Status
by Working Status
by Educational Status
by Marital Status
by Working Status
by Educational Status
We estimated a final correction equation that included the main effects for reporting status, plus the significant blocks of interaction terms for each race-sex group:
Adjusted weight = β0 + ((-1) * (β1X1 + β2X2 + β(i...n)X1X(i...n) + β(j...n)X2X(j...n))),
where β 0 is the reported weight of the respondent, β 1 is the main effect of partial self-reporting on weight, β 2 is the main effect of proxy-reporting on weight, β (i...n)X1X(i...n) is the constellation of i to n interaction terms associated with partial self-reporting, and β (j...n)X2X(j...n) is the constellation of j to n interaction terms associated with proxy-reporting. Note that since X1 and X2 equal zero among self-reporters, all terms fall out of the equation except β 0.
Incorporation of the interaction terms resulted in a more refined correction procedure that accounted for important sociodemographic differences in proxy- and partial self-reported estimates of weight. To illustrate, proxy estimates of weight among married non-Black males were, on average, about 2 pounds higher than proxy estimates of weight among non-Black males who were not currently married. Similarly, the downward bias in proxy-reported weight among non-Black females was about 3.5 pounds higher in 1993–96 than in the initial period of observation.
This study explored how BMI estimates in a large, nationally representative health survey varied depending on reporting status. Although not the focus of this analysis, extant research has found that self-reported BMI tends to underestimate clinical assessments of BMI [18–24]. This analysis focused specifically on the differences between proxy-reported and self-reported BMI, finding that BMI from proxy-reports was substantially lower than self-reported BMI. In other words, anthropometric data collected through proxy-reports introduced even more measurement error than self-reported data collection techniques. Consistent with our finding, research has shown that parents misestimate the height and weight of their preschool-aged children, producing downwardly biased estimates of BMI . Therefore, relying on proxies such as spouses or parents to provide information about body mass introduces significant measurement error into an analysis and should be avoided whenever possible. However, because the NHIS and other large-scale epidemiologic studies have used proxy-report techniques to measure BMI (e.g., the National Long-Term Care Survey, the Ontario Familial Colon Cancer Registry, and the Continuing Surveys of Food Intakes by Individuals), and because these studies are commonly used to develop policy and to evaluate population health trends, it is imperative to understand and adjust for the measurement error associated with this type of data collection practice.
In summary, we found that the downward biases in BMI associated with proxy-reports were caused primarily by the underestimation of weight and that these underestimates varied systematically: First, the amount of misreporting differed by reporting status, with proxy-reports of weight showing more downward bias than partial self-reports. Second, misreporting differed by race, sex and other respondent characteristics. Third, the disparity between self- and proxy-reported weight increased substantially in recent waves of the NHIS. Given these patterns, we devised a correction procedure for each race-sex group that accounted for reporting status, age, period of observation, marital status, employment, and education. This correction procedure substantially reduced the larger-than-expected increase in BMI that coincided with the elimination of proxy-reporting in the 1997 NHIS redesign. Researchers interested in using results from this study to correct proxy-reported weight in the NHIS are encouraged to contact the authors for additional information.
The analyses presented here demonstrate that biases associated with proxy-reported weight have increased over the past few decades. Although the underestimation of weight appears particularly acute among proxy-reporters, the rise in obesity prevalence has presumably led to more widespread underreporting of weight by all NHIS respondents. If true, this would corroborate previous research showing that overweight subjects tend to underreport their weight to a greater extent than non-overweight subjects . Given the likelihood of increasing biases in self-reported weight over time, future research should explore period trends in the underestimation of BMI in the NHIS. Presuming that research verifies that the downward bias in mean BMI has grown in recent years, a correction procedure should be devised so that NHIS data may be used to provide a more accurate assessment of trends associated with the U.S. obesity epidemic.
As noted, this study found significant differences in the amount of reporting bias among different demographic groups. For instance, females with proxy-reported estimates of BMI had greater measurement error than males with proxy-reported estimates. Also, proxy-reports for married males had less bias than proxy-reports for unmarried males. Assuming that a spouse is often the proxy respondent for a married participant, it appears that wives may report their husbands' weight more accurately than husbands report their wives' weight, and that the proxy respondent for an unmarried person may not know details such as height and weight as well as a spouse does. Our analyses also found that reporting biases were greatest among those from lower socioeconomic status groups (e.g., those with less than a high school education and those not currently working), suggesting that the validity of proxy-reports may be associated with cognitive traits influenced by socioeconomic attainment . These demographic differences offer insight on which groups may provide more valid and reliable sources of proxy-report data. Should proxy-reports of weight be used in future study designs, it appears that females, particularly wives, and those with higher socioeconomic attainment provide more valid estimates than men or those of lower socioeconomic attainment.
The discrepancy between corrected NHIS estimates of BMI and NHANES examination data reveals an important limitation of our study. However, this limitation also points to an opportunity for future research to build upon our study by developing a BMI correction for NHIS data that, in addition to biases in proxy-reporting, accounts for other shortcomings of NHIS data, such as increasing downward biases in BMI estimates over time. Just as importantly, this limitation issues a cautionary statement to researchers that the uncritical application of standard BMI correction procedures may fail to yield estimates that are unbiased approximations of clinical measures. Another important limitation of our study is that it divides race/ethnicity into two rather broad groups (Black and non-Black). While we believe that this is sufficient for our purposes, other studies may benefit from the development of separate corrections for other racial/ethnic groups, such as Hispanics. A third limitation of our correction procedure is that it is only directly applicable to NHIS data. But despite these limitations, our analyses have provided a set of statistical techniques that correct biases associated with proxy-reporting, and could be expanded further to adjust for the measurement error associated with self-reported BMI in the NHIS. Furthermore, we believe that the ideas developed here could be used to help minimize biases in other sources of epidemiologic data that use proxy-reports of height and weight to estimate BMI.
It is imperative that researchers who measure BMI through reported estimates of height and weight think carefully about flaws in their data and how existing correction procedures might fail to account for them. The development of our correction procedure, which minimized the systematic underestimation of BMI due to the inclusion of proxy-reports of height and weight in the NHIS prior to 1997, represents an important step toward improving the quality of BMI estimates in a widely used source of epidemiologic data. As we have shown, however, correcting the downward bias in proxy-reports is only an initial step toward reducing the measurement error associated with BMI estimates in the NHIS or other data sources that rely on reports rather than direct measures of height and weight. Statistical adjustments that simultaneously account for period trends, demographic characteristics, and the interactions between them should be developed to improve the validity of reported estimates of BMI. Through the careful development of appropriate adjustment procedures for proxy- and self-reported data, epidemiologists will improve their capacity to document historic increases in body mass, despite changing data collection procedures over time. Through our detailed investigation of biases introduced into NHIS data by proxy-reporting, we hope to increase general awareness of these measurement issues and provide researchers with useful ideas for correcting patterns of BMI misreporting in other sources of data.
The authors are grateful to Robert M. Hauser for providing invaluable suggestions and comments on an earlier draft of this article. This research was supported by the Utah Agricultural Experiment Station, Utah State University, Logan, Utah.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.