Comments on the missing values of smoking and insurance status
Yiling Cheng, Centers for Disease Control and Prevention
29 October 2009
This article demonstrates a simple and innovative approach to an important question: what is the total prevalence of diabetes in each US state? I read it with great interest and noticed that the authors reported that “…50.2% of observations in NHANES were missing either smoking or insurance status…” According to the NHANES documentation, this figure is far too high. For example, in NHANES 2003-2004, among persons aged 20 years or older there was only one missing value for the question “Smoked at least 100 cigarettes in life” (http://www.cdc.gov/nchs/data/nhanes/nhanes_03_04/smq_c.pdf), and among persons aged 0 years or older there were only 133 missing values for the question “Covered by health insurance” (http://www.cdc.gov/nchs/data/nhanes/nhanes_03_04/hiq_c.pdf). The authors may have overlooked the skip patterns of these two variables. Handling these variables incorrectly could lead to incorrect predictions and conclusions. I wonder whether the authors could recheck the documentation and dataset and rerun the analyses.
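How a skip pattern can inflate an apparent missing-data rate can be illustrated with a minimal Python sketch. The field names, answer codes and records below are hypothetical, and the rule that the follow-up item is asked only of respondents who answered "yes" to the gate item is an assumed example of a skip pattern, not necessarily the one that affected the original analysis. The actual NHANES items and codes are in the codebooks cited above.

```python
from typing import Callable, Optional

# Hypothetical respondent records (invented values, not NHANES data).
# smoked_100: gate item, 1 = yes, 2 = no, None = not answered.
# smokes_now: follow-up item, ASSUMED to be asked only when smoked_100 == 1;
#             1 = yes, 2 = no, None = blank.
records = [
    {"smoked_100": 1, "smokes_now": 1},
    {"smoked_100": 1, "smokes_now": 2},
    {"smoked_100": 2, "smokes_now": None},     # skipped by design
    {"smoked_100": 2, "smokes_now": None},     # skipped by design
    {"smoked_100": 2, "smokes_now": None},     # skipped by design
    {"smoked_100": None, "smokes_now": None},  # genuinely missing
]


def naive_status(rec: dict) -> Optional[str]:
    """Reads only the follow-up item, so every skipped respondent looks missing."""
    if rec["smokes_now"] is None:
        return None
    return "current" if rec["smokes_now"] == 1 else "not current"


def skip_aware_status(rec: dict) -> Optional[str]:
    """Uses the gate item to classify respondents whom the follow-up skipped."""
    if rec["smoked_100"] == 2:
        return "never"  # follow-up not asked by design, so not missing
    if rec["smoked_100"] == 1 and rec["smokes_now"] is not None:
        return "current" if rec["smokes_now"] == 1 else "former"
    return None  # gate unanswered, or follow-up blank although it was asked


def missing_share(status_fn: Callable[[dict], Optional[str]]) -> float:
    """Fraction of records for which the classifier cannot assign a status."""
    return sum(status_fn(r) is None for r in records) / len(records)


print(f"naive missing share:      {missing_share(naive_status):.2f}")
print(f"skip-aware missing share: {missing_share(skip_aware_status):.2f}")
```

In this toy data the naive reading treats four of six records as missing, while the skip-aware reading leaves only the one record whose gate item is blank.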
Competing interests
None declared
Authors' response to reader comment
Jolayne Houtz, Population Health Metrics
30 October 2009
We appreciate Dr Cheng's attention to this detail. The point raised is correct, and the problem was indeed due to a skip pattern in the NHANES questionnaire. We repeated the analysis to evaluate the effect on the regression coefficients estimated within NHANES and on predicted diabetes prevalence. Three coefficients (smoking, age 60-69, and age 70+) changed by less than 10%, and the rest remained unchanged. Predicted diabetes prevalence for the different state-sex-age-race-insurance categories changed on average by 1.3%, and at most by 3.5%, relative to the values reported in the manuscript; the predictions are therefore not sensitive to this error.
Goodarz Danaei and Majid Ezzati, on behalf of the authors
Competing interests
No competing interests.