PHQ-8 Days: a measurement option for DSM-5 Major Depressive Disorder (MDD) severity
Population Health Metrics volume 9, Article number: 11 (2011)
Proposed draft diagnostic criteria for the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) suggest that dimensional assessments can supplement dichotomous diagnoses by incorporating measures of severity, frequency, and duration, providing the ability to monitor changes in symptoms over time and to guide appropriate treatment.
This report is based on data from the Behavioral Risk Factor Surveillance System 2006 from 198,678 survey participants who responded to all eight Patient Health Questionnaire (PHQ-8) items. We evaluated use of the days version of the PHQ-8 to determine an optimal cut-point for identifying respondents with depression and to evaluate the performance characteristics of the PHQ-8 at this cut-point.
A PHQ-8 Days score of 55 or more was determined to be the optimal cut-point when compared to the DSM-derived PHQ-8 algorithm for a major depressive episode (five or more symptoms present "more than half the days," at least one of which must be anhedonia or depression). In the full sample, the sensitivity and the specificity of this cut-point were 0.91 (0.90-0.93) and 0.99 (0.99-0.99), respectively.
The days version of the PHQ-8 may be a valuable dimensional alternative to the traditional PHQ-8 by offering finer granularity of dimensionality (a score of 0 to 112).
Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria are presently designed to establish categorical diagnoses, distinguishing those with a particular mental disorder from those without such a disorder. DSM criteria are currently less useful for measuring psychiatric symptoms and disorders on a continuum. Major depressive disorder is classified as a mood disorder, with diagnosis hinging on the presence of a single episode or of recurrent major depressive episodes (MDE). The gold standard for a diagnosis of depression is the Structured Clinical Interview (SCID), a diagnostic interview based on DSM criteria that requires clinical expertise to administer. It yields a dichotomous outcome, the presence or absence of MDE, for the past month (current), past year, or over a lifetime, based on the presence of five or more of the nine DSM criteria, provided that anhedonia or depression was present [1, 2].
The proposed draft diagnostic criteria for the fifth edition of the DSM (DSM-5; http://www.dsm5.org) suggest that graded, dimensional assessments can supplement dichotomous diagnoses. Furthermore, dimensional assessments incorporating measures of severity, frequency, and duration may help psychiatric research, epidemiology, and clinical services not only to better monitor changes in respondents' symptoms over time but also to guide the choice of appropriate population and clinical interventions. Categorical and dimensional approaches are fundamentally equivalent with no one right approach. Advocates of both approaches may well be right, but in different circumstances [4, 5].
Given the time and expense required to administer the SCID, epidemiological studies instead use either structured interviews designed for trained lay interviewers (e.g., the Composite International Diagnostic Interview [CIDI], the Diagnostic Interview Schedule [DIS]) or self-report questionnaires [6–8]. Such self-report questionnaires (e.g., the Center for Epidemiologic Studies Depression Scale [CES-D], versions of the Patient Health Questionnaire [PHQ-9, PHQ-8], the Beck Depression Inventory [BDI], and other measures) [9–11] measure symptoms and mood to provide evidence for a disorder (defined as "something wrong with a patient that is of clinical significance") rather than for a diagnosis (defined as "an expert opinion that a disorder is present"). Nonetheless, self-report questionnaires such as the 9-item (PHQ-9) and the 8-item (PHQ-8) Patient Health Questionnaire depression measures can provide a dimensional assessment for depression because they are scored by summing how often a number of typical depressive symptoms occur [5, 9]. A PHQ-8 score of ≥10 can also yield a categorical diagnosis of clinically significant depression and is more convenient to use than a DSM-IV diagnostic algorithm.
A recent revision to the PHQ-8 (referred to as the PHQ-8 Days) used in the Behavioral Risk Factor Surveillance System (BRFSS) survey adds further dimensionality to the PHQ-8 by asking the number of days in the past 14 days the respondent experienced each of the eight depressive symptoms, yielding 0 to 112 total days. In this study, we determine the optimal cut-point of the PHQ-8 Days scale for identifying respondents experiencing major depression during the past two weeks, and then evaluate the performance characteristics of the PHQ-8 Days at this cut-point. We estimate the robustness of its receiver operating characteristic (ROC) curve and compare the prevalence of major depression at this cut-point (positive test frequency) with that based on the proportion of PHQ-8 respondents meeting the DSM algorithm criteria for MDE. We also demonstrate the fine granularity of the PHQ-8 Days scale by lifetime diagnosis of anxiety and depression and multiple domains of health-related quality of life. Assessment of the PHQ-8 Days scale in this large epidemiological study may provide further evidence of its utility as a dimensional measure of depression in population-based research.
Behavioral Risk Factor Surveillance System survey (BRFSS)
We analyzed data from the 2006 BRFSS survey. The BRFSS is a population-based, state surveillance system using ongoing, random-digit-dialed telephone surveys of noninstitutionalized US residents aged 18 years or older that monitors the prevalence of key health- and safety-related behaviors and characteristics [13, 14]. During the 2006 survey, trained interviewers in 41 states and territories administered the Anxiety and Depression Module, which includes the PHQ-8. Weighting of BRFSS data is designed to make the total number of cases equal to the number of people in the state who are age 18 and older. In the BRFSS, such post-stratification serves as an adjustment for noncoverage and nonresponse and forces the total number of cases to equal population estimates for each geographic region, usually a state. The median response rate among all states and territories, based on Council of American Survey Research Organizations (CASRO) guidelines, was 51.4% (range: 35.1%-66.0%) in 2006, 50.6% (range: 26.9%-65.4%) in 2007, and 53.3% (range: 35.8%-65.9%) in 2008. The median cooperation rate was 74.5% (range: 56.9%-83.5%) in 2006, 72.1% (range: 49.6%-84.6%) in 2007, and 75.0% (range: 59.3%-87.8%) in 2008. Surveillance methodology, design, implementation, and response rates are available at: http://www.cdc.gov/brfss/technical_infodata/2002QualityReport and http://www.cdc.gov/BRFSS/technical_infodata/index.htm.
There were 198,678 respondents from the 38 states, Washington, DC, Puerto Rico, and the US Virgin Islands who completed all of the PHQ-8.
Patient Health Questionnaire eight-item depression scale (PHQ-8)
The PHQ-8 response set was standardized to make it similar to other BRFSS questions by asking the number of days in the past two weeks the respondent had experienced each of eight of the nine DSM criterion symptoms. In previous BRFSS analyses, the modified response set had been converted back to the original response set: 0 to 1 day = 'not at all,' 2 to 6 days = 'several days,' 7 to 11 days = 'more than half the days,' and 12 to 14 days = 'nearly every day,' with points (0 to 3) assigned to each category, respectively. The scores for each item are summed to produce a total score between 0 and 24 points. A total score of 0 to 4 represents no significant depressive symptoms; 5 to 9, mild symptoms; 10 to 14, moderate symptoms; 15 to 19, moderately severe symptoms; and 20 to 24, severe symptoms. Current depression is defined in two ways: 1) a PHQ-8 DSM-derived algorithm diagnosis of major depression (≥ five symptoms present 'more than half the days,' with at least one symptom being anhedonia or depression) or other depression (two to four symptoms, including depressed mood or anhedonia, are required to be present 'more than half the days'); 2) a PHQ-8 score of ≥10, which has an 88% sensitivity and 88% specificity for major depression, and, regardless of diagnostic status, typically represents clinically significant depression.
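The scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the function names are hypothetical, the item order (anhedonia first, depressed mood second) is an assumption, and a symptom is treated as present 'more than half the days' when it is reported on seven or more days, which is how the threshold is read here.

```python
from typing import Sequence

# Assumption for illustration only: index 0 is anhedonia and index 1 is
# depressed mood. The source does not specify the item order.
CORE_ITEMS = (0, 1)


def _check(item_days: Sequence[int]) -> None:
    """Validate eight item responses, each a count of days from 0 to 14."""
    if len(item_days) != 8:
        raise ValueError("expected exactly eight item responses")
    if any(not 0 <= d <= 14 for d in item_days):
        raise ValueError("each response must be between 0 and 14 days")


def days_to_points(days: int) -> int:
    """Convert a days response to the original 0-3 response category."""
    if not 0 <= days <= 14:
        raise ValueError("days must be between 0 and 14")
    if days <= 1:
        return 0  # 'not at all'
    if days <= 6:
        return 1  # 'several days'
    if days <= 11:
        return 2  # 'more than half the days'
    return 3      # 'nearly every day'


def phq8_score(item_days: Sequence[int]) -> int:
    """Traditional PHQ-8 total score (0 to 24)."""
    _check(item_days)
    return sum(days_to_points(d) for d in item_days)


def phq8_days_total(item_days: Sequence[int]) -> int:
    """PHQ-8 Days total (0 to 112): the plain sum of reported days."""
    _check(item_days)
    return sum(item_days)


def severity_band(score: int) -> str:
    """Map a traditional PHQ-8 score to the severity bands listed in the text."""
    if not 0 <= score <= 24:
        raise ValueError("score must be between 0 and 24")
    for upper, label in [(4, "none"), (9, "mild"), (14, "moderate"),
                         (19, "moderately severe"), (24, "severe")]:
        if score <= upper:
            return label
    raise AssertionError("unreachable")


def dsm_algorithm(item_days: Sequence[int]) -> str:
    """DSM-derived PHQ-8 algorithm: 'major', 'other', or 'none'."""
    _check(item_days)
    present = [d >= 7 for d in item_days]  # assumed threshold: 7+ days
    core_present = any(present[i] for i in CORE_ITEMS)
    n_present = sum(present)
    if core_present and n_present >= 5:
        return "major"
    if core_present and 2 <= n_present <= 4:
        return "other"
    return "none"
```

For a hypothetical respondent reporting `[10, 8, 7, 12, 7, 0, 3, 1]` days across the eight items, `phq8_score` and `phq8_days_total` give the 0 to 24 and 0 to 112 totals respectively, and `dsm_algorithm` applies the categorical rule to the same responses.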
Lifetime diagnosis of anxiety or depressive disorders
Two questions were asked about lifetime diagnosis: "Has a doctor or other health care provider ever told you that you have an anxiety disorder (including acute stress disorder, anxiety, generalized anxiety disorder, obsessive-compulsive disorder, panic attacks, panic disorder, phobia, post-traumatic stress disorder, or social anxiety disorder)?" and "Has a doctor or other health care provider ever told you that you have a depressive disorder (including depression, major depression, dysthymia, or minor depression)?"
Health-related quality of life and other items
Three health-related quality of life (HRQoL) questions with demonstrated validity and reliability for population health surveillance were examined [15–17]. The three questions involved respondents' self-assessment of their health over the previous 30 days:
1) Physical health: "How many days was your physical health, which includes physical illness or injury, not good?"
2) Mental health: "How many days was your mental health, which includes stress, depression, and problems with emotions, not good?"
3) Activity limitations: "Are you limited in any way in any activities because of physical, mental, or emotional problems?"
Sociodemographic information was obtained for each respondent. We assessed the extent to which seven sociodemographic characteristics (sex, age, race, education, employment status, annual household income, and marital status) were associated with major depression as determined by participants' responses to the PHQ-8.
In this study, the condition of interest is MDE as defined by DSM-IV-derived PHQ-8 algorithm "gold standard" criteria (five or more depressive symptoms present 'more than half the days,' and at least one of which must be anhedonia or depression). Sensitivity is the proportion of persons with this condition ascertained by a test to have that condition. Specificity is the proportion of persons without this condition ascertained by a test not to have the condition. Youden's J index (YJI), one measure of combined test validity, is the sum of the sensitivity and the specificity minus 1. The test is the PHQ-8 Days cut-point with the highest simultaneous values for sensitivity and specificity (maximum YJI) among respondents indicating ≥ seven days of either anhedonia or depression (n = 22,542).
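As a minimal sketch of these definitions, the following Python function computes sensitivity, specificity, and Youden's J index for a single candidate cut-point. The function name and arguments are hypothetical, survey weights are optional and treated as simple case weights, and the toy data in the usage note are invented for illustration; they are not BRFSS values.

```python
from typing import Optional, Sequence, Tuple


def validity_at_cutpoint(
    totals: Sequence[int],
    has_condition: Sequence[bool],
    cutpoint: int,
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, Youden's J) for one cut-point.

    totals        : PHQ-8 Days totals (0 to 112), one per respondent
    has_condition : True if the respondent meets the reference criteria
    cutpoint      : test is positive when total >= cutpoint
    weights       : optional case weights (assumed; defaults to 1 each)
    """
    if weights is None:
        weights = [1.0] * len(totals)
    tp = fn = tn = fp = 0.0
    for total, condition, w in zip(totals, has_condition, weights):
        positive = total >= cutpoint
        if condition and positive:
            tp += w
        elif condition:
            fn += w
        elif positive:
            fp += w
        else:
            tn += w
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    youden_j = sensitivity + specificity - 1
    return sensitivity, specificity, youden_j
```

With six invented respondents, `validity_at_cutpoint([60, 58, 40, 30, 10, 56], [True, True, True, False, False, False], 55)` classifies two of the three cases and two of the three non-cases correctly.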
We ascertained this cut-point by converting all BRFSS weights to integer weights to obtain the ROC curve and area under the curve (AUC) over the range of test values (0 to 112 days). The ROC curve summarizes test validity measures over the range of the test values and plots the sensitivity of the test on the vertical axis versus (1 - specificity) on the horizontal axis. The AUC is a measure of accuracy of a test instrument, with AUCs between 0.5 and 0.7 considered as reflecting low accuracy; between 0.7 and 0.9, moderate accuracy; and those above 0.9, high accuracy.
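The cut-point sweep can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, integer weights are assumed to act as replication counts, and the AUC is approximated with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple

RocPoint = Tuple[int, float, float]  # (cut-point, sensitivity, 1 - specificity)


def roc_points(
    totals: Sequence[int],
    has_condition: Sequence[bool],
    weights: Sequence[int],
    max_score: int = 112,
) -> List[RocPoint]:
    """Sensitivity and (1 - specificity) at every cut-point from 0 to max_score + 1."""
    pos_total = sum(w for w, c in zip(weights, has_condition) if c)
    neg_total = sum(w for w, c in zip(weights, has_condition) if not c)
    if pos_total == 0 or neg_total == 0:
        raise ValueError("need weighted cases both with and without the condition")
    points = []
    # max_score + 1 is included so that the curve reaches the point (0, 0).
    for cut in range(max_score + 2):
        tp = sum(w for t, c, w in zip(totals, has_condition, weights) if c and t >= cut)
        fp = sum(w for t, c, w in zip(totals, has_condition, weights) if not c and t >= cut)
        points.append((cut, tp / pos_total, fp / neg_total))
    return points


def auc_trapezoid(points: Sequence[RocPoint]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    curve = sorted((fpr, tpr) for _, tpr, fpr in points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_cutpoint(points: Sequence[RocPoint]) -> RocPoint:
    """Cut-point with maximum Youden's J (sensitivity - (1 - specificity)).

    Ties are resolved in favor of the lowest cut-point.
    """
    return max(points, key=lambda p: p[1] - p[2])
```

When several neighboring cut-points share nearly the same J value, as the study reports for 53 to 56, the tie-breaking rule above is only one possible choice; the final selection may reasonably rest on other considerations.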
We assessed the prevalence of major depression by (a) using this PHQ-8 Days cut-point and (b) according to the DSM-derived PHQ-8 algorithm criteria for MDE by sex, age, race, education, employment status, annual income, and marital status. We also estimated the mean number of PHQ-8 Days by three measures of HRQoL and lifetime diagnosis of anxiety and depression. We used SPSS, Version 17.0 (SPSS Inc., Chicago, IL) with the complex sampling module for all analyses.
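A weighted prevalence point estimate of the kind compared here can be sketched as follows. The function name is hypothetical and the weights and flags are invented; the sketch gives only the point estimate and does not reproduce the complex-sampling variance estimation that the confidence intervals in the study require.

```python
from typing import Sequence


def weighted_prevalence(flags: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted proportion of respondents for whom the flag is True."""
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(w for f, w in zip(flags, weights) if f) / total_weight


# Invented example: the same four respondents classified by each definition.
weights = [100.0, 300.0, 500.0, 100.0]
by_cutpoint = [True, False, False, True]    # PHQ-8 Days total >= 55
by_algorithm = [True, False, False, False]  # DSM-derived PHQ-8 algorithm
prevalence_cutpoint = weighted_prevalence(by_cutpoint, weights)
prevalence_algorithm = weighted_prevalence(by_algorithm, weights)
```

Computing both estimates on the same weighted sample, stratum by stratum, is what allows the positive test frequency at the cut-point to be set beside the algorithm-based prevalence.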
A PHQ-8 Days score of 55 or more was determined to be the optimal cut-point when compared to the DSM-derived PHQ-8 MDE algorithm (≥ five symptoms present 'more than half the days' and at least one of which must be anhedonia or depression). While the Youden's J Index was similar (0.836) over a narrow range of cut-points from 53 to 56, a cut-point of 55 or higher was selected (Table 1). AUC for the range of the test values (0-112) was 0.98 (Figure 1). Among the full sample of 198,678 people who responded to all eight PHQ-8 questions, the sensitivity and the specificity of a PHQ-8 Days cut-point of 55 or higher were 0.91 (0.90-0.93) and 0.99 (0.99-0.99), respectively.
The prevalence estimates of MDE, based on the PHQ-8 Days optimal cut-point of 55 or more days, did not differ statistically significantly from the prevalence estimates of DSM-derived PHQ-8 MDE by sex, age, race, education, employment status, annual income, and marital status (Table 2).
Respondents with disabilities reported more than twice as many PHQ-8 Days, 25.8 (25.2-26.3), as respondents without disability, 10.3 (10.1-10.5; Table 3). As the number of mentally unhealthy days (Table 4) and physically unhealthy days (Table 5) increased, mean PHQ-8 Days and the weighted prevalence of major depression also increased.
Mean PHQ-8 Days increased markedly with changes in current depression status (none, other, and major) and in lifetime depression status (No to Yes) but not with changes in lifetime anxiety status (No to Yes) except in those without current depression (Table 6).
To the best of our knowledge, this is the first study to examine and extend the proposed DSM-5 dimensionality available using PHQ-8 in its current response format. A PHQ-8 Days cut-point of 55 or more days (provided anhedonia or depression was present seven or more days) best identified respondents with MDE derived from a DSM-based PHQ-8 algorithm. AUC for the range of test values reflected high accuracy. Prevalence estimates of MDE based on the cut-point of 55 or more days were not statistically significantly different from those derived from the DSM-based PHQ-8 algorithm when stratified by seven sociodemographic characteristics. Prevalence estimates of MDE based on the cut-point of 55 or more days were higher than those derived from the DSM-based PHQ-8 algorithm for all categories of sociodemographic characteristics examined, except among homemakers, for whom there was no difference in prevalence. Prevalence estimates at 56 or more days would have been closer to the DSM-derived MDE algorithm estimates. It is noteworthy that, although the PHQ-8 recall period is two weeks, the MDE prevalence estimates from the BRFSS of 4.4% and 4.2%, based on the cut-point of 55 days or more and on the DSM-based PHQ-8 algorithm, respectively, are close to prevalence rates of past 30-day major depression found in other studies and in a systematic review of the literature [19–21].
Besides the robust operating characteristics of PHQ-8 Days at the cut-point of 55 or more days, this study had other key findings. First, even among respondents without current depression, mean PHQ-8 Days increased significantly in a stepwise manner from 7.6 days (7.4-7.7) for respondents without lifetime depression or anxiety to 20.8 days (20.1-21.4) for respondents with lifetime depression and anxiety. Among respondents with current other or current major depression, a lifetime history of depression but not anxiety increased mean PHQ-8 Days statistically significantly. Second, the dimensional scale of PHQ-8 Days increased with both physically and mentally unhealthy days, especially with the latter. Mean PHQ-8 Days increased from about 9 days at 0 physically unhealthy days to 28 days at 11-15 physically unhealthy days and to 34 days at 21-30 physically unhealthy days. Mean PHQ-8 Days increased from about 8 days at 0 mentally unhealthy days to 33 days at 11-15 mentally unhealthy days and to 44 days at 21-30 mentally unhealthy days.
The PHQ-8 Days version may be a valuable dimensional alternative to the traditional PHQ in several respects for psychiatric epidemiology and clinical services. First, its finer granularity (scores from 0 to 112 for PHQ-8 vs. 0 to 24; or 0 to 126 for PHQ-9 vs. 0 to 27) may increase sensitivity to change when monitoring depression longitudinally in clinical trials or cohort studies. Second, a quantitative response format ("number of days") and a standardized recall period provide greater uniformity and ease of translation, whereas the current verbal response options such as "several," "more than half," or "nearly every day" may be more susceptible to variable interpretations when translated into different languages or used across multiple cultures. Third, the current verbal response set using these less well-defined words and phrases is only an ordinal level, not an interval level, of measurement. Fourth, a number of days is easier to enter in automated data gathering such as interactive voice response (IVR) calls. A potential disadvantage of using "number of days" is that some respondents may not be able to provide a specific number of days, so that an interviewer may have to interpret what the respondent's reply means or interpolate when the respondent provides a range rather than a discrete number.
Some of these as well as other factors make the PHQ-8 Days version a useful option for public health research and surveillance. Individuals assessed in the general population typically have fewer and less severe depressive symptoms than patients evaluated in clinical settings; in this case, the greater range of the PHQ-8 Days version might make it more sensitive to subthreshold symptoms, with fewer concerns about a floor effect. Besides being valuable for capturing the full spectrum of depressive symptoms in the general population, this feature may also be useful in detecting low levels of symptoms that may occur in the wake of man-made or natural disasters and in monitoring mental health following such traumatic events. The reduction in cultural and language variability with the quantitative response set and the greater ease-of-use with automated data gathering may be particularly useful for large population-based surveys.
Key strengths of this study are that the survey populations in participating states were reasonably representative of the state populations and that sample sizes were large enough to analyze positive test frequency in seven sociodemographic subgroups. Although the PHQ-8 was used in the BRFSS, the concept of PHQ days can likely be applied to the PHQ-9 as well. The cut-points on the PHQ-8 and PHQ-9 are identical, and either is a valid measure of depression severity [22, 23]. The PHQ-8 omits the ninth item of the PHQ-9 (which asks about thoughts of death or self-harm) and is often used in epidemiological studies where professional follow-up is unavailable or impractical, and in clinical research studies where depression is a secondary rather than primary outcome. Almost all of the positive responses to this ninth item represent passive thoughts of death rather than suicidal ideation.
Studies based on BRFSS data in general and the depression and anxiety module in particular have some inherent limitations. First, they are representative of only households with landline telephones included in BRFSS surveys. If respondents in currently excluded households without telephones or with only cell phones answer the PHQ-8 Days questions differently from respondents in households with landline telephones, the prevalence estimates of MDE may be biased compared to those from interviews in all households, but this difference should not affect the test validity of the PHQ-8 Days measure proposed here. Second, because BRFSS data are based on subjective responses of survey participants, recall bias and biases related to the perceived social desirability of certain responses may affect their accuracy. Third, BRFSS and the "gold standard" DSM diagnostic algorithm for MDE to which the PHQ-8 Days version is compared cannot address either the inclusion criterion that symptoms cause clinically significant distress or impairment in social, occupational, or other important areas of functioning or the exclusion criterion for episodes due to the direct physiological effects of a substance or antidepressant intervention (e.g., a drug of abuse, a medication, or other treatment). Furthermore, the BRFSS and this "gold standard" do not account for those being successfully treated and asymptomatic at the time of the survey. Despite these limitations, other BRFSS estimates have been shown to be valid and reliable when compared with estimates derived from national household survey data [24, 25]. BRFSS surveys are a cost-effective and timely means of collecting state and local data, and BRFSS data are often the only data source with which states and communities can assess local health conditions and track progress toward improving those conditions.
We have demonstrated the ease of using the PHQ-8 Days responses not only to create a highly granular dimensional measure but also to identify a categorical cut-point for major depressive episode. Additional cut-points for other categories of depression severity could easily be identified, giving the psychiatric epidemiological and services community the much-needed granularity and flexibility to detect and monitor changes in respondents' symptoms over time, as well as providing additional data to help guide the choice of appropriate interventions.
The findings and conclusions in this article are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders. 4th edition. Washington, DC: Author; 1994.
Spitzer RL, Williams JBW, Gibbon M, First M: Structured Clinical Interview for DSM-III-R. Washington, DC: American Psychiatric Press; 1990.
American Psychiatric Association DSM-5 Development[http://www.dsm5.org/Newsroom/Documents/Diag%20%20Criteria%20General%20FINAL%202.05.pdf]
Kraemer HC, Noda A, O'Hara R: Categorical versus dimensional approaches to diagnosis: methodological challenges. J Psychiat Res 2004,38(1):17-25. 10.1016/S0022-3956(03)00097-9
American Psychiatric Association DSM-5 Development[http://www.dsm5.org/Research/Pages/DimensionalAspectsofPsychiatricDiagnosis(July26-28,2006).aspx]
Radloff LS: The CES-D Scale: A Self-Report Depression Scale for Research in the General Population. Appl Psych Meas 1977,1(3):385-401. 10.1177/014662167700100306
Kessler RC, Wittchen HU, Abelson JM, McGonagle KA, Schwarz N, Kendler KS, Knauper B, Zhao S: Methodological studies of the Composite International Diagnostic Interview (CIDI) in the US National Comorbidity Survey. Int J Method Psych Research 1998,7(1):33-55. 10.1002/mpr.33
Robins LN, Helzer JE, Croughan J, Ratcliff KS: National Institute of Mental Health Diagnostic Interview Schedule. Its history, characteristics, and validity. Arch Gen Psychiatry 1981, 38: 381-389.
Kroenke K, Spitzer RL, Williams JB: The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001,16(9):606-613. 10.1046/j.1525-1497.2001.016009606.x
Beck AT, Rial WY, Rickels K: Short form of depression inventory: cross-validation. Psychol Rep 1974,34(3):1184-6.
Williams JW Jr, Pignone M, Ramirez G, Perez SC: Identifying depression in primary care: a literature synthesis of case-finding instruments. Gen Hosp Psychiatry 2002, 24: 225-237. 10.1016/S0163-8343(02)00195-0
Strine TW, Mokdad AH, Balluz LS, Gonzalez O, Crider R, Berry JT, Kroenke K: Depression and anxiety in the United States: findings from the 2006 Behavioral Risk Factor Surveillance System. Psychiatr Serv 2008,59(12):1383-1390. 10.1176/appi.ps.59.12.1383
Mokdad AH, Stroup DF, Giles WH: Public health surveillance for behavioral risk factors in a changing environment. Recommendations from the Behavioral Risk Factor Surveillance Team. MMWR Recomm Rep 2003, 52: 1-12.
Holtzman D: The Behavior Risk Factor Surveillance System. In Community-based Health Research Issues and Methods. Edited by: Blumenthal DS, DiClemente RJ. New York: Springer; 2004.
Andresen EM, Catlin TK, Wyrwich KW, Jackson-Thompson J: Retest reliability of surveillance questions on health related quality of life. J Epidemiol Community Health 2003,57(5):339-343. 10.1136/jech.57.5.339
Mielenz T, Jackson E, Currey S, DeVellis R, Callahan LF: Psychometric properties of the Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis. Health Qual Life Outcomes 2006, 4: 66. 10.1186/1477-7525-4-66
Moriarty DG, Zack MM, Kobau R: The Centers for Disease Control and Prevention's Healthy Days Measures -- population tracking of perceived physical and mental health over time. Health Qual Life Outcomes 2003,1(1):37. 10.1186/1477-7525-1-37
Fischer JE, Bachmann LM, Jaeschke R: A readers' guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intens Care Med 2003, 29: 1043-1051. 10.1007/s00134-003-1761-8
Blazer DG, Kessler RC, McGonagle KA, Swartz MS: The prevalence and distribution of major depression in a national community sample: National Comorbidity Survey. Am J Psychiatry 1994,151(7):979-86.
Bijl RV, Ravelli A, van Zessen G: Prevalence of psychiatric disorder in the general population: results of the Netherlands Mental Health Survey and Incidence Study (NEMESIS). Soc Psychiatry Psychiatr Epidemiol 1998, 33: 587-95. 10.1007/s001270050098
Waraich P, Goldner EM, Somers JM, Hsu L: Prevalence and incidence studies of mood disorders: a systematic review of the literature. Can J Psychiatry 2004,49(2):124-138.
Kroenke K, Spitzer RL: The PHQ-9: A new depression and diagnostic severity measure. Psychiatric Annals 2002, 32: 509-521.
Kroenke K, Spitzer RL, Williams JBW, Löwe B: The Patient Health Questionnaire somatic, anxiety, and depressive symptom scales (PHQ-SADS): a systematic review. Gen Hosp Psychiatry 2010,32(4):345-59. 10.1016/j.genhosppsych.2010.03.006
Nelson DE, Holtzman D, Bolen J, Stanwyck CA, Mack KA: Reliability and validity of measures from the Behavioral Risk Factor Surveillance System (BRFSS). Soz Praventivmed 2001,46(Suppl 1):S3-42.
Nelson DE, Powell-Griner E, Town M, Kovar MG: A comparison of national estimates from the National Health Interview Survey and the Behavioral Risk Factor Surveillance System. Am J Public Health 2003, 93: 1335-1341. 10.2105/AJPH.93.8.1335
The authors declare that they have no competing interests.
SSD designed the study and performed statistical analyses. SSD, KK, MMZ, TWS, and LSB drafted the manuscript and approved the final version. SSD accepts full responsibility for the work and the conduct of the study, had access to the data, and controlled the decision to publish. All authors read and approved the manuscript.