The episodic random utility model unifies time trade-off and discrete choice approaches in health state valuation
© Craig and Busschbach; licensee BioMed Central Ltd. 2009
Received: 04 June 2008
Accepted: 13 January 2009
Published: 13 January 2009
To present an episodic random utility model that unifies time trade-off and discrete choice approaches in health state valuation.
First, we introduce two alternative random utility models (RUMs) for health preferences: the episodic RUM and the more common instant RUM. For the interpretation of time trade-off (TTO) responses, we show that the episodic model implies a coefficient estimator, and the instant model implies a mean slope estimator. Secondly, we demonstrate these estimators and the differences between the estimates for 42 health states using TTO responses from the seminal Measurement and Valuation of Health (MVH) study conducted in the United Kingdom. Mean slopes are estimated with and without Dolan's transformation of worse-than-death (WTD) responses. Finally, we demonstrate an exploded probit estimator, an extension of the coefficient estimator for discrete choice data that accommodates both TTO and rank responses.
By construction, mean slopes are less than or equal to coefficients, because slopes are fractions and, therefore, magnify downward errors in WTD responses. The Dolan transformation of WTD responses moves mean slopes closer to the coefficient estimates, yet they are not equivalent (i.e., mean absolute difference = 0.179). Unlike mean slopes, coefficient estimates demonstrate strong concordance with rank-based predictions (Lin's rho = 0.91). Combining TTO and rank responses under the exploded probit model improves the identification of health state values, decreasing the average width of confidence intervals from 0.057 to 0.041 compared to TTO-only results.
The episodic RUM expands upon the theoretical framework underlying health state valuation and contributes to health econometrics by motivating the selection of coefficient and exploded probit estimators for the analysis of TTO and rank responses. In future MVH surveys, sample size requirements may be reduced through the incorporation of multiple responses under a single estimator.
Health state valuation studies using the time trade-off (TTO) approach lack a sound theoretical framework for the incorporation of worse than death (WTD) responses. Furthermore, TTO responses may be considered a form of discrete choice (i.e., expressions of a tie between two alternative scenarios); yet, no valuation study has applied discrete choice estimators to TTO data. In this paper, we introduce an episodic random utility model (RUM) and two novel estimators for health state valuation. We show that the assumption of the episodic RUM theoretically and econometrically unifies TTO and other discrete choice approaches.
As described by Torrance in 1982, values of better than dead (BTD) states are bounded by the values of optimal health (1.00) and dead (0). Values of WTD states may be as low as minus infinity. In Figure 1, a person's "spaghetti" line may lie anywhere between the dotted lines, but the slope of the spaghetti line must remain between one and minus infinity. The potential for an infinitely negative slope poses a fundamental challenge in the estimation of QALYs using TTO, standard gamble (SG), person trade-off (PTO), or any other discrete-choice approach. In TTO, the conventional approach to QALY estimation entails an average of positive and negative slopes (i.e., mean slope estimator); a similar process is applied in SG and PTO. An often noted problem is that the influence of negative slopes can be so massive (e.g., -39 in the MVH study) that the mean slopes appear much too low, well outside the reasonable range of face validity within the QALY concept.
Confronted with this threat to face validity, researchers typically manipulate WTD response data, arbitrarily increasing the negative slopes and imposing an ad hoc boundary of negative one on the slopes. The boundary of negative one reduces the influence of negative slopes on the mean slope and gives an appealing mirror image of the valuation space above zero. Nevertheless, critics from early on have warned that there is no theoretical justification for the value of negative one, which means the truncated scale may not represent 'utility'. Changing data to improve face validity is generally frowned upon, even in the case of outliers.
A similar health econometrics discussion has taken place on cost analyses, revealing that the transformation of positive outliers has a large effect on the mean cost per patient. At the 2008 conference of the American Society of Health Economists, John Mullahy compared the role of a health econometrician to that of an anatomist, dissecting data in an Aristotelian fashion. In his lecture, "Anatomy of Healthcare Cost Distributions," he dismantled the thick upper tail of a common cost distribution and discussed its possible interpretations. Likewise, health state valuation studies continuously re-examine the theoretical framework that guides estimator selection and the best approach to address results with poor face validity.
In pursuit of a justification for this ad hoc transformation, studies report that respondents find it more difficult and make more errors estimating negative values than estimating positive values, especially in TTO tasks. These psychometric complications are reflected in the high variance of negative values, the low discriminating power of negative values, and the discontinuity of the scale around the value of death, otherwise known as the 'gap effect' [5–7]. While the evidence on the influence of state-specific heteroskedasticity is mounting, there is not yet a clear and coherent framework for combining BTD and WTD TTO responses.
Recently, there has been considerable interest in estimating health state values from ranking exercises suitable for QALY calculations [4, 5, 8, 9]. Ranking is seen as a relatively easy valuation method, like the visual analogue scale (VAS), and has been shown to render predictions that are concordant with (if not identical to) VAS predictions. The advantage of ranking versus VAS is a well-developed theoretical foundation in Item Response Theory without the response spreading and context effects associated with VAS. Unlike VAS, ranking is a choice-based approach, which provides a basis for its merger with economically oriented choice-based methods, like TTO and SG. A drawback for both ranking and VAS is their unclear relation to health state values on the QALY scale, a relation which is better described for TTO and SG.
A theoretically driven model that reduces the difference between a psychometrically strong method (e.g., ranking) and a method with a strong link to utility theory (e.g., TTO) has the potential to revolutionize the field of health state valuation. This model would increase the 'convergent validity' of related psychometric and econometric methods, and therefore, enhance the 'construct validity' of these methods. In the absence of a 'gold standard' in health state valuation, such an increase in convergent validity would advance our understanding regarding the latent construct of quality of life and its assessment. Furthermore, if a model reduces dependence on arbitrary deviations from utility theory, such as by removing the need for ad hoc corrections of WTD responses in the QALY paradigm, the model would promote face validity. Lastly, such a model might further improve upon the validity of QALYs by integrating the benefits of psychometric and econometric methods under a single statistical estimator.
In this paper, we introduce an episodic random utility model (RUM) as such a theoretical framework. This model not only allows for the comparisons between rank and TTO predictions within a common estimator, it resolves key econometric and psychometric issues that inhibit TTO-based valuation. In introducing this model, the difficulties with the face validity of WTD responses are addressed in a way that is theoretically coherent for the fields of economics and psychometrics, and improves upon the convergent validity between TTO and rank-based predictions. For purposes of illustration, the conventional and episodic RUMs are estimated using the Measurement and Valuation of Health (MVH) study data from the United Kingdom (UK) [12–14].
Episodic and Instant Random Utility Models (RUMs)
In the episodic RUM, the error, εij, represents variability in the value of an episode. For example, Figure 1 has time on the x-axis and utility on the y-axis, so the error would be distributed vertically along the y-axis. The second model is an instant RUM, which suggests a random slope. Its error, εij, represents variability in the value of an instantaneous state, not the episode. The instant RUM is the theoretical basis underlying the mean slope estimator, the conventional approach to health state valuation studies.
The instant RUM would be equivalent to an episodic RUM if we were to assume that the magnitude of error is proportional to the duration of the episode. In other words, more time in state j coincides with more error in the valuation. However, each model assumes that errors have equal variances. This difference is subtle, but highly influential in cases where there are WTD TTO responses. In WTD responses, the respondent's choice of time in optimal health changes the amount of time in state j, thus changing the amount of error under the instant RUM. For example, if the respondent equates the state to "immediate death" (t = 0), according to the instant RUM, there is no error in this response.
Both models assume that the utility of dead for any duration is zero (i.e., Udead(t) = 0), and the utility of optimal health for any duration equals the duration (i.e., Uoptimal(t) = t). Both models assume constant proportionality: the expected utility of a health state is proportional to its duration, t, and the expected error is zero. State-specific components and errors may depend on the duration (e.g., μj(t)) [15, 16]; however, questions concerning duration effects in health state valuation are outside the scope of this paper and left to be examined in future work.
Interpretations of TTO responses
The interpretation of the BTD response, t1, is for all intents and purposes equivalent under episodic and instant RUMs, because the amount of time in state j is equal regardless of response (i.e., ten years).
The interpretation of the WTD response, t2, differs greatly between the episodic and instant RUM estimates.
Both estimators are non-parametric, and they are equivalent if the sample only includes BTD responses, t1. The instant RUM estimator is a mean slope (Figure 1), and because slopes can be exceptionally negative (e.g., -39), the mean slope estimator is not robust to small changes in the error term. The episodic RUM estimator is a fraction of weighted sums, creating additional stability.
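The contrast between the two estimators can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes the mapping of TTO responses to (y, x) given later in the text (y = t1, x = 10 if BTD; y = -t2, x = 10 - t2 if WTD), takes the mean slope to be the average of y/x, and takes the coefficient to be the no-constant least-squares ratio Σxy/Σx². The four responses are hypothetical.

```python
def tto_to_xy(t, worse_than_death):
    """Map a TTO response to (y, x) on a 10-year horizon.

    y is the signed time in optimal health; x is the time spent in the
    valued state. This mapping is an assumption taken from the text.
    """
    if worse_than_death:
        return -t, 10.0 - t   # WTD: y = -t2, x = 10 - t2
    return t, 10.0            # BTD: y = t1, x = 10


def mean_slope(pairs):
    """Instant RUM estimator: average of the individual slopes y / x.

    Undefined if any response has x = 0 (i.e., t2 = 10).
    """
    return sum(y / x for y, x in pairs) / len(pairs)


def coefficient(pairs):
    """Episodic RUM estimator: least-squares coefficient with no constant."""
    return sum(x * y for y, x in pairs) / sum(x * x for y, x in pairs)


# Hypothetical responses for one state: three BTD and one extreme WTD.
responses = [(8.0, False), (6.0, False), (5.0, False), (9.75, True)]
pairs = [tto_to_xy(t, wtd) for t, wtd in responses]

slope_estimate = mean_slope(pairs)          # dominated by the slope of -39
coefficient_estimate = coefficient(pairs)   # stays near the BTD responses

print(round(slope_estimate, 3), round(coefficient_estimate, 3))
```

In this example the single WTD response contributes a slope of -39 and pulls the mean slope far below zero, whereas the coefficient gives that response little weight because its time in the valued state (x = 0.25) is small.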
Beginning in the mid-1990s, the field of economic evaluations faced a similar choice between estimators [18, 19]. The emergence of patient-level data led to the question of whether to use the mean ratio (i.e., mean slope) or the mean cost over the mean effectiveness as the estimator of the incremental cost-effectiveness ratio (ICER). As in our case, if incremental effectiveness approaches zero for any patient, the patient's ICER blows up together with the mean. For this reason, mean ratio statistics are not widely used in cost-effectiveness research.
A parallel argument in favor of the coefficient estimator comes from psychometrics. The coefficient estimator is motivated by economic theory (i.e., episodic RUM). However, measurement theory also implies the same estimator with a slightly different interpretation: when respondents provide the amount of time in optimal health, they may respond with some error (t + ε). The coefficient estimator accommodates such response error.
Nevertheless, the mean slope estimator is the conventional approach to health state valuation studies using discrete choice methods (i.e., TTO, SG, PTO, etc.). In an effort to improve the face validity of instant RUM predictions, Dolan replaced the negative slopes with -t2/10, while Shaw and colleagues divided the negative slopes by a constant (i.e., 39) [12, 20]. Each transformation attenuates the magnifying effects in the slopes by bounding them at negative one or above. In the economic evaluation analogy, Dolan's transformation is like changing the incremental effectiveness to the maximum, 10 years, when the patient's ICER is negative. By construction, the Dolan approach will produce estimates greater than the unadjusted mean slope, but less than the coefficient if there are any WTD responses (mathematical proof available upon request). These arbitrary manipulations are not nested within either the instant or episodic RUMs, or within any other utility or psychometric theory.
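As a rough illustration of the two transformations described above, the sketch below applies them to hypothetical responses. It assumes the unadjusted slope is t1/10 for BTD responses and -t2/(10 - t2) for WTD responses, so that t2 = 9.75 yields the slope of -39 mentioned in the text; the function names and data are illustrative only.

```python
def unadjusted_slope(t, worse_than_death):
    """Individual slope implied by a TTO response (assumed form)."""
    return -t / (10.0 - t) if worse_than_death else t / 10.0


def dolan_slope(t, worse_than_death):
    """Dolan: replace each negative slope with -t2 / 10."""
    return -t / 10.0 if worse_than_death else t / 10.0


def shaw_slope(t, worse_than_death):
    """Shaw and colleagues: divide each negative slope by 39."""
    slope = unadjusted_slope(t, worse_than_death)
    return slope / 39.0 if worse_than_death else slope


def mean(values):
    return sum(values) / len(values)


# Hypothetical responses for one state: three BTD and one extreme WTD.
responses = [(8.0, False), (6.0, False), (5.0, False), (9.75, True)]

unadjusted_mean = mean([unadjusted_slope(t, w) for t, w in responses])
dolan_mean = mean([dolan_slope(t, w) for t, w in responses])
shaw_mean = mean([shaw_slope(t, w) for t, w in responses])

print(round(unadjusted_mean, 3), round(dolan_mean, 3), round(shaw_mean, 3))
```

With these inputs the most extreme WTD response maps to -0.975 under Dolan's rule and to -1 under the division by 39, so both adjusted means sit well above the unadjusted mean slope.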
Mixing TTO, Rank, and RUM
While TTO estimation does not require further specification to produce consistent results, it may be more efficient to assume the errors are normally distributed. This assumption allows for maximum likelihood estimation and, more importantly, the merger of rank and TTO responses under a single estimator.
The exploded probit estimation can predict the state-specific components and variances. While health states clearly have different expected utilities, differences in variances (i.e., σj ≠ σk) have little effect on the predicted values as demonstrated by Craig, Busschbach, and Salomon. Therefore, in this article, we estimated a homoskedastic probit model using rank responses and predicted values for 42 health states on the QALY scale with fixed anchors for comparison. TTO responses were used to predict the OLS episodic RUM.
For TTO responses, the model relates y to x, where y equals t1 if BTD, or -t2 if WTD, and x equals 10 if BTD, or (10 - t2) if WTD. This is equivalent to a simple linear regression with no constant and an assumption of normally distributed errors with state-specific variances. A central advantage of the exploded probit is that the estimator can accommodate both TTO and rank responses.
Caution is warranted when merging responses from different valuation techniques into a single estimator. While the estimation of state-specific components, μj, may benefit greatly from the added information, it remains unclear whether the TTO variance is equal to the variance found in rank responses. Completion of the TTO task entails a greater cognitive burden for respondents, which may result in greater errors. In the combined estimator, a separate variance parameter describing the difference between the method-specific variances is included for rank responses.
In combining TTO and rank responses within a single estimation, we increase the power of valuation studies that explore preferences of respondents using both TTO and rank responses. In most valuation studies done on the basis of the MVH protocol, both TTO and rank were administered. A potential problem is that there are more ranked pairs than TTO responses. To impose balance across methods, we assigned the pair-wise comparisons a reduced weight equal to the respondent's number of hypothesized non-anchor states over the respondent's number of pair-wise comparisons. As a result, each respondent's set of decomposed rank responses received the same weight in the maximum likelihood estimation as their set of TTO responses. The estimator accounts for both sources of information equitably.
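The weighting rule can be made concrete with a small sketch. It assumes that a tie-free ranking of n cards is decomposed into all n(n - 1)/2 ordered pairs and that every pair, including pairs involving anchors, counts toward the denominator; the paper does not spell out these details, and the state labels and helper names are hypothetical.

```python
from itertools import combinations

# Anchor states that are ranked but not valued directly by TTO.
ANCHORS = {"11111", "immediate death"}


def explode_ranking(ranking):
    """Decompose a ranking (best first, no ties) into (preferred, other) pairs."""
    return list(combinations(ranking, 2))


def pair_weight(ranking):
    """Weight per pair: non-anchor states divided by pair-wise comparisons."""
    n_non_anchor = sum(1 for state in ranking if state not in ANCHORS)
    n_pairs = len(explode_ranking(ranking))
    return n_non_anchor / n_pairs


# Hypothetical short ranking with three non-anchor states.
ranking = ["11111", "11211", "21232", "immediate death", "33333"]
pairs = explode_ranking(ranking)
weight = pair_weight(ranking)

# Total weight of the rank pairs equals the number of non-anchor states,
# which is also the number of TTO responses this respondent would give.
total_rank_weight = weight * len(pairs)
print(len(pairs), weight, total_rank_weight)
```

Under the same assumptions, a full MVH ranking of 15 cards yields 105 pairs, each weighted 13/105, so the decomposed ranking carries the same total weight as the 13 TTO responses.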
United Kingdom Measurement and Valuation of Health (MVH) Study
In 1993, the University of York administered 3,395 interviews with a response rate of 64%, and collected values of 42 EQ-5D health states and the state of unconsciousness [12–14]. The MVH protocol, developed for the aforementioned study, describes a face-to-face interview that can be separated into several sections. First, the respondents are asked to describe their own health using the EQ-5D descriptive system. Then, the respondents rank 15 cards, each describing a health state. This set of 15 health state cards always includes the anchor states, optimal health (11111) and immediate death. The respondents are instructed to assume that the duration of the health state is 10 years, followed by death. After the ranking exercise, the subjects are asked to place each card on the EQ-VAS, often referred to as the EuroQol "thermometer." After the EQ-VAS valuation section, the deck of health state cards is reshuffled, and 13 health states are valued using the TTO method. The two missing states are 11111 and 'immediate death', as these states cannot be valued directly using the standard TTO because they anchor the TTO scale. The TTO interview is complemented by a visual aid, specifically a TTO probe board that graphically displays the difference in life years between health states. As previously described, the TTO task produces either t1 or t2 responses, each of which describes a compensating amount of time in the optimal health state.
For the TTO and rank analytical sample (N = 3,333 and 3,355, respectively), respondents were excluded for a particular method (1) if only one or two states were valued (other than 11111, "immediate death," and "unconscious"); (2) if all states were given the same value; or (3) if all states were valued worse than "immediate death." In addition, respondents were excluded from the rank sample if they ranked death equivalent to optimal health. These four criteria motivated the exclusion of 1.8% of the rank respondents and 1.2% of the TTO respondents.
Comparison between Instant and Episodic RUM
Table 1. Correlation and agreement between predicted values for 42 EQ-5D states. Episodic RUM estimates using TTO responses are compared with instant RUM estimates using unadjusted TTO responses, instant RUM estimates using adjusted TTO responses, and episodic RUM estimates using rank responses; the reported statistics include the mean absolute difference.
Episodic RUMs using TTO, Rankings, and Both Responses
Table 1 further describes the relationship between the predictions from the three episodic RUM estimations. Coefficient estimates based on TTO responses show stronger agreement with rank-based predictions (Lin's rho = 0.910) than adjusted mean slopes (Lin's rho = 0.794). This suggests that rank responses provide information more similar to TTO responses under the episodic RUM than under the instant RUM with Dolan's transformation of WTD responses. Convergent validity between the two methods is improved more by a theoretically coherent model than by an ad hoc boundary of -1.00. This, in turn, increases the construct validity of both the ranking and TTO estimates for health state valuation.
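For readers unfamiliar with the agreement statistics, the sketch below computes Lin's concordance correlation coefficient and the mean absolute difference for two sets of predictions. It uses the standard definition 2·cov / (var_a + var_b + (mean_a - mean_b)²) with 1/n moments; the paper does not state which variant was used, and the values shown are hypothetical rather than taken from the study.

```python
def lins_concordance(a, b):
    """Lin's concordance correlation coefficient with 1/n moments."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((u - mean_a) ** 2 for u in a) / n
    var_b = sum((v - mean_b) ** 2 for v in b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n
    return 2.0 * cov / (var_a + var_b + (mean_a - mean_b) ** 2)


def mean_absolute_difference(a, b):
    """Average absolute gap between paired predictions."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


# Hypothetical predictions for four states from two estimators; the second
# set is the first shifted down by 0.1.
tto_values = [0.9, 0.7, 0.4, -0.1]
rank_values = [0.8, 0.6, 0.3, -0.2]

rho_c = lins_concordance(tto_values, rank_values)
mad = mean_absolute_difference(tto_values, rank_values)
print(round(rho_c, 3), round(mad, 3))
```

Unlike the Pearson correlation, which equals one for these perfectly parallel series, the concordance coefficient falls below one because it also penalizes the constant shift between the two sets of values.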
Table: State-specific component estimates (μj) by EQ-5D state, model, and estimator (columns include TTO and Rank).
In this paper, we introduce the episodic RUM and its coefficient estimator, which together provide a framework for health state valuation that is theoretically and econometrically consistent. The findings suggest a re-analysis of current health state valuation data and the potential merger of TTO and rank responses under a unified QALY estimator, specifically the exploded probit. To better understand this conclusion, we delineate the three major contributions of the episodic RUM.
The first contribution is the theoretical realization that under the conventional TTO approach, known as the instant RUM, the error scale in WTD and BTD responses is different by construction. As shown in equation 1, BTD error is divided by ten, and WTD error is divided by a number less than ten. Therefore, the instant RUM inflates the error of WTD responses, causing them to become more influential on the estimator and pulling the estimates down. Dolan's transformation of WTD responses (-t2/10) inadvertently causes the error scale to be equivalent, but the predictions lose internal consistency. In contrast, the episodic RUM assigns the same error scale, regardless of response type, and produces consistent results.
The second contribution is in convergent validity. The episodic RUM predictions from the TTO responses strongly agree with predictions from the rank responses. In fact, this agreement is stronger than the agreement between rank predictions and instant RUM predictions with the Dolan transformation of WTD responses. The results confirm ranking and TTO to be closely related, suggesting the combination of both methods' strengths: the sound psychometric foundations and feasibility of ranking, and the face validity of TTO as it relates closely to the QALY paradigm. In a previous paper, Craig, Busschbach and Salomon show that rank predictions are essentially equivalent to VAS predictions (Lin's rho = 0.98); therefore, the results of this paper complement that finding by demonstrating convergent validity in the predictions for rank, VAS and TTO under the episodic RUM. Furthermore, this evidence on the promise of the episodic RUM demonstrates that Dolan's arbitrary correction of negative responses is outmoded.
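The difference in error scale can be illustrated with simple arithmetic. The sketch assumes that an episode-level error ε enters the response as y = μ·x + ε, so that an individual slope y/x carries the error ε/x, with x = 10 for BTD responses and x = 10 - t2 for WTD responses; this form is inferred from the description above, and the function names are illustrative.

```python
HORIZON = 10.0  # years in the MVH TTO task


def time_in_state(t, worse_than_death):
    """Divisor x applied to the error when forming the slope y / x."""
    return HORIZON - t if worse_than_death else HORIZON


def slope_error_multiplier(t, worse_than_death):
    """Error scale of a slope relative to a BTD response (which divides by 10)."""
    return HORIZON / time_in_state(t, worse_than_death)


# BTD responses always divide the error by 10, so the multiplier is 1.
btd_multiplier = slope_error_multiplier(6.0, False)

# WTD responses divide by 10 - t2, so the same error looms larger as t2 grows.
wtd_multipliers = {
    t2: slope_error_multiplier(t2, True) for t2 in (2.5, 5.0, 9.0, 9.75)
}

for t2, multiplier in wtd_multipliers.items():
    print(t2, round(multiplier, 2))
```

At t2 = 9.75 the same episode-level error is magnified forty-fold relative to a BTD response, which is why a few extreme WTD responses can dominate a mean slope. Dividing every response by 10, as in Dolan's transformation, sets the multiplier to one for all responses.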
The third contribution is more practical. Under the assumption of normal errors, the episodic RUM implies an exploded probit estimator that integrates rank and TTO responses. This exploded probit estimator increases the power of valuation studies considerably by combining responses from two forms of discrete choice experiments: TTO and ranking. We demonstrate that the integration of rank and TTO responses is feasible and decreases the standard errors of the state value predictions. By merging a psychometrically strong instrument (i.e., ranks) with discrete choice data based on utility theory (i.e., TTO), predictions are more robust. However, we recognize the appeal of the nonparametric episodic RUM estimator (equation 5).
The episodic RUM may replace the current paradigm in health state valuation, given that the instant RUM changes the error scale by response type; arbitrary corrections of WTD responses produce aberrant results; and the exploded probit allows the integration of TTO, rank, SG, and other discrete choice responses in a theoretically and econometrically consistent manner. In more practical terms, future valuation studies (e.g., EQ-5D five level version) may be statistically powered using a variety of discrete choice responses. The next step might be to re-estimate each country-specific valuation set using the episodic RUM and further examine duration effects in components and errors.
BMC is an Assistant Member for the Health Outcomes & Behavior Program at Moffitt Cancer Center in Tampa, Florida and Courtesy Associate Professor in the Department of Economics at the University of South Florida, Tampa, Florida. BMC holds professional membership for the EuroQol Group, the International Health Economics Association, the International Society for Pharmacoeconomics & Outcomes Research, and the American Society of Health Economists.
JJVB is professor and vice director of the Department for Medical Psychology and Psychotherapy of the Erasmus MC in Rotterdam. JJVB holds professional membership for the EuroQol Group (chair of the foundation), the International Health Economics Association, and the International Society for Pharmacoeconomics & Outcomes Research.
1. Torrance GW: Multi-attribute utility theory as a method of measuring social preferences for health states in long-term care. In Values and Long Term Care. Edited by: Kane RL, Kane RA. Lexington, Massachusetts: Lexington Books; 1982:127-156.
2. Patrick DL, Starks HE, Cain KC, Uhlmann RF, Pearlman RA: Measuring preferences for health states worse than death. Med Decis Making 1994, 14: 9-18. 10.1177/0272989X9401400102
3. Mullahy J: The Anatomy of Healthcare Cost Distributions. In 2nd Biennial Conference of the American Society of Health Economists. Duke University, Durham, North Carolina, US; 2008.
4. Craig BM, Ramachandran S: Relative risk of a shuffled deck: a generalizable logical consistency criterion for sample selection in health state valuation studies. Health Econ 2006, 15: 835-848. 10.1002/hec.1108
5. Craig BM, Busschbach JJV, Salomon JA: Ranking, Time Trade-Off and Visual Analogue Scale Values for EQ-5D Health States. Under Review 2008.
6. Busschbach JJV, Weijnen T, Nieuwenhuizen M, Oppe S, Badia X, Dolan P, Greiner W, Kind P, Krabbe P, Ohinmaa A, et al.: A comparison of EQ-5D time trade-off values obtained in Germany, The United Kingdom and Spain. In The Measurement and Valuation of Health Status Using EQ-5D: A European Perspective. Edited by: Brooks R, Rabin R, Charro Fd. Netherlands: Kluwer Academic Publishers; 2003:143-165.
7. Stalmeier PF, Busschbach JJ, Lamers LM, Krabbe PF: The gap effect: discontinuities of preferences around dead. Health Econ 2005, 14: 679-685. 10.1002/hec.986
8. Salomon JA: Reconsidering the use of rankings in the valuation of health states: a model for estimating cardinal values from ordinal data. Popul Health Metr 2003, 1: 12. 10.1186/1478-7954-1-12
9. McCabe C, Brazier J, Gilks P, Tsuchiya A, Roberts J, O'Hagan A, Stevens K: Using rank data to estimate health state utility models. J Health Econ 2006, 25: 418-431. 10.1016/j.jhealeco.2005.07.008
10. Krabbe PF, Stalmeier PF, Lamers LM, Busschbach JJ: Testing the interval-level measurement property of multi-item visual analogue scales. Qual Life Res 2006, 15: 1651-1661. 10.1007/s11136-006-0027-7
11. Nunnally JC: Psychometric Theory. 2nd edition. New York, New York: McGraw-Hill Book Company; 1978.
12. Dolan P: Modeling valuations for EuroQol health states. Medical Care 1997, 35: 1095-1108. 10.1097/00005650-199711000-00002
13. Gudex C: Time Trade-Off User Manual: Props and Self-Completion Methods. In Report of the Centre for Health Economics. York, United Kingdom: University of York; 1994.
14. Kind P, Dolan P, Gudex C, Williams A: Variations in population health status: results from a United Kingdom national questionnaire survey. BMJ 1998, 316: 736-741.
15. Craig BM: The duration effect: a link between TTO and VAS values. Health Econ 2008.
16. Stalmeier PF, Lamers LM, Busschbach JJ, Krabbe PF: On the assessment of preferences for health and duration: maximal endurable time and better than dead preferences. Med Care 2007, 45: 835-841. 10.1097/MLR.0b013e3180ca9ac5
17. Goldberger AS: Econometric Theory. New York, New York, USA: John Wiley & Sons; 1964.
18. Drummond MF, Sculpher MJ, Torrance GW, O'Brien BJ, Stoddart GL: Methods for the economic evaluation of health care programmes. 3rd edition. Oxford; New York: Oxford University Press; 2005.
19. Mullahy J, Manning W: Statistical issues in cost-effectiveness analyses. In Valuing Health Care. Edited by: Sloan F. Cambridge, UK: University of Cambridge; 1995.
20. Shaw JW, Johnson JA, Coons SJ: US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Med Care 2005, 43: 203-220. 10.1097/00005650-200503000-00003
21. Efron B: The efficiency of Cox's likelihood function for censored data. Journal of the American Statistical Association 1977, 72: 557-565. 10.2307/2286217
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.