This study systematically examined the impact of measurement error in the context of a prediction algorithm. This simulation study reveals several interesting aspects of the influence of measurement error (systematic and random) on the performance of a risk algorithm.
As hypothesized, random error reduced calibration and discrimination of the algorithm due to the fact that the observed variance is greater than the true variance in the presence of measurement error. The observed BMI distribution was wider than the true distribution due to this increased variation. This affects both diabetics and nondiabetics due to its random nature, resulting in greater overlap between the BMI distributions. Ultimately, this makes assigning risk according to BMI levels more difficult to achieve. Even though random error in height and weight should, on average, correctly estimate the true BMI in the population (since it does not skew the mean in a particular direction), it can still influence the performance of a prediction model due to decreased precision, which leads to greater dispersion in the BMI distribution.
In this study, systematic error in height and weight biased the predicted risk estimates in the direction of the error. This affects calibration, which is not surprising since the concordance of observed and predicted events would be influenced by the under- or overreporting of the BMI level. In other words, persons that are over- or underreporting their weight will then be over- or underestimated by the risk model and thus result in disagreement with observed estimates. Systematic error did not influence the ability to rank order subjects. Therefore, the ability to discriminate between who will and will not develop diabetes was not affected by systematic error when variance due to random error is held constant. This was reflected by the stability of the C-statistic under varying degrees of systematic error. The way that the systematic error was examined in this study was such that the distribution of BMI was shifted to the left (as a result of underestimating weight or overestimating height, or both) compared to the true distribution. This is an overall effect, and the decreased precision or increased variability as seen with random error is therefore not observed. Even though the distribution is shifted to the left, those with higher BMI still have a higher probability of developing diabetes compared to those with lower BMI despite the fact that the absolute levels of risk will be underestimated in both groups. This is a classic example of how discrimination and calibration are often discordant. Due to the nature of probability, it is possible for a prediction algorithm to exhibit perfect discrimination, i.e., it can perfectly resolve a population into those who will and will not experience the event, and at the same time have deficient accuracy (meaning that the predicted probability of developing diabetes does not agree with the true probability) . This study did not impose systematic error with respect to disease status, but it could be hypothesized that if the systematic error were differential between diabetics and nondiabetics that this could indeed affect discrimination.
The finding that random error resulted in the overall predicted risk estimated to be biased upwards was contradictory to the hypothesis that only systematic error will bias the risk estimate. Random error increases the variability of a measurement and increases the range of predicted risk, which is bounded by 0 in the logistic model. In a situation where the outcome probabilities are very high, the skew would be expected to be in the opposite direction. Not surprisingly, the error in predicted risk resulting from underreporting weight or overreporting height is in the anticipated direction (i.e., if weight is underreported the observed risk will be underestimated). Furthermore, the addition of random error to this type of systematic error slightly reduced the amount of underestimation because the random and systematic errors worked in opposite directions. In another situation, random error could potentially augment the error in predicted risk. Such would be the case if systematic error tended to result in an overestimate of risk.
This study shows that random error, which accounts for 20% of the total observed variance (ICC of 0.8 or higher), is unlikely to affect the performance or validation of a prediction model. Research shows that the random error in height and weight reporting is unlikely to exceed that amount . Interestingly, the effects on the predicted diabetes risk were relatively minor, even in situations of high under- or overreporting of weight and height. This is likely because BMI has such a strong relationship with diabetes such that increased risk is apparent even with significant underestimation. The true distributions of BMI in diabetics and nondiabetics are so distinct that even in the presence of underreporting these populations have dissimilar risk for developing diabetes. Had this misclassification affected a variable that did not have such a strong relationship with the outcome, the effect on predicted risk may have been more severe. Furthermore, in this study systematic error in self-reported height and weight was taken as an overall effect in the population. If self-reporting error were significantly more likely to occur in those who were more likely to develop diabetes, then the impact of this bias could be augmented.
This study focused on the overall trend of self-reporting error seen in several validation studies, that is an underestimation of weight and an overestimation of height ; however, these patterns may also vary across subpopulations, such as gender and socioeconomic status. Generally, women tend to underestimate weight more so than men, and men tend to overestimate height more so than women [31, 32]. Socioeconomic status has been shown to modify these associations such that those of lower socioeconomic status may actually overestimate their weight and/or underestimate their height [33, 34]. These subgroups may also have differential diabetes risk and the extent to which this error influences population risk prediction is a topic of future research. In addition, values from a given individual in the population may exceed the maximum values included in this study; however, the influence of this would be more relevant for individual risk prediction tools versus for population prediction.
There are several limitations to consider in the context of this study. Conclusions drawn from this simulation study will relate only to the scenarios simulated and may not apply to all risk algorithm situations. Simulation programs that reflect the specific study conditions to which a study is applied must be created to make conclusions applicable. Another caution in interpreting the findings of this study is that models examined in this exercise are simpler than complicated multivariate risk algorithms encountered in practice. This simpler model allows us to focus on the height and weight error, which is the greatest potential source of error in DPoRT. It should be noted that one of the assumptions of this study is that the only sources of error are in self-reported height and weight. Other sources of error, including error in diabetes status and selection bias in the survey or in sampling, are assumed to be absent.
This study provides novel information about the influence of measurement error in a risk prediction model. By understanding the consequences of measurement error on prediction and algorithm performance, efforts can be made to correct for these errors and thus improve the accuracy and validity of a risk algorithm. Further, efforts must be made to understand the nature of error in self-reporting measurements. Ongoing work to improve the quality of measurements used in risk algorithms will improve model performance. Researchers developing and validating risk tools must be aware of the presence of measurement error and its impact on the performance of their risk tools.
The study was approved by the Research Ethics Board of Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada.