Transition to the new race/ethnicity data collection standards in the Department of Veterans Affairs

Background: Patient race in the Department of Veterans Affairs (VA) information system was previously recorded based on an administrative or clinical employee's observation. In 2003, the VA began to collect self-reported race in compliance with a new federal guideline. We investigated the implications of this transition for using race/ethnicity data in multi-year trends in the VA and in other healthcare data systems that make the same transition.

Methods: All unique users of VA healthcare services with self-reported race/ethnicity data in 2004 were compared with their prior observer-recorded race/ethnicity data from 1997–2002 (N = 988,277).

Results: In 2004, only about 39% of all VA healthcare users reported race/ethnicity values other than "unknown" or "declined." Females reported race/ethnicity at a lower rate than males (27% vs. 40%; p < 0.001). Over 95% of observer-recorded data agreed with self-reported data. Compared with the patient self-reported data, observer-recorded White and African American races were accurate for 98% (kappa = 0.89) and 94% (kappa = 0.93) of individuals, respectively. Accuracy of observer-recorded race was much worse for other minority groups, with kappa coefficients ranging from 0.38 for American Indians or Alaska Natives to 0.79 for Hispanic Whites. When observer-recorded race/ethnicity values were reclassified into African American and non-African American groups, they agreed with the self-reported data for 98% of all individuals (kappa = 0.93).

Conclusion: For VA healthcare users overall, the agreement between observer-recorded and self-reported race/ethnicity was excellent, and observer-recorded and self-reported data can be used together for multi-year trends without creating serious bias. However, this study also showed that observation was not a reliable method of race/ethnicity data collection for non-African American minorities, and racial disparity might be underestimated if observer-recorded data are used, owing to systematic patterns of inaccurate race/ethnicity assignments.


Background
In 1997, the Office of Management and Budget (OMB) released the revised standards for the collection of race and ethnicity, known as Statistical Directive 15, with which federal agencies were mandated to comply by January 2003 [1][2][3]. The most significant changes in the new standards were self-identification as the preferred data collection method and the ability to report multiple races for an individual. For researchers who use data from multiple years for disease surveillance or for tabulating utilization and cost trends by race group, this transition involved two methodological issues. One was how to handle races for those who identify themselves with more than one race. Sometimes called "bridging," this issue concerns how to assign multiracial persons to a single race category. The other was the more fundamental question of whether the prevalence of a particular disease, treatment, or outcome would remain comparable over time within race/ethnicity categories.
This study examines how this federal mandate affects the collection and use of race/ethnicity data in a large federal agency, the Department of Veterans Affairs (VA). Until 2003, the VA had collected data on race and ethnicity for all its healthcare users based on observation (e.g., a registration clerk's or clinician's perception) [4]. In compliance with the OMB standards, the VA changed to self-identification as its preferred method of data collection. With this change, the underlying meaning of race fundamentally changed (i.e., from appearance to self-perception), and the key question is whether the old, observation-based data can be used together with the new, self-reported data.
The transition to the OMB standards is not an issue unique to the VA. Previous studies have examined the implications of this transition for state public health data [2,5], focusing mainly on "bridging" issues. The VA experience additionally involved issues related to the change in data collection method from observation to self-identification. Considering that many hospitals in the private sector still collect race data by observation [6] and have yet to make this transition to more standardized and reliable race/ethnicity data [7][8][9], this study can inform private-sector hospital administrators of the issues involved in making this transition.
The objective of this study is to examine the effect of this transition on the research use of race/ethnicity data for multi-year trends in the VA, specifically focusing on how comparable race data collected under the two methods are and what effect bridging may have on different race categories when one tries to map multiracial values to the old single races for the same individuals. Previous studies have examined data quality issues on race in the VA data [4,10,11], but none have examined the effect of this transition in the VA.

Study design and population
The Institutional Review Board at the Edward Hines, Jr. VA Hospital approved the study, including a HIPAA waiver of authorization. In this study, we examined race data for all users of the Veterans Health Administration (VHA) healthcare services from fiscal years 1997 through 2004. A fiscal year in the VA runs from October 1st of the previous calendar year to September 30th of the current year. All years henceforth are fiscal years unless otherwise noted.
The year 2004 is the first full year for which self-reported race/ethnicity is available within the VA healthcare data. For this reason, we used all VA healthcare users in 2004 as the baseline population and identified all patients who had valid self-reported race values recorded in the VA healthcare utilization data in that year. A valid race value is defined in this study as one of the legitimate race/ethnicity categories, excluding "unknown" and "declined." These patients were then merged with healthcare utilization records for the years 1997 through 2002 to capture all observer-recorded race values for the same individuals. 2003 was the transition year: for the first seven months of the year (until May 2003), race data were collected according to the old standards and, for the rest of the year, according to the new standards. We did not use race data from 2003 due to concerns about data integrity during this transition year.

Data sources
The VHA Medical SAS Inpatient (often called the Patient Treatment Files or PTF) and Outpatient (Outpatient Clinic Files or OPC) Datasets were used as the main data sources [14][15][16][17]. These national patient-level datasets capture information collected at VHA healthcare sites and entered into the VHA electronic medical record system, including demographic data [18]. Before May 2003, the Inpatient Datasets recorded race values in one variable (RACE); after the implementation of the new standards, they captured race in six variables (RACE1–RACE6) to accommodate individuals who identify themselves with more than one race. The Outpatient Datasets similarly contain all records for outpatient care and, beginning in 2004, collected the new race values in seven variables (RACE1–RACE7).
The new race variables encoded both the race and data collection method in one value (e.g., "AP" indicates proxy-reported Native Hawaiian or Other Pacific Islander). Three data collection methods could be recorded, including self-identification, observation, and proxy-reporting. In this study, a self-reported race value is one whose data collection method code explicitly indicated "self-identification." A multiracial individual was identified as such only when multiple self-identified race values were present on the same record for the individual (e.g., the same hospitalization or the same outpatient clinic visit). When different races were recorded on two or more records, we considered them to be inconsistent rather than multiracial.
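The classification rule above can be illustrated with a minimal Python sketch; the record structure (a set of self-identified race values per encounter record) is a simplifying assumption for illustration, not the actual Medical SAS variable layout.

```python
from typing import List, Set

def classify_race_reporting(records: List[Set[str]]) -> str:
    """Classify self-reported race values across an individual's records.

    Each element of `records` is the set of self-identified race values
    on one encounter record (e.g., one hospitalization or clinic visit).
    This simplified structure is an assumption for illustration only.
    """
    # Ignore records with no self-identified race value.
    nonempty = [r for r in records if r]
    if not nonempty:
        return "unknown"
    # Multiracial only when multiple races appear on the SAME record.
    if any(len(r) > 1 for r in nonempty):
        return "multiracial"
    # One race per record, but differing across records -> inconsistent.
    distinct = set().union(*nonempty)
    return "single" if len(distinct) == 1 else "inconsistent"
```

Under this rule, a person reporting White on one visit and African American on another is counted as inconsistent, not multiracial, because the two races never appear together on the same record.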
Both Inpatient and Outpatient Datasets were combined to compile a list of all individuals who reported their race in the 2004 datasets. The same datasets for 1997 -2002 were used to compile a list of individuals with valid observerrecorded race values.
When linking the data over time, we used social security numbers (SSNs), sex, and two of the three parts of the date of birth (i.e., year, month, or day) as linkage variables [19]. There were 8.4 million unique SSNs for all VHA users in years 1997–2002 and 2004. Of these, 1.7 percent (97,236) did not match according to these criteria and were not used in the analysis.
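The linkage criterion described above can be sketched as follows; the dictionary keys are hypothetical field names, and the actual linkage procedure in [19] may differ in detail.

```python
def records_match(a: dict, b: dict) -> bool:
    """Link two records when SSN and sex agree exactly and at least two
    of the three date-of-birth components (year, month, day) agree.
    Field names are illustrative assumptions."""
    if a["ssn"] != b["ssn"] or a["sex"] != b["sex"]:
        return False
    dob_parts = ("birth_year", "birth_month", "birth_day")
    # Count how many date-of-birth components agree; require two of three.
    return sum(a[p] == b[p] for p in dob_parts) >= 2
```

Allowing one date-of-birth component to disagree tolerates common keying errors (e.g., a transposed day) while SSN and sex still anchor the match.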

Statistical analysis
Self-reported race/ethnicity data for 2004 were compared with the observer-recorded data in the VA administrative files for 1997–2002 for the matching individuals. Since OMB Statistical Directive 15 considers race/ethnicity a social construct and self-reported race/ethnicity "accurate by definition" [3,5], we used the self-reported race/ethnicity as the gold standard for examining the accuracy of the old, observer-recorded data. For each old value, five accuracy and agreement measures were computed: sensitivity, specificity, positive and negative predictive values, and kappa.
Sensitivity indicates the probability that the old race/ethnicity is correct according to the new race/ethnicity. Specificity indicates the probability that the old race/ethnicity correctly excludes a person who is not of that race/ethnicity. Positive predictive value indicates the probability that a person of a given race/ethnicity according to the old data is actually of that race/ethnicity. Negative predictive value indicates the probability that a person not of a given race/ethnicity in the old data is actually not of that race/ethnicity. Finally, we used Cohen's kappa as a measure of agreement between the old and new data [20]. According to Landis and Koch [21], a kappa coefficient larger than 0.8 indicates excellent agreement.
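For readers replicating these measures, the computations can be sketched from a 2x2 cross-tabulation of observer-recorded (old) against self-reported (new, gold-standard) membership in one race/ethnicity category. This is a generic implementation of the standard formulas, not the study's actual analysis code.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Five measures from a 2x2 table: rows = old (observer-recorded)
    classification, columns = new (self-reported) gold standard.
    tp: both old and new place the person in the category; fp: only the
    old data do; fn: only the new data do; tn: neither does."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # raw agreement between old and new data
    # Chance-expected agreement for Cohen's kappa (two raters,
    # in/out of one category).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }
```

For example, a table with 90 true positives, 10 false positives, 10 false negatives, and 90 true negatives yields 90% sensitivity and specificity but a kappa of 0.8, illustrating how kappa discounts chance agreement.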
To examine how the "bridging" issue may affect the VA race data, we used four bridging techniques that either assign the whole person to a single race/ethnicity group (using the smallest group, the largest group, or the largest group other than White) or assign equal fractions to all race groups reported. For example, an individual who identified as White, African American, and Asian would be assigned to Asian under the smallest-group method, to White under the largest-group method, and to African American under the largest-group-other-than-White method. The same individual would instead be assigned equally, by one third, to each of the three groups under the equal-fractions method. Definitions and detailed discussion of these bridging techniques are found elsewhere [2,3].
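The four bridging techniques can be sketched as follows; the group sizes used to rank "smallest" and "largest" are hypothetical counts for illustration, not the study's actual reference populations.

```python
from fractions import Fraction

# Hypothetical group sizes used only to rank categories for bridging.
GROUP_SIZE = {"White": 100, "African American": 40, "Asian": 10, "AIAN": 5}

def bridge(races, method):
    """Assign a multiracial report to old single-race categories using
    one of the four bridging techniques described in the text."""
    if method == "smallest":
        return min(races, key=GROUP_SIZE.get)
    if method == "largest":
        return max(races, key=GROUP_SIZE.get)
    if method == "largest_non_white":
        # Fall back to all reported races if White is the only one.
        non_white = [r for r in races if r != "White"] or list(races)
        return max(non_white, key=GROUP_SIZE.get)
    if method == "equal_fractions":
        # Fractional assignment: 1/k of the person to each of k groups.
        return {r: Fraction(1, len(races)) for r in races}
    raise ValueError(f"unknown bridging method: {method}")
```

Applied to the example in the text (White, African American, Asian), the first three methods return Asian, White, and African American, respectively, and the fourth assigns one third of the person to each group.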
Age, sex, region, and race/ethnicity variables were used to tabulate the study population in Table 1. Age was computed from veterans' dates of birth and defined as age on January 1, 2004. The U.S. Census Bureau's definition of region and the veterans' state of residence were used to group all veterans into regions.

Results
Table 1 shows the demographic characteristics of individuals used in the analysis. There were about 4.8 million users of VHA healthcare in 2004. Valid self-reported race values were recorded for slightly less than 39% of them (1.9 million).

Self-reported race/ethnicity data
Of those with valid race/ethnicity data in 2004, 79.2% identified themselves as White and 16.6% as African American, which together accounted for almost 96% of all VHA users with valid race/ethnicity data. Of the remainder, 39,296 (2.1%) belonged to three other single-race groups (Asian, Native Hawaiian or Other Pacific Islander, and American Indian or Alaska Native) and 13,135 (0.7%) identified themselves as belonging to two or more races. The rest (1.2%) reported ethnicity but not race.

Table 1 shows the breakdown of these individuals by age, gender, and region. Females reported race at a significantly lower rate than males (27% vs. 40%; p < 0.001). Individuals in the South reported race proportionately more, and those in the West less, than those in other regions. While 43% of all users in the South reported valid race, only 33% of those in the West did so. Only about 16% of all users in the West could be linked across years and had both old and new race values, whereas about 20 to 22% of all users in other regions could.

Accuracy of observer-recorded race
Table 2 compares observer-recorded with self-reported races using the old race categories. Under the old standards, race had been recorded in six race/ethnicity categories: Hispanic White, Hispanic Black, American Indian or Alaska Native (AIAN), Asian or Pacific Islander (API), Black or African American, and White. Under the new standards, race is recorded in five categories: AIAN, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White. As far as the race categories are concerned, the only change involved splitting the old API category into separate "Asian" and "Native Hawaiian or Other Pacific Islander" categories. Thus, the new standards provide slightly finer breakdowns in race/ethnicity groups than the old.

* SR indicates self-reported; OR, observer-recorded; AIAN, American Indian or Alaska Native; OB, observation; PROXY, proxy report.
§ The percentages in the last two columns do not add up to 100% due to those who identified themselves as Hispanic but did not report a valid race value.

In Table 2, the new race and ethnicity values were combined into the old categories (e.g., Hispanic ethnicity and Black or African American race were combined into Hispanic Black), and the two new race categories Asian and Native Hawaiian or Other Pacific Islander were combined into Asian or Pacific Islander. Those who reported multiracial values (7,347) or Hispanic ethnicity but no race or a non-African American minority race (14,430) in the 2004 data could not be assigned to the old race/ethnicity categories and were not included in this table.
Individuals in the cells on the main diagonal are those whose new and old race values agreed; they accounted for 95.1% of all users with both new and old race values. This indicates that the observer-recorded race values have excellent agreement with the self-reported values (kappa = 0.87). There were no differences in agreement rates between males and females, but considerable variation existed between regions (results not shown). The Midwest region had the highest agreement rate (97%; kappa = 0.89) and the West the lowest (93%; kappa = 0.85).
All off-diagonal cells indicate the number of disagreements in race coding between the new and old data. The largest number of divergences occurred between Whites and African Americans, the two largest racial groups: 8,160 Whites (1.1%) were coded as African Americans and 9,638 African Americans (5.4%) as Whites; together these represent 36.2% of all miscoded values in the observer-recorded data. The second largest divergence occurred between Whites and Hispanic Whites; 7,472 Whites (1.0%) and 4,883 Hispanic Whites (12.4%) were miscoded in the old data, representing another 25% of all miscoded races. For AIANs and APIs, almost 70% of all observer-recorded data in these two categories were miscoded; in particular, 58% of AIANs and 47% of APIs were coded as Whites in the observer-recorded data. Of the total 47,772 individuals with incorrectly identified race/ethnicity in the observer-recorded data, 40,584 (85%) were either members of another race/ethnicity incorrectly identified as Whites (22,777) or Whites incorrectly identified as members of another race/ethnicity (17,807).

Table 3 shows summary measures of accuracy of the old race values, compared with the self-reported race values as the gold standard, for the same individuals analyzed in Table 2. The summary measures were computed separately for each race category. The two largest racial categories, White and African American, had sensitivity rates of 97.6% and 94.0%, respectively. For Whites, the specificity (90.3%) was considerably lower than the sensitivity (97.6%) because of the large number of individuals incorrectly coded as Whites in the old data. Other measures of accuracy for these two groups were either acceptable or quite good. Kappa coefficients for Whites and African Americans were 0.89 and 0.93, respectively.
Sensitivity rates for other race/ethnicity groups were considerably lower. Hispanic Whites had 83% sensitivity, but other groups had sensitivities lower than 40% (AIAN, 31.8%; API, 35.3%; Hispanic Black, 26.6%), indicating that they are not reliable enough for research use as separate race/ethnicity groups.
Race categories are frequently combined into larger ones in research. The last three rows in Table 3 show the accuracy and agreement measures for combined race/ethnicity categories. The sensitivity for the Hispanic category, which combines Hispanic White and Hispanic Black, was 86% (kappa = 0.81). The sensitivity for the combined AIAN and API category was still extremely poor at 35% (kappa = 0.47). All non-African American minority groups combined into one category had only 73.7% sensitivity (kappa = 0.75). When Hispanics with no race, multiple races, or non-African American minority races were included in these comparisons, the summary measures showed slightly worse agreement between the old and new data, with sensitivity rates of 85% (kappa = 0.83) and 70% (kappa = 0.74) for the combined Hispanic category and the combined non-White, non-African American category, respectively (data not shown). Table 4 shows the extent of agreement between the self-reported and observer-recorded race/ethnicity values in the combined race/ethnicity categories.

"Bridging" issues
Another important issue in the transition to the new standards is how to use multiracial values in tabulating trends across multiple years. We considered four commonly used bridging methods to assign multiracial individuals to single race categories in the old standards. No matter which method was used, these assignments had little effect on the two largest race groups, Whites and African Americans. However, when subjects categorized as multiracial were allocated to the smallest group, the number of individuals in the AIAN group increased by almost 55% and the API group by 20% (Table 5). Some other methods also increased these two groups by 10% to 29%, indicating that the allocation methods can potentially have a large impact on race/ethnicity identification for the API and AIAN groups when individuals with multiple races in the self-reported data are "bridged" to the single races of the old, observer-recorded data.

Discussion
This study examined issues related to the transition to the new federal standards in collecting race/ethnicity in the VA. We showed that the overall agreement between the observer-recorded and self-reported race/ethnicity data was excellent. Excluding those who reported ethnicity only in 2004, the overall agreement between the new and old data was over 95% (kappa = 0.87). This indicates that the observer-recorded data are highly consistent with, and can be used together with, the self-reported data without creating substantial bias in multi-year trends. This was mainly due to accurate identification by observation of the two largest racial groups, Whites and African Americans, who had sensitivity rates of 97.6% (kappa = 0.89) and 94.0% (kappa = 0.93), respectively.
However, we also showed that observation was not a reliable method of identifying race/ethnicity for non-African American minority groups. The sensitivity rates for these groups varied between 26.6% and 83.0% (kappa, 0.23 to 0.79), too low for identifying them separately for research purposes. They can be combined with other groups into a higher-level, more inclusive group to achieve better sensitivity. We showed that the distinction between African American and Other (Whites and all other non-African American minorities combined into one group) had the best agreement between the old and new race/ethnicity data.
We also observed a systematic pattern by which the observer-recorded data misclassified individuals: 85% of all inaccurate race/ethnicity values in the observer-recorded data involved Whites, with Whites incorrectly identified as members of a minority group or vice versa. This pattern of misclassification can reduce the observed disparity between Whites and other racial groups, so racial disparity based on observer-recorded data may be underestimated. Researchers using observer-recorded and self-reported data together thus need to conduct sensitivity analyses to rule out the possibility that any change in disparity before and after the transition is attributable to using mixed data.
The findings of this study are consistent with a previous study that reported agreement rates of 97.9% and 92.0% for Whites and African Americans, respectively, between the observer-recorded race in VA administrative files and the self-reported race in a survey of veterans [4]. The agreement of the APIs was much lower with the self-reported data in the administrative files (35.3%) than with the survey data (75.5% for Asians and 69.6% for Pacific Islanders).
The observer-recorded race/ethnicity data in the VA Medical SAS Datasets also compare favorably in accuracy with those in the Medicare Enrollment Database (EDB), which showed sensitivity rates of 96.5% and 95.6% for Whites and African Americans, respectively [22]. The VA observer-recorded data performed slightly better in identifying Whites but slightly worse in identifying African Americans. In the VA, only about 15% of Hispanics were misclassified into other race/ethnicity groups, while almost 65% in the EDB were misclassified. The sensitivity for the Hispanic category in the EDB was only 35.7%, compared with 85.5% in the VA data. Except for Asians, the sensitivity rates for other minority groups in the VA data were much higher than those in the EDB. This implies that when both VA and Medicare race values are available for an individual, the old VA data should be preferred to the Medicare data, especially for Hispanics.
We found that the completeness of self-reported race/ethnicity data was a serious problem. Over 60% of all VHA users in 2004 did not report any race values, a drop in completeness of almost 15 percentage points compared with the observer-recorded data of the pre-transition years; for example, 45% of all VHA users in 2002 were missing race/ethnicity data. This sudden drop in completeness may in part be a transitional problem of the kind that occurs during the first few years after a new system is implemented. If this were the case, the race/ethnicity data may be missing at random.
However, it is also possible that some groups are more reluctant than others to disclose their race/ethnicity, so this drop may also be partly attributable to the change in data collection methods. As we have shown, race/ethnicity data for multiracial individuals may be seriously underreported in the VHA data. Only 0.3% of all users, and 0.7% of those with valid self-reported race/ethnicity values, reported two or more races in the 2004 VHA data, while a national survey of veterans conducted in 2001 indicated that 2.1% of all veterans and 3.2% of VHA users may be multiracial [23]. Selective self-reporting is also suggested by the regional variations in the completeness of race data in 2004. The South had the highest completeness at 43%, followed by the Northeast and Midwest at 39%, and the West at 33%. According to the 2000 Census, the West had the highest concentration of multiracial persons, with 40% of all multiracial individuals in the country [24]. This suggests that multiracial individuals are more reluctant to report their races than individuals of a single race, and that the self-reported data may have selection issues that the previous observer-recorded data do not, further complicating the mixed use of observer-recorded and self-reported data for multi-year trends.
To address the incompleteness issue, the VA can consider several options. First, the VA can obtain data through special surveys or from external sources. As the Centers for Medicare and Medicaid Services (CMS) have done [25,26], the VA could survey veterans specifically to collect race/ethnicity data from enrollees whose self-reported race/ethnicity data are not known. Alternatively, the VA could establish an interagency agreement with the Social Security Administration (SSA) and acquire the SSA's race data regularly to supplement its own race data, an approach also used by the CMS [22,26].
However, a more fundamental and long-term solution to this problem is to improve race reporting at the source, namely, in VA hospitals and clinics. The VA may need to examine whether the way the race/ethnicity questions are asked (e.g., specific wording of the questions, use of any prefatory remarks or probes following an incomplete answer, or circumstances under which the questions are asked) can be improved. Previous research suggests that how a question is asked about race/ethnicity can make substantial differences in the response rate, especially for small race/ethnicity groups [27,28]. For example, a study showed that an open-ended question (i.e., allowing the respondents to describe their race/ethnicity in their own terms) can reduce the rates of unusable data compared with data obtained with the OMB standards, and that the open-ended format is especially effective in improving race reporting for minority groups such as Hispanics, Asians or multiracial individuals, who are often reluctant to describe their race/ethnicity profile in pre-defined categories such as those in the OMB standards [27].
Until the self-reported race data in the VA are substantially improved in completeness, researchers using the VA race data may consider supplementing the self-reported data with observer-recorded data from past years, or with SSA or Medicare data when applicable. Future research needs to examine how the observer-recorded and self-reported data can be integrated into a well-validated patient-level race database and whether such an approach can substantially improve the completeness of VA race data.
In the meantime, however, whether self-reported data are used alone or in combination with old race/ethnicity data, users of VA self-reported race data should be aware of the potential selectivity in these data. As discussed above, about 25% of those who reported Hispanic ethnicity did not report race in the FY2004 VHA data. These individuals may not view themselves as having a racial identity distinct from their ethnicity [27]. They thus may choose an "Other" category for their race, or refuse to disclose race, when given the OMB categories. This is shown in 2000 Census data, in which over 42% of all those who reported Hispanic ethnicity chose "Some other race," compared with only 5.3% of the total population [29]. In the VHA, "Other race" is not provided as a response category, and as a result many refused to report race. Regional variations in the completeness of self-reported race/ethnicity data (e.g., 33% in the West vs. 41% in the other three regions) may also reflect not so much a systemic failure to enforce the new race/ethnicity data collection standards among VHA facilities in the West as regional variations in the distribution of non-African American minority groups such as Hispanics, Asians, and individuals of two or more races.
One limitation of this study is that we have not considered the characteristics of the VA population who had no self-reported data. Their individual characteristics, and accordingly the accuracy of their observer-recorded race values, may be systematically different from those of the individuals who could be linked. As a consequence, this study cannot estimate the quality of a dataset that combines the old and new information. Further, the findings about the accuracy of observer-recorded race should be generalized cautiously because only about 28% of all valid observer-recorded data for 1997–2002 could be linked to the self-reported data.

Conclusion
In conclusion, our results indicate that observer-recorded race data for VA users compiled from 1997–2002 have excellent agreement with self-reported race data collected in 2004 for veterans of African American or White race. However, there is considerable under-reporting of race overall, and the accuracy of observer-recorded race data for non-African American minorities is poor, limiting the usefulness of observer-recorded race data for individuals in these racial categories. Private and public healthcare providers considering a similar transition can learn from the VA experience in anticipating potential issues and planning for a smooth transition.