Accuracy and completeness of mortality data in the Department of Veterans Affairs

Background: One of the national mortality databases in the U.S. is the Beneficiary Identification and Record Locator Subsystem (BIRLS) Death File, which contains death dates of those who have received any benefits from the Department of Veterans Affairs (VA). Previous studies showed that the completeness of this database varies widely from cohort to cohort. Three other sources of death dates available in the VA can complement the BIRLS Death File. The objective of this study is to evaluate the completeness and accuracy of death dates in the four sources available in the VA and to examine whether these sources can be combined into a database with improved completeness and accuracy.

Methods: A random sample of 3,000 veterans was drawn from 8.3 million veterans who received benefits from the VA between 1997 and 1999 and were alive on January 1, 1999 according to at least one source. Death dates found in the BIRLS Death File, the Medical SAS Inpatient Datasets, the Medicare Vital Status file, and the Social Security Administration (SSA) Death Master File were compared with dates obtained from the National Death Index. A dataset combining these sources was also compared with National Death Index dates.

Results: Compared with the National Death Index, sensitivity (the percentage of death dates correctly recorded in a source) was 77.4% for the BIRLS Death File, 12.0% for the Medical SAS Inpatient Datasets, 83.2% for the Medicare Vital Status file, and 92.1% for the SSA Death Master File. Over 95% of death dates in these sources agreed exactly with dates from the National Death Index. Death dates in the combined dataset showed 98.3% sensitivity and 97.6% exact agreement with dates from the National Death Index.

Conclusion: The BIRLS Death File is not an adequate source of mortality data for the VA population because it is incomplete. When the four sources of mortality data are carefully combined, the resulting dataset has accuracy and completeness comparable to the National Death Index and can provide death dates more promptly.


Background
Accurate data for mortality ascertainment are of critical importance for epidemiologic and health care outcomes studies. One of the national mortality databases in the U.S. is the Beneficiary Identification and Record Locator Subsystem (BIRLS) Death File, which contains death dates of those who have received any benefits from the Department of Veterans Affairs (VA) since the early 1970s. This database has been widely used as the main source of death dates for veterans who received health care from the VA [1][2][3][4][5]. Its completeness has been shown to vary widely from cohort to cohort, with sensitivity (the percentage of deaths that are correctly recorded) ranging between 70.0% and 96.5% [6][7][8][9][10][11][12].
Three other databases available in the VA can supplement the BIRLS Death File for mortality ascertainment. The first is the VA health care inpatient datasets. The second is the Social Security Administration (SSA) Death Master File, which the VA acquires from the SSA. The third is a subset of the Medicare Vital Status file that contains death dates for all Medicare-enrolled veterans. It became available in the VA in 1999 and is the newest source of mortality data for veterans.
This study had two main objectives. The first was to evaluate the completeness and accuracy of death dates in these four sources of mortality data for a sample representative of the VA population. The second objective was to examine whether these four sources could be combined into a database with improved accuracy and completeness for mortality ascertainment.

Study population
This study was approved by the Human Subjects Committee and the Research and Development Committee at the Edward Hines Jr. VA Hospital in Hines, Illinois. The study population comprised all veterans who received any benefits from the VA between 1997 and 2002 and were alive on January 1, 1999 according to at least one source of mortality data available within the VA. Veterans who were enrolled in the Veterans Health Administration (VHA), received compensation or pension benefits, or utilized VA health care were included. After excluding those with no date of birth (20,049, 0.2%), those with invalid Social Security Numbers (10,940, 0.1%), and those who were Medicare beneficiaries but did not have an updated record in the Medicare Vital Status file (53,943, 0.6%), 8.3 million veterans were in the sampling frame.
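The exclusions that define the sampling frame can be expressed as a simple filter, followed by a random draw of the 3,000-veteran study sample described in the next section. The sketch below is a minimal illustration. The `Veteran` record layout, its field names, and the SSN validity rule are all hypothetical, because the study does not state which rule it used to flag invalid Social Security Numbers.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Veteran:
    # Hypothetical record layout; field names are illustrative only.
    ssn: str
    birth_date: Optional[date]
    is_medicare_beneficiary: bool
    has_updated_vital_status: bool


def is_valid_ssn(ssn: str) -> bool:
    # Assumed rule (not from the study): nine digits, and the area, group,
    # and serial parts are not all zeros.
    if len(ssn) != 9 or not ssn.isdigit():
        return False
    return ssn[:3] != "000" and ssn[3:5] != "00" and ssn[5:] != "0000"


def in_sampling_frame(v: Veteran) -> bool:
    if v.birth_date is None:
        return False  # excluded: no date of birth
    if not is_valid_ssn(v.ssn):
        return False  # excluded: invalid Social Security Number
    if v.is_medicare_beneficiary and not v.has_updated_vital_status:
        return False  # excluded: no updated Medicare Vital Status record
    return True


def draw_sample(population: List[Veteran], n: int = 3000,
                seed: Optional[int] = None) -> List[Veteran]:
    # Simple random sample without replacement from the eligible frame.
    frame = [v for v in population if in_sampling_frame(v)]
    rng = random.Random(seed)
    return rng.sample(frame, min(n, len(frame)))
```

Fixing `seed` makes the draw reproducible. That is a convenience of the sketch, not something the study reports.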

Mortality data available in the VA
The BIRLS is a VA database that contains information on all VA beneficiaries, including veterans discharged from military service since March 1973, Medal of Honor recipients, veterans who received education benefits from the VA, and veterans whose survivors applied for burial benefits [11]. This database contains death dates reported by family members applying for death benefits, VHA hospitals, or the VA National Cemetery Administration. The BIRLS Death File is a subset of the BIRLS that contains data on deceased veterans, including death dates.
The VA database called the Medical SAS Inpatient Datasets (MSID) is another source of death dates for veterans. It contains information on patients discharged each year from any of the VHA hospitals across the country [13]. This database has been compiled each year since 1970 and includes dates of deaths that occurred in VHA hospitals or shortly after discharge [11]. It was previously known as the Patient Treatment Files (PTF) [14].
The Medicare Vital Status file is a dataset constructed by the Centers for Medicare and Medicaid Services (CMS) and contains demographic and vital status information for all Medicare beneficiaries. The VA has a data sharing agreement with the CMS to receive all Medicare data for VA-enrolled veterans annually [15]. The primary source for dates of death in the Vital Status file is the Death Master File compiled by the Social Security Administration (SSA), but the CMS also updates the Vital Status file with dates of death from other sources, including Medicare claims data [16].
The Death Master File is produced by the SSA, contains over 70 million death records, and is updated monthly. It is populated with death dates that the SSA obtains from death reports by family members, funeral homes, state and federal agencies, postal authorities, and financial institutions. Previous studies reported sensitivity for the SSA Death Master File ranging between 83% and 95% compared with the National Death Index [9,17-20]. The VA regularly obtains the SSA Death Master File and its monthly updates from the SSA and makes them available to researchers.
The data from the fifth source, the National Death Index, are considered the "gold standard" for mortality ascertainment. The National Death Index was established in 1981 by the National Center for Health Statistics as a central repository of computerized death records for the entire U.S. population. It contains dates and causes of death from actual death records filed in state vital statistics offices since 1979 [21]. Deaths that occurred in a calendar year are added to the National Death Index annually, about 12 months after the end of that year. As a result, the lag between the occurrence of a death and its appearance in the National Death Index may be anywhere between 12 and 24 months. The data in the National Death Index have been evaluated against known deaths from sources such as actual death certificates or direct contact with patients or their families, and have consistently exceeded 95% sensitivity [6,7,22-25]. The National Death Index was used in this study as the gold standard for evaluating the completeness and accuracy of death dates from the four sources.

National Death Index search
The study sample consisted of 3,000 veterans randomly drawn from the sampling frame. The sample was submitted to the National Death Index for a death search in January 2005. For each veteran, we provided the Social Security Number (SSN), last name, first name, middle name, date of birth, date of death, sex, and state of residence.
The National Death Index search often returns multiple records as possible matches. The National Death Index uses nine different matching criteria to select possible matches, some of which do not require a match on Social Security Number [21]. A set of criteria must be developed to determine if any of the possible matches are the correct ones. A "true match" is established when the record returned from the National Death Index and the submitted record both belong to the same individual according to chosen criteria. A liberal criterion may increase sensitivity rates of death reporting, but can also increase the number of false positives and thus decrease specificity [6,7]. A careful choice of match criteria has important implications for the comparison [11].
Three match criteria were used to establish "true matches" in this study. The first criterion required agreement on SSN, sex, and at least two of the three components of the date of birth (day, month, year). The probability that a match under this criterion is correct was estimated to be 95% or higher [26]. The other two criteria were:

(a) at least 7 of the 9 SSN digits, date of birth, last name, first name, middle initial (if provided by both sources), and sex; and
(b) at least 7 of the 9 SSN digits, date of death, last name, first name, middle initial (if provided by both sources), and sex.

These two criteria were used to capture records in which two digits of the SSN had been transposed. We established 96.2% of all true matches using the first criterion and only 3.8% using the other two.
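The three match criteria can be written as a single rule that reports which criterion, if any, links a submitted record to a candidate record returned by the National Death Index. The sketch below is a minimal illustration under several assumptions that the text does not confirm:
- "At least 7 digits of the SSN" is read as positional agreement. This catches any swap of two digits, because the other seven positions still agree.
- Names are compared as case-insensitive exact strings.
- The `Person` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    # Hypothetical fields shared by a submitted record and an NDI candidate.
    ssn: str                      # nine-digit string
    last: str
    first: str
    middle_initial: Optional[str]
    sex: str
    birth: date
    death: Optional[date]


def ssn_digits_agree(a: str, b: str) -> int:
    # Number of positions at which the two SSNs carry the same digit.
    return sum(x == y for x, y in zip(a, b))


def dob_parts_agree(a: date, b: date) -> int:
    # Number of matching components among day, month, and year.
    return sum([a.day == b.day, a.month == b.month, a.year == b.year])


def middle_ok(a: Person, b: Person) -> bool:
    # Middle initials are compared only when both records provide one.
    if a.middle_initial and b.middle_initial:
        return a.middle_initial.upper() == b.middle_initial.upper()
    return True


def names_and_sex_ok(a: Person, b: Person) -> bool:
    return (a.last.upper() == b.last.upper()
            and a.first.upper() == b.first.upper()
            and middle_ok(a, b)
            and a.sex == b.sex)


def true_match_criterion(sub: Person, ndi: Person) -> Optional[int]:
    """Return 1, 2, or 3 for the first criterion satisfied, else None."""
    # Criterion 1: exact SSN, sex, and at least two parts of date of birth.
    if (sub.ssn == ndi.ssn and sub.sex == ndi.sex
            and dob_parts_agree(sub.birth, ndi.birth) >= 2):
        return 1
    close_ssn = ssn_digits_agree(sub.ssn, ndi.ssn) >= 7
    # Criterion (a): >= 7 SSN digits, exact date of birth, names, and sex.
    if close_ssn and sub.birth == ndi.birth and names_and_sex_ok(sub, ndi):
        return 2
    # Criterion (b): >= 7 SSN digits, exact date of death, names, and sex.
    if (close_ssn and sub.death is not None and sub.death == ndi.death
            and names_and_sex_ok(sub, ndi)):
        return 3
    return None
```

Checking the criteria in order mirrors the reported split, in which most true matches (96.2%) came from the first criterion. The ordering itself is a design choice of this sketch.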

Determining the best death dates by combining mortality data sources
We developed an algorithm to determine the best source of death dates for our study sample. To develop it, we used additional stratified samples drawn from the pool of veterans who had death dates recorded in any of the four sources. These samples included veterans with death dates (1) that appeared in only one source (4.4% of the study population with a death date in any of the four sources), (2) that appeared in more than one source and agreed (88.1%), and (3) that appeared in more than one source but did not all agree (7.5%). These samples were submitted to the National Death Index along with the study sample. The results of this process of combining data sources are detailed elsewhere and are available upon request [16]. Results for the study sample are described below.
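The three strata used to build the combining algorithm depend only on how many sources report a death date and whether those dates agree. Here is a minimal sketch of that classification. It assumes "agree" means identical calendar dates and that each veteran's dates are held in a dictionary keyed by source name; both are assumptions. The rule for choosing a best date within a stratum is documented elsewhere [16] and is not reproduced here.

```python
from datetime import date
from typing import Dict, Optional


def stratum(dates_by_source: Dict[str, Optional[date]]) -> Optional[int]:
    """Classify a veteran by the death dates reported across sources.

    Returns 1 if exactly one source reports a date, 2 if several sources
    report dates that all agree, 3 if several sources report dates that
    do not all agree, and None if no source reports a death.
    """
    reported = [d for d in dates_by_source.values() if d is not None]
    if not reported:
        return None
    if len(reported) == 1:
        return 1
    return 2 if len(set(reported)) == 1 else 3
```

For example, a veteran with the same date in the BIRLS Death File and the SSA Death Master File falls into stratum 2. A one-day discrepancy between those two sources moves the veteran into stratum 3.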

Statistical analysis
For each source, each veteran was placed into one of the four categories defined in Table 1, according to how the veteran's mortality status in the source agreed with that in the National Death Index. False positives (B) are "misreported" deaths: they are recorded in the source but not found in the National Death Index. False negatives (C) are "unreported" deaths: they are not reported in the source but are in the National Death Index.
Using the number of individuals in each group in Table 1, four comparison statistics were computed for each source with the National Death Index data as the gold standard. Sensitivity is the percentage of deaths that were correctly recorded in a source, computed as the ratio of true positives to true positives plus false negatives [A/(A+C)]. Specificity is the percentage of individuals without a death date in the National Death Index who were not recorded as deceased in a source. It is the ratio of true negatives to true negatives plus false positives [D/(B+D)]. Positive predictive value is the probability that an individual identified as deceased in a source was actually deceased, computed as the ratio of true positives to true positives plus false positives [A/(A+B)]. Negative predictive value is the probability that an individual identified as not deceased in a source was actually not deceased, computed as the ratio of true negatives to true negatives plus false negatives [D/(C+D)].
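The four statistics follow directly from the cell counts A to D. The sketch below is a minimal illustration. Its `cross_tab` helper builds the counts from paired (source, National Death Index) death indicators, and `validity` applies the formulas above. Both function names and the input layout are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Validity(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def cross_tab(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, int, int]:
    """Count (A, B, C, D) from (deceased in source, deceased in NDI) pairs."""
    a = b = c = d = 0
    for in_source, in_ndi in pairs:
        if in_source and in_ndi:
            a += 1          # true positive
        elif in_source:
            b += 1          # false positive ("misreported")
        elif in_ndi:
            c += 1          # false negative ("unreported")
        else:
            d += 1          # true negative
    return a, b, c, d


def validity(a: int, b: int, c: int, d: int) -> Validity:
    # Formulas as defined in the text, with the NDI as the gold standard.
    return Validity(
        sensitivity=a / (a + c),
        specificity=d / (b + d),
        ppv=a / (a + b),
        npv=d / (c + d),
    )
```

Calling `validity(*cross_tab(pairs))` produces all four statistics for one source. The same call applied to each source in turn yields a per-source comparison.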
We then computed sensitivity by VA health care use groups. Three groups were defined based on any VHA health care use in 1999-2002: inpatient users, outpatient users, and non-users. Inpatient and outpatient use groups are not mutually exclusive. Almost all inpatient users (99.2%) also used outpatient care during this period. The sensitivity rates were lower for the users of outpatient care only, but we chose to report these rates for all outpatient users to make comparison with previous studies easier [10,12].
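Because the inpatient and outpatient groups overlap, sensitivity has to be computed separately for each group's membership rule rather than by partitioning the sample. The sketch below is a minimal illustration. The `Subject` fields and group labels are hypothetical, and a veteran with both kinds of use is counted in both groups.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subject:
    # Hypothetical fields for one veteran and one source under evaluation.
    inpatient: bool    # any VHA inpatient use in 1999-2002
    outpatient: bool   # any VHA outpatient use in 1999-2002
    in_source: bool    # death recorded in the source
    in_ndi: bool       # death confirmed by the National Death Index


def sensitivity_by_use(subjects: List[Subject]) -> Dict[str, float]:
    groups: Dict[str, Callable[[Subject], bool]] = {
        "inpatient": lambda s: s.inpatient,
        "outpatient": lambda s: s.outpatient,  # overlaps with inpatient
        "non-user": lambda s: not (s.inpatient or s.outpatient),
    }
    result = {}
    for name, is_member in groups.items():
        # Sensitivity = recorded deaths / NDI-confirmed deaths in the group.
        deaths = [s for s in subjects if is_member(s) and s.in_ndi]
        if deaths:
            result[name] = sum(s.in_source for s in deaths) / len(deaths)
    return result
```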
Finally, we computed agreement rates at three different levels of precision: an exact date match, a match within two days, and a year and month match. The agreement rate indicates the proportion of death dates in a source that match at a given level of precision with dates in the National Death Index.
We used only death dates from the National Death Index that fell between January 1, 1999 and December 31, 2002. This window allowed at least 24 months for deaths to be reported to the National Death Index before the January 2005 search.
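The three agreement levels and the 1999-2002 window can be sketched together. This is a minimal illustration that assumes the denominator is the set of deaths reported by both the source and the National Death Index, with the window applied to the National Death Index date. The function name and input layout are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

# Study window applied to National Death Index dates.
WINDOW_START = date(1999, 1, 1)
WINDOW_END = date(2002, 12, 31)


def agreement_rates(
    pairs: List[Tuple[date, Optional[date]]],
) -> Optional[Tuple[float, float, float]]:
    """pairs: (source death date, NDI death date or None) per veteran.

    Returns (exact, within 2 days, same year and month) agreement rates,
    or None if no pair qualifies.
    """
    scored = [(s, n) for s, n in pairs
              if n is not None and WINDOW_START <= n <= WINDOW_END]
    if not scored:
        return None
    total = len(scored)
    exact = sum(s == n for s, n in scored)
    within_two_days = sum(abs((s - n).days) <= 2 for s, n in scored)
    same_year_month = sum((s.year, s.month) == (n.year, n.month)
                          for s, n in scored)
    return exact / total, within_two_days / total, same_year_month / total
```

The within-two-days level can cross a month boundary, whereas the year-and-month level cannot. For example, May 31 against June 1 counts as agreement at the two-day level but not at the year-and-month level.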

Results
Of the 3,000 records submitted, the National Death Index rejected two because of incomplete demographic data and returned match results for the remaining 2,998. Using the three match criteria discussed above, only 292 (9.7%) of these veterans could be linked to a returned record as "true matches". The final sample therefore comprised 2,998 veterans, with 292 deaths confirmed by the National Death Index. Death dates in the four sources agreed with those in the National Death Index at very high rates (Table 5). The death dates in the Inpatient Datasets all agreed exactly with the dates from the National Death Index. The BIRLS Death File had the second highest agreement (96.9%). The Medicare and SSA sources had slightly lower agreement rates than the BIRLS Death File, at 95.5% and 95.6%, respectively.
There were 292 deaths identified in the combined data, with five unreported and five misreported deaths. The sensitivity of the combined data was 98.3%; the specificity, 99.8%; the positive predictive value, 98.3%; and the negative predictive value, 99.8% (Table 3). For the inpatient and outpatient users, the sensitivities of the combined data were both 100%, indicating that the combined data are as complete as the National Death Index for the VHA users. The rates of agreement with the National Death Index were 97.6%, 98.6%, and 99.7% for exact date match, match within 2 days, and year and month match, respectively.
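The reported figures for the combined data are internally consistent. The Table 1 cells can be reconstructed from them if we assume the 292 deaths in the combined data include the five misreported ones. These counts are derived here and are not reported directly.

```latex
\begin{aligned}
A &= 292 - 5 = 287, \qquad B = 5, \qquad C = 5,\\
D &= 2998 - (287 + 5 + 5) = 2701,\\
\text{Sensitivity} &= A/(A+C) = 287/292 \approx 98.3\%,\\
\text{Specificity} &= D/(B+D) = 2701/2706 \approx 99.8\%,\\
\text{PPV} &= A/(A+B) = 287/292 \approx 98.3\%,\\
\text{NPV} &= D/(C+D) = 2701/2706 \approx 99.8\%.
\end{aligned}
```

Under this reading, A + C = 292 also matches the number of deaths confirmed by the National Death Index.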
Finally, we counted the number of death dates which each source contributed to the combined data (  [18,19], and we suspect that differences in age distributions may explain the inconsistency between the two studies [19].
This study showed the Medicare Vital Status file to be an important supplemental source of mortality information. It contained death dates for veterans that were not found in any other source available in the VA. For Medicare-enrolled veterans, this file was the most accurate and complete of all the single sources considered in this study. The high sensitivity and agreement rates suggest that it can be used alone as a source of mortality data for Medicare-enrolled veterans.
When available sources were combined, the resulting mortality data proved to be highly comparable to the National Death Index in both accuracy and completeness. For the VHA users, the combined data were 100% complete and 97.9% accurate to the date compared with the National Death Index.
The combined data have several advantages over the National Death Index. The four data sources described above are available free of charge to researchers affiliated with the VA. In contrast, the National Death Index charges search fees that can be substantial when a large number of subjects are searched. In addition, because both the BIRLS Death File and the SSA Death Master File are updated monthly, they can provide more recent death dates than the National Death Index.
One limitation of this study is that we examined death dates for a limited time frame (1999-2002), whereas death dates in the BIRLS Death File extend back before the 1970s and those in the SSA Death Master File extend back before the 1940s [11]. The accuracy and completeness of the combined data observed for this sample may not hold for deaths that occurred before 1999, especially those before the mid-1970s. The completeness of both the BIRLS Death File and the SSA Death Master File was poor until the mid-1970s [6,19]. These results may also not apply to studies that follow veterans' mortality using identifiers other than Social Security Numbers. The completeness of mortality data is known to differ considerably between veterans with and without Social Security Numbers [6,9].

Conclusion
We found that a combined data set could provide highly accurate and complete data for mortality ascertainment. Acquiring death dates from the National Death Index would add hardly any accuracy or completeness. The combined data set can also provide more timely mortality data than the National Death Index, whose lag in death reporting can be up to 24 months. Combining these four sources is time- and resource-intensive. A centralized database would therefore greatly benefit researchers by giving them easy access to high-quality mortality data, improving the quality of research involving veteran mortality ascertainment.