Accuracy and completeness of mortality data in the Department of Veterans Affairs
© Sohn et al; licensee BioMed Central Ltd. 2006
Received: 10 November 2005
Accepted: 10 April 2006
Published: 10 April 2006
One of the national mortality databases in the U.S. is the Beneficiary Identification and Record Locator Subsystem (BIRLS) Death File that contains death dates of those who have received any benefits from the Department of Veterans Affairs (VA). The completeness of this database was shown to vary widely from cohort to cohort in previous studies. Three other sources of death dates are available in the VA that can complement the BIRLS Death File. The objective of this study is to evaluate the completeness and accuracy of death dates in the four sources available in the VA and to examine whether these four sources can be combined into a database with improved completeness and accuracy.
A random sample of 3,000 was drawn from 8.3 million veterans who received benefits from the VA between 1997 and 1999 and were alive on January 1, 1999 according to at least one source. Death dates found in BIRLS Death File, Medical SAS Inpatient Datasets, Medicare Vital Status, and Social Security Administration (SSA) Death Master File were compared with dates obtained from the National Death Index. A combined dataset from these sources was also compared with National Death Index dates.
Compared with the National Death Index, sensitivity (or the percentage of death dates correctly recorded in a source) was 77.4% for BIRLS Death File, 12.0% for Medical SAS Inpatient Datasets, 83.2% for Medicare Vital Status, and 92.1% for SSA Death Master File. Over 95% of death dates in these sources agreed exactly with dates from the National Death Index. Death dates in the combined dataset demonstrated 98.3% sensitivity and 97.6% exact agreement with dates from the National Death Index.
The BIRLS Death File is not an adequate source of mortality data for the VA population due to incompleteness. When the four sources of mortality data are carefully combined, the resulting dataset can provide more timely data for death ascertainment than the National Death Index and has comparable accuracy and completeness.
Accurate data for mortality ascertainment are of critical importance for epidemiologic and health care outcomes studies. One of the national mortality databases in the U.S. is the Beneficiary Identification and Record Locator Subsystem (BIRLS) Death File that contains death dates of those who have received any benefits from the Department of Veterans Affairs (VA) since the early 1970s. This database has been widely used as the main source of death dates for veterans who received health care from the VA [1–5]. The completeness of this database has been shown to vary widely from cohort to cohort, with sensitivity (the percentage of death dates that are correctly recorded) ranging between 70.0% and 96.5% [6–12].
Three other databases for mortality ascertainment are available in the VA that can supplement the BIRLS Death File. One of them is the VA health care inpatient datasets. Another is the Social Security Administration (SSA) Death Master File, which the VA acquires from the SSA. A third source is a subset of the Medicare Vital Status file that contains death dates for all Medicare-enrolled veterans. It is the newest source of mortality data for veterans, having become available in the VA in 1999.
This study had two main objectives. The first was to evaluate the completeness and accuracy of death dates in these four sources of mortality data for a sample representative of the VA population. The second objective was to examine whether these four sources could be combined into a database with improved accuracy and completeness for mortality ascertainment.
This study was approved by the Human Subjects Committee and the Research and Development Committee at the Edward Hines Jr. VA Hospital in Hines, Illinois. The study population comprised all veterans who received any benefits from the VA between 1997 and 2002 and were alive on January 1, 1999 according to at least one source of mortality data available within the VA. Veterans who were enrolled in the Veterans Health Administration (VHA), received compensation or pension benefits, or utilized VA health care were included. After excluding those with no date of birth (20,049, 0.2%), those with invalid Social Security Numbers (10,940, 0.1%), and those who were Medicare beneficiaries but did not have an updated record in the Medicare Vital Status file (53,943, 0.6%), 8.3 million veterans were in the sampling frame.
Mortality data available in the VA
The BIRLS is a VA database that contains information on all VA beneficiaries, including veterans discharged from military service since March 1973, Medal of Honor recipients, veterans who received education benefits from the VA, and veterans whose survivors applied for burial benefits. This database contains death dates reported by family members applying for death benefits, VHA hospitals, or the VA National Cemetery Administration. The BIRLS Death File is a subset of the BIRLS that contains data on deceased veterans, including death dates.
The VA database called the Medical SAS Inpatient Datasets (MSID) is another source of death dates for veterans and contains information on patients who are discharged each year from any of the VHA hospitals across the country. This database has been compiled each year since 1970 and includes dates of deaths that occurred in VHA hospitals or shortly after discharge. It has previously been referred to as the Patient Treatment Files or PTF.
The Medicare Vital Status file is a dataset constructed by the Centers for Medicare and Medicaid Services (CMS) and contains demographic and vital status information for all Medicare beneficiaries. The VA has a data sharing agreement with the CMS to receive annually all Medicare data for VA-enrolled veterans. The primary source for dates of death in the Vital Status file is the Death Master File compiled by the Social Security Administration (SSA), but the CMS also updates the Vital Status file with dates of death from other sources, including Medicare claims data.
The Death Master File is produced by the SSA, contains over 70 million deaths, and is updated monthly. This file is populated with death dates which SSA obtains from death reports by family members, funeral homes, state and federal agencies, postal authorities and financial institutions. Previous studies reported sensitivity for the SSA Death Master File ranging between 83% and 95% compared with the National Death Index [9, 17–20]. The VA regularly obtains the SSA Death Master File and its monthly updates from the SSA, and makes them available to researchers.
The data from the fifth source, namely the National Death Index, are considered the "gold standard" for mortality ascertainment. The National Death Index was established in 1981 by the National Center for Health Statistics to be a central repository of computerized death records for the entire U.S. population. It contains dates and causes of death from actual death records filed in state vital statistics offices since 1979. Deaths that occurred in a calendar year are added to the National Death Index annually, about 12 months after the end of the year. The lag time between the occurrence and the reporting of a death in the National Death Index may be anywhere between 12 and 24 months. The data in the National Death Index have been evaluated against known deaths from sources such as actual death certificates or direct contact with patients or their families, and have consistently exceeded 95% sensitivity [6, 7, 22–25]. The National Death Index was used in this study as the gold standard in evaluating completeness and accuracy of death dates from the four sources.
National Death Index search
The study sample consisted of 3,000 veterans who were randomly drawn from the sampling frame. It was submitted to the National Death Index for a death date search in January 2005. We provided Social Security Number (SSN), last name, first name, middle name, date of birth, date of death, sex, and state of residence.
The National Death Index search often returns multiple records as possible matches. The National Death Index uses nine different matching criteria to select possible matches, some of which do not require a match on Social Security Number. A set of criteria must be developed to determine if any of the possible matches are the correct ones. A "true match" is established when the record returned from the National Death Index and the submitted record both belong to the same individual according to chosen criteria. A liberal criterion may increase sensitivity rates of death reporting, but can also increase the number of false positives and thus decrease specificity [6, 7]. A careful choice of match criteria has important implications for the comparison.
Three match criteria were used to establish "true matches" in this study. The first criterion matched on SSN, sex, and two parts of the date of birth (day, month, or year). The probability that the match according to this criterion was correct was estimated to be 95% or higher. Two other match criteria were: at least 7 digits of the SSN, date of birth, last name, first name, middle initial if provided by both sources, and sex; and, at least 7 digits of the SSN, date of death, last name, first name, middle initial if provided by both sources, and sex. These two criteria were used to match those cases with SSNs whose two digits were transposed. We established 96.2% of all true matches using the first criterion and only 3.8% using the other two.
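The three criteria can be sketched in code to make the matching logic concrete. The following is a minimal illustration, not the authors' implementation: the Record fields, the function names, the position-by-position comparison of SSN digits, and the reading of "two parts of the date of birth" as "at least two" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Identifiers for one person (field names are hypothetical)."""
    ssn: str                               # nine digits, no dashes
    sex: str
    last_name: str
    first_name: str
    birth: date
    death: Optional[date] = None
    middle_initial: Optional[str] = None


def ssn_digits_matching(a: str, b: str) -> int:
    """Count the positions at which two SSNs carry the same digit (assumed rule)."""
    return sum(x == y for x, y in zip(a, b))


def names_match(a: Record, b: Record) -> bool:
    """Last and first names must agree; middle initial only if both provide one."""
    if a.last_name.upper() != b.last_name.upper():
        return False
    if a.first_name.upper() != b.first_name.upper():
        return False
    if a.middle_initial and b.middle_initial:
        return a.middle_initial.upper() == b.middle_initial.upper()
    return True


def criterion_1(a: Record, b: Record) -> bool:
    """Full SSN, sex, and at least two of the three date-of-birth parts."""
    dob_parts = sum([a.birth.day == b.birth.day,
                     a.birth.month == b.birth.month,
                     a.birth.year == b.birth.year])
    return a.ssn == b.ssn and a.sex == b.sex and dob_parts >= 2


def criterion_2(a: Record, b: Record) -> bool:
    """At least 7 SSN digits, date of birth, names, and sex."""
    return (ssn_digits_matching(a.ssn, b.ssn) >= 7 and a.birth == b.birth
            and names_match(a, b) and a.sex == b.sex)


def criterion_3(a: Record, b: Record) -> bool:
    """At least 7 SSN digits, date of death, names, and sex."""
    return (ssn_digits_matching(a.ssn, b.ssn) >= 7
            and a.death is not None and a.death == b.death
            and names_match(a, b) and a.sex == b.sex)


def is_true_match(a: Record, b: Record) -> bool:
    """A pair is a 'true match' if any of the three criteria holds."""
    return criterion_1(a, b) or criterion_2(a, b) or criterion_3(a, b)


# Made-up records: two adjacent SSN digits are transposed in the returned record.
submitted = Record("123456789", "M", "DOE", "JOHN", date(1930, 5, 17), date(2001, 3, 2))
transposed = Record("123465789", "M", "Doe", "John", date(1930, 5, 17), date(2001, 3, 2))
print(criterion_1(submitted, transposed), is_true_match(submitted, transposed))
```

In this made-up pair the first criterion fails because the SSNs differ, while the second succeeds because seven digits, the date of birth, the names, and sex all agree, which is the transposition case the text describes.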
Determining the best death dates by combining mortality data sources
We developed an algorithm to determine the best source of death dates among our study sample. This algorithm used other stratified samples drawn from the pool of veterans who had death dates recorded in any of the four sources. These samples included veterans with death dates (1) that appeared in only one source (4.4% of the study population with a death date in any of the four sources), (2) that appeared in more than one source and agreed (88.1%), and (3) that appeared in more than one source but did not all agree (7.5%). These samples were submitted to the National Death Index along with the study sample. The results from this process of combining data sources are detailed elsewhere and are available upon request. Results for the study sample are described below.
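The three strata described above can be illustrated with a short sketch. It only assigns a veteran to a stratum; it does not reproduce the algorithm that selects the best death date, which is documented elsewhere. The function name, the stratum labels, and the dictionary keys are hypothetical.

```python
from datetime import date
from typing import Dict, Optional


def classify_stratum(dates: Dict[str, Optional[date]]) -> str:
    """Assign one veteran to a stratum from the death dates found in each source.

    `dates` maps a source name to the death date recorded there, or None
    when the source holds no death date for this veteran.
    """
    recorded = [d for d in dates.values() if d is not None]
    if not recorded:
        return "no_death_date"        # not part of the stratified samples
    if len(recorded) == 1:
        return "single_source"        # stratum (1)
    if len(set(recorded)) == 1:
        return "multiple_agree"       # stratum (2)
    return "multiple_disagree"        # stratum (3)


# Made-up example: two sources agree, a third differs by one day.
example = {
    "BIRLS": date(2000, 6, 1),
    "MSID": None,
    "Medicare": date(2000, 6, 1),
    "SSA": date(2000, 6, 2),
}
print(classify_stratum(example))
```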
Table 1. Definitions and Identification Methods of Four Groups for Assessing Completeness and Accuracy of Mortality Data
Group: Definition and Identification Method
A. True Positives: All deceased individuals who were identified as deceased in a source. A valid death date was found in both the source and the NDI.
B. False Positives: All living individuals who were identified as deceased in a source. They have a death date in the source but not in the NDI.
C. False Negatives: All deceased individuals who were identified as alive in a source. They have a death date in the NDI but not in the source.
D. True Negatives: All living individuals who were identified as living in a source. They do not have a death date in the NDI nor in the source.
Using the number of individuals in each group in Table 1, four comparison statistics were computed for each source with the National Death Index data as the gold standard. Sensitivity indicates the per cent of deaths that were correctly recorded in a source, and was computed as the ratio of true positives to true positives plus false negatives [A/(A+C)]. Specificity refers to the per cent of individuals without a death date in the National Death Index who were not recorded as deceased in a source. It is the ratio of true negatives to true negatives plus false positives [D/(B+D)]. Positive predictive value refers to the probability that an individual who was identified as deceased in a source was actually deceased, and is computed as the ratio of true positives to true positives plus false positives [A/(A+B)]. Negative predictive value refers to the probability that an individual who was identified as not deceased in a source was actually not deceased, and is computed as the ratio of true negatives to true negatives plus false negatives [D/(C+D)].
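The four groups and the four statistics translate directly into a few lines of code. The sketch below is illustrative only: the function names and the toy data are assumptions, and it presumes that every denominator is non-zero.

```python
from collections import Counter
from datetime import date
from typing import Dict, Optional


def classify(source_date: Optional[date], ndi_date: Optional[date]) -> str:
    """Return the group letter (A-D) for one individual and one source."""
    if source_date is not None and ndi_date is not None:
        return "A"   # true positive: death date in both source and NDI
    if source_date is not None:
        return "B"   # false positive: death date in source only
    if ndi_date is not None:
        return "C"   # false negative: death date in NDI only
    return "D"       # true negative: no death date in either


def comparison_statistics(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Compute the four statistics from the group counts A, B, C, and D."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Toy data (made up): (death date in source, death date in NDI) per person.
some_day = date(2000, 1, 1)
pairs = ([(some_day, some_day)] * 3 + [(some_day, None)] * 1
         + [(None, some_day)] * 2 + [(None, None)] * 4)
counts = Counter(classify(s, n) for s, n in pairs)
stats = comparison_statistics(counts["A"], counts["B"], counts["C"], counts["D"])
print(counts, stats)
```

With these toy counts (A = 3, B = 1, C = 2, D = 4), sensitivity is 3/5, specificity is 4/5, positive predictive value is 3/4, and negative predictive value is 4/6.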
We then computed sensitivity by VA health care use groups. Three groups were defined based on any VHA health care use in 1999–2002: inpatient users, outpatient users, and non-users. Inpatient and outpatient use groups are not mutually exclusive. Almost all inpatient users (99.2%) also used outpatient care during this period. The sensitivity rates were lower for the users of outpatient care only, but we chose to report these rates for all outpatient users to make comparison with previous studies easier [10, 12].
Finally, we computed agreement rates at three different levels of precision: an exact date match, a match within two days, and a year and month match. The agreement rate indicates the proportion of death dates in a source that match at a given level of precision with dates in the National Death Index.
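As an illustration of the three levels of precision, the following sketch computes agreement rates for pairs of (source date, National Death Index date). The function name, the dictionary keys, and the sample dates are hypothetical, and "within two days" is read here as an absolute difference of at most two days.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def agreement_rates(pairs: Iterable[Tuple[date, date]]) -> Dict[str, float]:
    """Proportion of source dates agreeing with NDI dates at three precisions."""
    pairs = list(pairs)
    n = len(pairs)
    exact = sum(s == g for s, g in pairs)
    within_2 = sum(abs((s - g).days) <= 2 for s, g in pairs)
    same_month = sum((s.year, s.month) == (g.year, g.month) for s, g in pairs)
    return {
        "exact": exact / n,
        "within_2_days": within_2 / n,
        "year_month": same_month / n,
    }


# Made-up pairs of (death date in source, death date in NDI).
sample = [
    (date(2000, 1, 15), date(2000, 1, 15)),  # exact match
    (date(2000, 1, 31), date(2000, 2, 1)),   # within 2 days, different month
    (date(2000, 3, 1), date(2000, 3, 20)),   # same year and month only
    (date(2000, 5, 5), date(2001, 5, 5)),    # no agreement at any level
]
rates = agreement_rates(sample)
print(rates)
```

The second pair shows that the levels are not strictly nested: dates one day apart can fall in different calendar months.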
We used only the death dates from the National Death Index that fell between January 1, 1999 and December 31, 2002 to allow for a time lag of at least 24 months in death reporting in the National Death Index.
Of the 3,000 records submitted, the National Death Index rejected two records due to incomplete demographic data and returned match results for 2,998 records. Of the veterans with returned possible match records, only 292 (9.7%) could be linked to the sample records as "true matches" using the three match criteria discussed above. After deleting two records rejected by the National Death Index, we had the final sample with 2,998 veterans and 292 deaths confirmed by the National Death Index.
[Table: Veteran Population and Study Sample by Selected Individual Characteristics, 1999 – 2002; table body not recoverable]
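Comparison of VA Mortality Data with NDI Data by Source (N = 2,998)*
95% confidence intervals by source (the counts of individuals deceased in each source and in the NDI, and the point estimates, are not recoverable):
BIRLS Death File: sensitivity (72.2 – 82.1); specificity (99.7 – 100.0); positive predictive value (96.2 – 99.7); negative predictive value (97.0 – 98.2)
Medical SAS Inpatient Datasets: sensitivity (8.49 – 16.3); specificity (99.9 – 100.0); positive predictive value (90.0 – 100.0); negative predictive value (90.3 – 92.3)
Medicare Vital Status: sensitivity (78.4 – 87.3); specificity (99.7 – 100.0); positive predictive value (96.5 – 99.7); negative predictive value (97.7 – 98.7)
SSA Death Master File: sensitivity (88.4 – 94.9); specificity (99.7 – 100.0); positive predictive value (96.8 – 99.8); negative predictive value (98.7 – 99.5)
Combined data: sensitivity (96.0 – 99.4); specificity (99.6 – 99.9); positive predictive value (96.0 – 99.4); negative predictive value (99.6 – 99.9)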
The sensitivity rates were 77.4% for BIRLS Death File, 12.0% for Inpatient Datasets, 83.2% for Medicare Vital Status, and 92.1% for SSA Death Master File. The low rate for the BIRLS Death File was mainly due to unreported (66) rather than misreported (3) deaths. There were 49 and 23 unreported deaths in the Medicare Vital Status and SSA Death Master File, respectively, and 3 misreported deaths in both. When the sensitivity was computed only for Medicare beneficiaries (N = 1,705), the Vital Status file far exceeded any other single source with 99.2% sensitivity.
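These sensitivities can be checked against the counts of unreported deaths and the 292 deaths confirmed by the National Death Index. The short calculation below assumes that a misreported death (a date present in the source but disagreeing with the National Death Index) still counts as a true positive, which is how Table 1 defines the group; the variable and function names are illustrative.

```python
NDI_DEATHS = 292  # deaths confirmed by the National Death Index in the sample

# Unreported deaths (false negatives) per source, as stated in the text.
unreported = {
    "BIRLS Death File": 66,
    "Medicare Vital Status": 49,
    "SSA Death Master File": 23,
}


def sensitivity_pct(unreported_deaths: int, ndi_deaths: int = NDI_DEATHS) -> float:
    """Sensitivity in per cent: true positives over all NDI-confirmed deaths."""
    true_positives = ndi_deaths - unreported_deaths
    return round(100 * true_positives / ndi_deaths, 1)


for source, missing in unreported.items():
    print(source, sensitivity_pct(missing))
```

Under that assumption the counts reproduce the reported rates: 226/292, 243/292, and 269/292 round to 77.4%, 83.2%, and 92.1%.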
Sensitivity Rates of Mortality Data by Selected Subgroups and Source for Veterans, 1999 – 2002
Sensitivity (95% Confidence Interval)**, listed in the order BIRLS Death File; SSA Death Master File; combined data
VHA Healthcare Use*
Inpatient users: 86.3 (76.2 – 93.2); 94.5 (86.6 – 98.5); 100.0 (95.1 – 100.0)
Outpatient users: 80.2 (73.8 – 85.7); 93.0 (88.4 – 96.2); 100.0 (98.0 – 100.0)
Non-users: 72.1 (62.5 – 80.5); 90.4 (83.0 – 95.3); 95.2 (89.1 – 98.4)
Age on Jan. 1, 1999
Under 65: 67.6 (55.5 – 78.2); 91.5 (82.5 – 96.8); 95.8 (88.1 – 99.1)
65 or older: 80.5 (74.7 – 85.5); 92.3 (88.0 – 95.5); 99.1 (96.8 – 99.9)
[Table: Agreement Rates of VA Mortality Data with NDI Dates by Source and Level of Precision*. Columns: Number of Deaths; Death Dates in a Source that Agree with NDI Dates (%), exactly, within 2 days, and in year and month. Table body not recoverable]
There were 292 deaths identified in the combined data, with five unreported and five misreported deaths. The sensitivity of the combined data was 98.3%; the specificity, 99.8%; the positive predictive value, 98.3%; and the negative predictive value, 99.8% (Table 3). For the inpatient and outpatient users, the sensitivities of the combined data were both 100%, indicating that the combined data are as complete as the National Death Index for the VHA users. The rates of agreement with the National Death Index were 97.6%, 98.6%, and 99.7% for exact date match, match within 2 days, and year and month match, respectively.
[Table: Source of Death Dates in the Combined Data. Columns: A. Death dates found in source; B. Death dates from source used in combined data; C. Death dates found only in source; D. % found only in source (C/A); E. % of all dates in combined data found only in source (C/292). Table body not recoverable]
This study shows that the BIRLS Death File was extremely accurate but not complete. Of all 292 deaths reported in the National Death Index, 22.6% (66) were not reported and 1.3% (3) were incorrectly reported in the BIRLS Death File. These results suggest that if used alone, the BIRLS Death File is not an adequate source of mortality data for the overall VA population.
Of all four sources available within the VA, the SSA Death Master File was the most complete one for the VA population. If a researcher had to choose any one source for mortality ascertainment for veterans, we recommend the SSA Death Master File. It identified considerably more deaths with slightly more false positives than the BIRLS Death File. However, this finding is not consistent with a study by Page and colleagues, who reported lower sensitivity for the SSA Death Master File than for the BIRLS Death File. Previous studies found that the SSA Death Master File had large variability in completeness of death reporting by age [18, 19], and we suspect that differences in age distributions may explain the inconsistency between the two studies.
This study showed the Medicare Vital Status file to be an important supplemental source of mortality information. It contained death dates for veterans that were not found in any other source available in the VA. For Medicare-enrolled veterans, this file was the most accurate and complete of all the single sources considered in this study. The high sensitivity and agreement rates suggest that it can be used alone as a source of mortality data for Medicare-enrolled veterans.
When available sources were combined, the resulting mortality data proved to be highly comparable to the National Death Index in both accuracy and completeness. For the VHA users, the combined data were 100% complete and 97.9% accurate to the date compared with the National Death Index.
There are some advantages in using the combined data over the National Death Index. While the four data sources described above are readily available free of charge to researchers affiliated with the VA, the National Death Index requires fees for data search that can be quite substantial when data for a large number of subjects are searched. Since both the BIRLS Death File and the SSA Death Master File are updated monthly, they can be used to obtain more recent death dates than the National Death Index.
One limitation of this study is that we examined death dates for a limited time frame (1999 – 2002), while death dates in the BIRLS Death File go back to before the 1970s and those in the SSA Death Master File go back to before the 1940s. The accuracy and completeness of the combined data observed for this sample may not apply to deaths that occurred before 1999 and especially to those that occurred before the mid-1970s, since the completeness of both the BIRLS Death File and the SSA Death Master File was poor until the mid-1970s [6, 19]. These results may also not apply to studies which follow mortality of veterans using identifiers other than Social Security Numbers, since the completeness of mortality data for those with and without Social Security Numbers is known to be quite different [6, 9].
We found that a combined data set could provide highly accurate and complete data for mortality ascertainment and that hardly any improvement in accuracy or completeness could be achieved by acquiring death dates from the National Death Index. The combined data set can also provide more timely mortality data than the National Death Index whose time-lag for death reporting can be up to 24 months. Given the time- and resource-intensive nature of combining these four sources, a centralized database can be greatly beneficial to researchers by providing them with easy access to high-quality mortality data and thus improving the quality of research involving veteran mortality ascertainment.
The authors gratefully acknowledge funding support from the Department of Veterans Affairs, Veterans Health Administration, Health Services Research and Development Service (SDR 03-157, PI: Min-Woong Sohn; SDR 98-004, PI: Denise Hynes). The paper presents the findings and conclusions of the authors; it does not necessarily represent the views of the Department of Veterans Affairs or Health Services Research and Development Service.
1. Ho PM, Masoudi FA, Spertus JA, Peterson PN, Shroyer AL, McCarthy MJ, Grover FL, Hammermeister KE, Rumsfeld JS: Depression predicts mortality following cardiac valve surgery. Ann Thorac Surg 2005, 79: 1255-1259. doi:10.1016/j.athoracsur.2004.09.047
2. O'Hare AM, Bertenthal D, Shlipak MG, Sen S, Chren MM: Impact of renal insufficiency on mortality in advanced lower extremity peripheral arterial disease. J Am Soc Nephrol 2005, 16: 514-519. doi:10.1681/ASN.2004050409
3. Sprenkle MD, Niewoehner DE, Nelson DB, Nichol KL: The Veterans Short Form 36 questionnaire is predictive of mortality and health-care utilization in a population of veterans with a self-reported diagnosis of asthma or COPD. Chest 2004, 126: 81-89. doi:10.1378/chest.126.1.81
4. East MA, Jollis JG, Nelson CL, Marks D, Peterson ED: The influence of left ventricular hypertrophy on survival in patients with coronary artery disease: do race and gender matter? J Am Coll Cardiol 2003, 41: 949-954. doi:10.1016/S0735-1097(02)03006-1
5. Young BA, Maynard C, Boyko EJ: Racial differences in diabetic nephropathy, cardiovascular disease, and mortality in a national population of veterans. Diabetes Care 2003, 26: 2392-2399.
6. Boyle CA, Decoufle P: National sources of vital status information: extent of coverage and possible selectivity in reporting. Am J Epidemiol 1990, 131: 160-168.
7. Fisher SG, Weber L, Goldberg J, Davis F: Mortality ascertainment in the veteran population: alternatives to the National Death Index. Am J Epidemiol 1995, 141: 242-250.
8. Page WF, Braun MM, Caporaso NE: Ascertainment of mortality in the U.S. veteran population: World War II veteran twins. Mil Med 1995, 160: 351-355.
9. Page WF, Mahan CM, Kang HK: Vital status ascertainment through the files of the Department of Veterans Affairs and the Social Security Administration. Ann Epidemiol 1996, 6: 102-109. doi:10.1016/1047-2797(95)00126-3
10. Dominitz JA, Maynard C, Boyko EJ: Assessment of vital status in Department of Veterans Affairs national databases. Comparison with state death certificates. Ann Epidemiol 2001, 11: 286-291. doi:10.1016/S1047-2797(01)00211-3
11. Cowper DC, Kubal JD, Maynard C, Hynes DM: A primer and comparative review of major US mortality databases. Ann Epidemiol 2002, 12: 462-468. doi:10.1016/S1047-2797(01)00285-X
12. Lorenz KA, Asch SM, Yano EM, Wang M, Rubenstein LV: Comparing strategies for United States veterans' mortality ascertainment. Popul Health Metr 2005, 3: 2. doi:10.1186/1478-7954-3-2
13. VIReC: VIReC Research User Guide: FY2002 VHA Medical SAS Inpatient Datasets. Hines, IL; 2003. [http://www.virec.research.med.va.gov/References/RUG/RUG-Inpatient02.pdf]
14. Murphy PA, Cowper DC, Seppala G, Stroupe KT, Hynes DM: Veterans Health Administration inpatient and outpatient care data: an overview. Eff Clin Pract 2002, 5: E4.
15. Veterans Affairs Information Resource Center: Research findings from the VA Medicare data merge initiative: veterans' enrollment, access and use of Medicare and VA health services. Report to the Under Secretary for Health, Department of Veterans Affairs. 2003. [http://www.virec.research.med.va.gov/DataSourcesName/VA-MedicareData/USHreport.pdf]
16. Arnold N, Sohn MW, Maynard C, Hynes DM: VA-NDI Mortality Data Merge Project: Technical Report. 2005.
17. Schall LC, Buchanich JM, Marsh GM, Bittner GM: Utilizing multiple vital status tracing services optimizes mortality follow-up in large cohort studies. Ann Epidemiol 2001, 11: 292-296. doi:10.1016/S1047-2797(00)00217-9
18. Wentworth DN, Neaton JD, Rasmussen WL: An evaluation of the Social Security Administration master beneficiary record file and the National Death Index in the ascertainment of vital status. Am J Public Health 1983, 73: 1270-1274.
19. Hill ME, Rosenwaike I: The Social Security Administration's Death Master File: the completeness of death reporting at older ages. Soc Secur Bull 2001, 64: 45-51.
20. Lash TL, Silliman RA: A comparison of the National Death Index and Social Security Administration databases to ascertain vital status. Epidemiology 2001, 12: 259-261. doi:10.1097/00001648-200103000-00021
21. National Center for Health Statistics: National Death Index User's Manual. Hyattsville, MD, National Center for Health Statistics, Centers for Disease Control and Prevention; 1995. [http://www.cdc.gov/nchs/r&d/ndi/ndiusrsguide.htm]
22. Stampfer MJ, Willett WC, Speizer FE, Dysert DC, Lipnick R, Rosner B, Hennekens CH: Test of the National Death Index. Am J Epidemiol 1984, 119: 837-839.
23. Acquavella JF, Donaleski D, Hanis NM: An analysis of mortality follow-up through the National Death Index for a cohort of refinery and petrochemical workers. Am J Ind Med 1986, 9: 181-187.
24. Williams BC, Demitrack LB, Fries BE: The accuracy of the National Death Index when personal identifiers other than Social Security number are used. Am J Public Health 1992, 82: 1145-1147.
25. Wong O, Harris F, Rosamilia K, Raabe GK: Updated mortality study of workers at a petroleum refinery in Torrance, California, 1959 to 1997. J Occup Environ Med 2001, 43: 1089-1102.
26. Fleming C, Fisher ES, Chang CH, Bubolz TA, Malenka DJ: Studying outcomes and hospital utilization in the elderly. The advantages of a merged data base for Medicare and Veterans Affairs hospitals. Med Care 1992, 30: 377-391.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.