Evaluation of record linkage of mortality data between a health and demographic surveillance system and national civil registration system in South Africa
© Kabudula et al.; licensee BioMed Central Ltd. 2014
- Received: 17 April 2014
- Accepted: 11 August 2014
- Published: 30 August 2014
Health and Demographic Surveillance Systems (HDSS) collect independent mortality data that could be used for assessing the quality of mortality data in national civil registration (CR) systems in low- and middle-income countries. However, the use of HDSS data for such purposes depends on the quality of record linkage between the two data sources. We describe and evaluate the quality of record linkage between HDSS and CR mortality data in South Africa with HDSS data from Agincourt HDSS.
We applied deterministic and probabilistic record linkage approaches to mortality records from 2006 to 2009 from the Agincourt HDSS and those in the CR system. Quality of the matches generated by the probabilistic approach was evaluated using sensitivity and positive predictive value (PPV) calculated from a subset of records that were linked using national identity number. Matched and unmatched records from the Agincourt HDSS were compared to identify characteristics associated with successful matching. In addition, the distribution of background characteristics in all deaths that occurred in 2009 and those linked to CR records was compared to assess systematic bias in the resulting record-linked dataset in the latest time period.
Deterministic and probabilistic record linkage approaches combined linked a total of 2264 out of 3726 (60.8%) mortality records from the Agincourt HDSS to those in the CR system. Probabilistic approaches independently linked 1969 (87.0%) of the linked records. In a subset of 708 records that were linked using national identity number, the probabilistic approaches yielded sensitivity of 90.0% and PPV of 98.5%. Records belonging to more vulnerable people, including poorer persons, young children, and non-South Africans were less likely to be matched. Nevertheless, distribution of most background characteristics was similar between all Agincourt HDSS deaths and those matched to CR records in the latest time period.
This study shows that record linkage of mortality data from HDSS and CR systems is possible and can be useful in South Africa. The study identifies predictors for death registration and data items and registration system characteristics that could be improved to achieve more optimal future matching possibilities.
- Health and demographic surveillance system (HDSS)
- Agincourt HDSS
- Record linkage
- Civil registration system
- Death registration
- South Africa
Reliable and valid statistics on the levels and causes of mortality are widely acknowledged as essential information for monitoring the impact of health interventions and developing public health policies and programs for improving population health. An adequate and complete civil registration (CR) system is the ideal source from which to draw such information.
Well-functioning CR systems do not exist in the majority of African countries. South Africa is one of the few that produce mortality statistics from a CR system, but previous assessments rated their quality as low. In recent years, the country has adopted the Africa Programme on Accelerated Improvement of Civil Registration and Vital Statistics (APAI-CRVS), building on the focused initiatives by Statistics South Africa, the Department of Health, and a group of researchers since the 1990s to improve and strengthen its CR system and cause of death information. Therefore, there is a continuous need for assessing the quality of CR mortality data to ascertain the impact of these initiatives and identify remaining gaps and options for further improvement.
A number of criteria, organized into a framework of four quality concepts (generalizability, reliability, validity, and policy relevance), have been proposed for comprehensive assessment of the quality of mortality data in CR systems. Although most criteria can be evaluated directly from the mortality data recorded in the CR system and administrative information on the system, data from other sources are also required. Combining vital-event data sources, and cooperation among the custodians of such data sources, was encouraged at the 2012 International Network for the Demographic Evaluation of Populations and Their Health in developing countries (INDEPTH) - African Census Analysis Project (ACAP) Bellagio meeting on using longitudinal INDEPTH data, national censuses, Demographic and Health Surveys, and other national surveys for better health policy in Africa.
In South Africa, three INDEPTH Health and Demographic Surveillance Systems (HDSS) collect mortality data in rural populations. Such data provide an opportunity for comparison with CR data. However, this requires record linkage between the two data sources, which has not been attempted before. Both data sources are protected by strict data-use clauses to protect the confidentiality of the identity and other information of the deceased. Once linked, comparison would also depend on the quality of the matched records.
This paper describes the practical steps we took to set up and execute record linkage of mortality data and evaluates the quality of the matched records between the CR system and the longest-running of the three HDSS centers in South Africa, the Agincourt HDSS. It describes how we overcame the challenges of bringing together data that are kept in secure databases and environments almost 600 kilometers apart, each governed by data-security policies that prohibit the off-site and non-staff use of unit-record data that contain personal identifiers.
Records of individuals who died from 1 January 2006 to 31 December 2009 were extracted from the Agincourt HDSS database and saved under password protection on a portable device. An Agincourt HDSS staff member who is familiar with the collection, processing, and coding of mortality data and the stringent data-use policies at Agincourt, and who had previous experience in electronic record linkage, securely brought the data files to Statistics South Africa’s (Stats SA) head office in Pretoria. After confidentiality and data-security agreements were undertaken and signed by the Agincourt HDSS staff member and other members of the record linkage team, the non-Stats SA team members were given access to the secure environment in the Stats SA building where CR data for deaths that occurred within the same period were made available for linkage.
The CR data were captured by Stats SA from Notification of death/still-birth forms (Form BI-1663) that were submitted to the Department of Home Affairs offices for death registration as required by the country’s Births and Deaths Registration Act No 51 of 1992. As required by the Act, different sections of the form are completed by (i) the person reporting the death, (ii) a medical practitioner (where a medical practitioner is not available, a traditional leader may complete the Death Report (Form BI-1680)), and (iii) a Home Affairs official or member of the South African Police Services if the former is not available.
Record linkage procedures
We applied deterministic and probabilistic record linkage approaches to link the Agincourt HDSS and CR mortality data. Variables common to both data sources that we used are: national identity number (a unique 13-digit number assigned to South African citizens), surname, sex, day of death, month of death, year of death, day of birth, month of birth, year of birth, institution/place of death, and village name. For village name, village of the household of the deceased individual in the Agincourt HDSS was matched to place of birth, residency, and death in the CR records. Due to the recording of local tribal area names rather than the official village names for some deaths on the CR death registration forms, the place names in the CR records were mapped to their equivalent Agincourt HDSS village names prior to the record linkage exercise.
Table 1. Deterministic matching rules, with columns for matches in the trimmed CR dataset and matches in the full CR dataset. JW(Surname) denotes the Jaro-Winkler (JW) similarity score between the surnames in the two records.
- Match on National ID No
- Match on Surname, Sex, Date of birth, Date of death
- Match on Surname, Sex, Date of birth, Year of death, Month of death
- Match on Surname, Sex, Year of birth, Month of birth, Date of death
- Match on JW(Surname) ≥ 0.85, Sex, Date of birth, Date of death
- Match on JW(Surname) ≥ 0.85, Sex, Date of birth, Year of death, Month of death
- Match on JW(Surname) ≥ 0.85, Sex, Year of birth, Month of birth, Date of death
- Match on JW(Surname) ≥ 0.85, Sex, Year of birth, Year of death, Agincourt HDSS village = CR place of birth
- Match on JW(Surname) ≥ 0.85, Sex, Year of birth, Year of death, Agincourt HDSS village = CR place of residence
- Match on JW(Surname) ≥ 0.85, Sex, Year of birth, Year of death, Agincourt HDSS village = CR place of death
- Match on JW(Surname) ≥ 0.85, Sex, Year of birth, Date of death, died at hospital
- Match on JW(Surname) ≥ 0.85, Sex, Date of birth, Year of death, died at hospital
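The fuzzy surname rules listed above can be illustrated with a minimal sketch. It assumes a textbook Jaro-Winkler score (prefix scale 0.1, at most four prefix characters), which may differ in detail from the SimMetrics implementation used in the study. The field names (`surname`, `sex`, `dob`, `dod`) and the example records are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted for a shared prefix (assumed parameters)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scale * (1.0 - base)


def fuzzy_rule_match(hdss: dict, cr: dict, threshold: float = 0.85) -> bool:
    """Rule: JW(Surname) >= threshold, and exact Sex, Date of birth, Date of death."""
    return (
        jaro_winkler(hdss["surname"].upper(), cr["surname"].upper()) >= threshold
        and hdss["sex"] == cr["sex"]
        and hdss["dob"] == cr["dob"]
        and hdss["dod"] == cr["dod"]
    )


# Hypothetical records: the surnames differ by one letter.
hdss_rec = {"surname": "Mathebula", "sex": "F", "dob": "1950-03-14", "dod": "2008-07-02"}
cr_rec = {"surname": "Mathebule", "sex": "F", "dob": "1950-03-14", "dod": "2008-07-02"}
cr_other = dict(cr_rec, sex="M")  # same details, different sex
```

Here `fuzzy_rule_match(hdss_rec, cr_rec)` accepts the pair despite the spelling difference, while `fuzzy_rule_match(hdss_rec, cr_other)` rejects it because the exact-match fields must still agree.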
In probabilistic record linkage, a pair of records from two data sources is classified as a match based on the statistical probability that the values of common variables from the two data sources belong to the same individual. Each matching variable is assigned a weight that indicates its contribution to the probability of accurately designating a pair of records as a match or non-match. The weight of a matching variable, $i$, is calculated from the probability that records belonging to the same individual agree, denoted by $m_i$, and the probability that records belonging to different individuals agree, denoted by $u_i$. Record pairs where variable $i$ agrees receive a weight value of $\log_2(m_i/u_i)$, and those where the variable disagrees get a weight value of $\log_2\big((1-m_i)/(1-u_i)\big)$. A record pair is classified as a match if the sum of the weights on all the matching variables is above a particular threshold value. We estimated $m_i$ and $u_i$ values for all matching variables, except national identity number, using the Expectation Maximization (EM) algorithm. Only surname pairs with a Jaro-Winkler (JW) score ≥ 0.85 were considered as matches. Similar to the work of Méray et al. and Tromp et al., the threshold value for determining which record pairs were matches was derived from an estimate of the proportion of true matches among all possible record pair combinations produced by the EM algorithm. The estimated proportion of true matches was multiplied by the total number of all possible record pair combinations to obtain the total number of true matches. Thereafter, all possible record pair combinations were sorted in descending order of the sum of the weights on all matching variables and the top n record pairs, where n equals the calculated number of true matches, were designated as matches.
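The scoring and thresholding steps described above can be sketched as follows. The sketch assumes the usual Fellegi-Sunter log-ratio weights with base-2 logarithms. The m and u probabilities, field names, and pair identifiers are hypothetical illustrations, not values from the study.

```python
import math


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Agreement weight log2(m/u) or disagreement weight log2((1-m)/(1-u))."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))


def pair_score(agreement: dict, m_probs: dict, u_probs: dict) -> float:
    """Sum the field weights over all matching variables of one record pair."""
    return sum(
        field_weight(agrees, m_probs[field], u_probs[field])
        for field, agrees in agreement.items()
    )


def top_n_matches(scored_pairs: list, match_proportion: float) -> list:
    """Keep the n highest-scoring pairs, n = proportion x number of pairs."""
    n = round(match_proportion * len(scored_pairs))
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical m and u probabilities for three matching variables.
m_probs = {"surname": 0.95, "birth_day": 0.90, "death_month": 0.92}
u_probs = {"surname": 0.01, "birth_day": 0.03, "death_month": 0.08}

# Hypothetical agreement patterns for four candidate pairs.
patterns = {
    ("H1", "C7"): {"surname": True, "birth_day": True, "death_month": True},
    ("H1", "C9"): {"surname": False, "birth_day": True, "death_month": False},
    ("H2", "C3"): {"surname": True, "birth_day": False, "death_month": True},
    ("H2", "C7"): {"surname": False, "birth_day": False, "death_month": False},
}
scored = [(pair, pair_score(agr, m_probs, u_probs)) for pair, agr in patterns.items()]
matches = top_n_matches(scored, match_proportion=0.5)
```

Ranking by total weight and cutting at the estimated number of true matches avoids choosing a weight threshold by hand, at the cost of relying on the estimated match proportion being reasonable.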
Evaluation of record linkage results
Since we set strict deterministic matching rules with very narrow margins for error, evaluation of the record linkage results focused on matches generated by the probabilistic record linkage approach. Their quality was evaluated using sensitivity and PPV calculated from a subset of records that were linked by means of national identity number. This is justifiable because national identity numbers contain a check digit that prevents incorrect matching. We also compared characteristics of the deceased individuals in the Agincourt HDSS dataset whose records were matched and unmatched to records in the CR dataset in logistic regression models to identify characteristics associated with successful matching. Variables selected for analysis included sex, age, nationality, having a national identity number, residency status, level of education, wealth quintile, year of death, and place of death. Wealth quintiles were derived from data on ownership of assets such as cattle, a car, and cell phone and access to amenities including drinking water and sanitation using principal component analysis. In addition, the distribution of background characteristics in all deaths that occurred in 2009 and those linked to CR records was compared using Pearson chi-squared tests to assess systematic bias in the resulting record-linked dataset in the latest time period.
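One way the sensitivity and PPV described above could be computed is sketched below, treating the identity-number links as the reference standard. The exact definitions used in the study are not spelled out here, so the sketch assumes the conventional ones: sensitivity is the share of reference links reproduced, and PPV is the share of probabilistic links for reference records that are correct. Record identifiers are hypothetical.

```python
def evaluate_links(reference: dict, predicted: dict) -> tuple:
    """Return (sensitivity, ppv) for predicted links against reference links.

    Both arguments map an HDSS record id to the CR record id it was linked to.
    Only HDSS records present in the reference set are evaluated.
    """
    true_pos = sum(1 for h, c in reference.items() if predicted.get(h) == c)
    false_neg = len(reference) - true_pos
    false_pos = sum(
        1 for h, c in predicted.items() if h in reference and reference[h] != c
    )
    sensitivity = true_pos / (true_pos + false_neg)
    linked = true_pos + false_pos
    ppv = true_pos / linked if linked else float("nan")
    return sensitivity, ppv


# Hypothetical toy data: five reference links, four probabilistic links.
reference_links = {"H1": "C1", "H2": "C2", "H3": "C3", "H4": "C4", "H5": "C5"}
probabilistic_links = {"H1": "C1", "H2": "C2", "H3": "C3", "H4": "C9"}

sens, ppv = evaluate_links(reference_links, probabilistic_links)
```

In this toy example three of five reference links are reproduced (sensitivity 0.6) and three of the four probabilistic links are correct (PPV 0.75).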
Record linkage between the two data sources was done in Microsoft SQL Server 2008. The EM algorithm was implemented in the Microsoft C# programming language and integrated into SQL Server as a common language runtime (CLR) function. The JW algorithm we used is part of the SimMetrics library and was also integrated into Microsoft SQL Server 2008 as a CLR function. Stata (version 11.2, Stata Corporation, Texas, USA) was used for data analysis.
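The study's EM implementation was written in C# and is not reproduced here. As an illustration only, the sketch below shows a common textbook form of EM for estimating m and u probabilities from binary agreement patterns, assuming the matching variables are conditionally independent. The starting values, iteration count, and toy patterns are assumptions.

```python
def em_m_u(patterns, n_iter=200, p=0.1, m_init=0.9, u_init=0.1, eps=1e-6):
    """Estimate (p, m, u) from binary agreement patterns with EM.

    patterns: list of tuples of 0/1, one entry per matching variable.
    p is the proportion of true matches among all candidate pairs.
    """
    n, k = len(patterns), len(patterns[0])
    m, u = [m_init] * k, [u_init] * k

    def clamp(x):
        # Keep probabilities strictly inside (0, 1) to avoid division by zero.
        return min(max(x, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        g = []
        for gamma in patterns:
            like_m, like_u = p, 1.0 - p
            for i, agrees in enumerate(gamma):
                like_m *= m[i] if agrees else 1.0 - m[i]
                like_u *= u[i] if agrees else 1.0 - u[i]
            g.append(like_m / (like_m + like_u))
        # M-step: re-estimate p, m and u from the posterior weights.
        g_sum = sum(g)
        p = clamp(g_sum / n)
        m = [clamp(sum(gj * gam[i] for gj, gam in zip(g, patterns)) / g_sum)
             for i in range(k)]
        u = [clamp(sum((1.0 - gj) * gam[i] for gj, gam in zip(g, patterns))
                   / (n - g_sum))
             for i in range(k)]
    return p, m, u


# Hypothetical agreement patterns for 200 candidate pairs on three variables.
toy_patterns = (
    [(1, 1, 1)] * 18 + [(1, 1, 0)] * 2        # pairs that look like matches
    + [(0, 0, 0)] * 120 + [(1, 0, 0)] * 40    # pairs that look like non-matches
    + [(0, 0, 1)] * 20
)
p_hat, m_hat, u_hat = em_m_u(toy_patterns)
```

The estimated `p_hat` plays the role of the estimated proportion of true matches, and `m_hat` and `u_hat` feed the agreement and disagreement weights for each variable.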
The study received ethical approvals from the University of Queensland School of Population Health Research Ethics Committee (approval no. JJ010911), the South African Medical Research Council Ethics Committee (EC008-6/2011), and the University of the Witwatersrand Human Research Ethics Committee (Medical) (M120106).
Weights for the probabilistic linkage approach with blocking on sex and year of death. Matching variables: day of birth, month of birth, year of birth, month of death, day of death, institution/place of death.
Weights for the probabilistic linkage approach with blocking on sex and year of birth. Matching variables: day of birth, month of birth, year of death, month of death, day of death, institution/place of death.
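Blocking, as named in the two table titles above, restricts comparison to record pairs that agree exactly on the blocking variables, which keeps the number of candidate pairs manageable. A minimal sketch of two blocking passes follows. The field names and records are hypothetical, and taking the union of the two passes is an assumption about how they might be combined.

```python
from collections import defaultdict


def candidate_pairs(hdss_records, cr_records, block_keys):
    """Yield (hdss, cr) pairs whose values agree on every blocking key."""
    index = defaultdict(list)
    for cr in cr_records:
        index[tuple(cr[key] for key in block_keys)].append(cr)
    for hdss in hdss_records:
        block = tuple(hdss[key] for key in block_keys)
        for cr in index.get(block, []):
            yield hdss, cr


def pair_ids(hdss_records, cr_records, block_keys):
    """Set of (hdss id, cr id) tuples produced by one blocking pass."""
    return {(h["id"], c["id"])
            for h, c in candidate_pairs(hdss_records, cr_records, block_keys)}


# Hypothetical records.
hdss = [
    {"id": "H1", "sex": "F", "birth_year": 1950, "death_year": 2007},
    {"id": "H2", "sex": "M", "birth_year": 1981, "death_year": 2009},
]
cr = [
    {"id": "C1", "sex": "F", "birth_year": 1950, "death_year": 2008},
    {"id": "C2", "sex": "M", "birth_year": 1980, "death_year": 2009},
    {"id": "C3", "sex": "F", "birth_year": 1949, "death_year": 2007},
]

pass_death = pair_ids(hdss, cr, ("sex", "death_year"))
pass_birth = pair_ids(hdss, cr, ("sex", "birth_year"))
all_candidates = pass_death | pass_birth
```

Running a second pass with a different blocking variable lets a pair with an error in year of death still be compared through year of birth, and vice versa.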
Most of the record pairs that were generated by linking the remaining Agincourt HDSS records with records in the full CR dataset had Hazyview, a town about 40 km away from the Agincourt HDSS, as the reported place of birth, residence, or death in the CR dataset. There were also a few cases for which the reported place of birth, residence, or death in the CR dataset is indeed within the Agincourt HDSS study site, such as Belfast and Somerset, but had not been assigned to the Bushbuckridge municipality in the CR system. For example, one of the death records from Somerset village in the Agincourt HDSS dataset was in the CR dataset assigned to Somerset West, a town in the Western Cape province. Over half (53.7%) of the combined deterministic matches were found via the deceased’s identity number (Table 1).
In a subset of 708 records from the Agincourt HDSS that were deterministically linked by means of national identification number, the probabilistic approaches yielded sensitivity of 90.0% and a positive predictive value of 98.5%.
Factors predictive of successful matching of death records between Agincourt HDSS and South African CR system. Estimates are reported with 95% confidence intervals. Recoverable row labels: National Identity number recorded in VA system; Temporary and other; Year of death; Place of death; Vehicle accident site.
Background characteristics of all 2009 Agincourt HDSS deaths compared to those matched with CR records. Columns: All deaths in Agincourt HDSS (n = 846); Deaths matched with CR records (n = 618). Recoverable row labels: Temporary and other; Highest level of education; Place of death; Vehicle accident site.
In South Africa, there are no comprehensive systems of pre-linked health data covering large or entire populations such as the Manitoba Population Health Information System in Canada or systems that routinely or periodically link data at any level of jurisdiction. In this study, we have assessed the feasibility of setting up and executing record linkage of mortality data and evaluated the quality of the matched records between the Agincourt HDSS and the CR system. The study was motivated by the unexplored potential of HDSS as sources of independent mortality data for assessing the quality of mortality data in CR systems in low- and middle-income countries.
Using deterministic and probabilistic approaches, our study yielded a matching rate of 60.8% for mortality records from 2006 to 2009, with sensitivity of 90.0% and PPV of 98.5% for the probabilistic linkage. This matching rate was influenced by a number of limitations relating to the amount, accuracy, completeness, and consistency of information available for the linkage process. First, we had a small number of common variables in the two datasets. Second, collection of the ideal unique-identifier variable, national identity number, was introduced gradually in the Agincourt HDSS over the period of our investigation, starting only in 2007. However, it is worth noting that as of 2013, national identity number was available on 68% of the individuals still under surveillance in the Agincourt HDSS. Therefore, national identity number has an increased future potential as a unique matching variable. Third, we set strict deterministic matching rules with narrow margins for error, such as in the case of the spellings of surnames. Fourth, there has been a particular problem with the reporting of tribal area names instead of village names for some deaths in the death registration system. As more than one village is contained in a tribal area, it is not possible to correct this data entry. Last, the unavoidable use of proxy respondents in both the verbal autopsy (VA) and CR systems, together with the fact that VA interviews are conducted one to 11 months after death, may also have reduced the accuracy of individual-level information.
While the record linkage approach employed in this study would typically allow the assessment of completeness using a standard two-source capture-recapture analysis, this is not possible in our study. This stems from difficulties in identifying CR deaths that occurred within the Agincourt HDSS borders due to the recording of local tribal area names rather than the official village names on the CR death registration forms for some deaths. The three tribal areas containing the study site additionally include areas not covered by the Agincourt HDSS. Furthermore, the places of birth, death and residence in the CR data, reported by the relative or friend of the deceased, were not verified against the Stats SA official or Agincourt HDSS colloquial place names. Valuable lessons were learned in this regard, and recommendations are offered in the Conclusion.
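For readers unfamiliar with the technique, a standard two-source capture-recapture calculation is sketched below using the Lincoln-Petersen estimator and Chapman's bias-corrected variant. The counts are hypothetical and purely illustrative. As noted above, the calculation could not be applied to the study data because CR deaths within the HDSS borders could not be reliably identified.

```python
def lincoln_petersen(n_source1: int, n_source2: int, n_matched: int) -> float:
    """Estimated total deaths: n1 * n2 / m (requires at least one match)."""
    if n_matched <= 0:
        raise ValueError("at least one matched record is required")
    return n_source1 * n_source2 / n_matched


def chapman(n_source1: int, n_source2: int, n_matched: int) -> float:
    """Chapman's bias-corrected estimate: (n1 + 1)(n2 + 1) / (m + 1) - 1."""
    return (n_source1 + 1) * (n_source2 + 1) / (n_matched + 1) - 1


def completeness(n_source: int, estimated_total: float) -> float:
    """Share of the estimated total deaths captured by one source."""
    return n_source / estimated_total


# Hypothetical counts for one area and period (not study data):
# deaths in source 1, deaths in source 2, deaths linked in both.
n1, n2, m = 100, 80, 60
total_lp = lincoln_petersen(n1, n2, m)
total_ch = chapman(n1, n2, m)
completeness_source2 = completeness(n2, total_lp)
```

The estimator assumes that both sources cover the same closed population, that capture in one source is independent of capture in the other, and that linkage is accurate. The place-name problems described above undermine the first of these assumptions.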
Even though the matching rate in this study is low and completeness of death registration cannot be assessed using a standard two-source capture-recapture analysis because of the limitations above, the distribution of most background characteristics was similar between all Agincourt HDSS deaths and those matched with CR records in the latest time period (2009). This suggests that the record-linked data can enhance understanding of death registration practices in the CR system by identifying subgroups likely to be underrepresented in the CR data. For example, the finding that, after adjusting for other variables, matching rates are significantly lower for records belonging to more vulnerable people, including poorer persons, children <5 years, and non-South Africans, could possibly be interpreted to mean that their deaths are less likely to be registered. In addition, adding cause of death data to the record-linked data can also allow cause attribution and leading cause of death comparisons between the data sources. Such analyses, accompanied by careful interpretation, can form a useful basis from which to adjust cause of death data according to observed biases. At the individual level, misclassification patterns can be identified, which can offer insight into newly identified and recurring biases in cause of death attribution. Cause of death analyses using the record-linked data generated in this linkage study are presented in a forthcoming paper.
Despite strict policies to protect the confidentiality and safety of the data reported into each system, record linkage of mortality data between a CR system and an HDSS was possible in our study. To our knowledge, our study is the first in South Africa and possibly in sub-Saharan Africa to assess the feasibility and utility of linking HDSS and CR mortality data. The resultant data are useful for assessing selected population and individual health measures as referred to above, and hold potential to improve rural data quality.
We suggest the following five crucial contributions for further fruitful linkage exercises: the routine collection of national identity number in all the South African HDSSs; collaborative efforts to address place-name inconsistencies; recording of actual village/town/suburb names on death notification forms instead of tribal area names or adequate provision to provide for both; the development of an electronic place-name database, linked to detailed maps, against which to verify place names reported into the CR system, for use by Home Affairs registration offices; and aligning study site borders with established official borders when setting up or extending HDSS sites.
Given our success in matching with surnames in this study and other studies’ successes in using names, we also recommend that, in addition to the surname, the deceased’s full names (which are already captured on notice of death/stillbirth forms) be included in Stats SA datasets. Finally, concerted action among the governmental departments involved, health researchers, and relevant health data advisory committees is suggested to revitalize/modify the data fields on the notification form such that it is possible to identify the place of death, death registration, most recent employment prior to death, and residence of the deceased.
From a broader perspective, the methods and findings from this study are also of interest given the potential for application in other HDSS sites. Currently there are more than 45 HDSS sites across Africa, Asia and Oceania. Conducting similar studies could serve to evaluate CR data where available, help identify gaps in national or sample CR systems, and where feasible, guide improved mortality and cause of death estimates. Of special interest would be the conduct of a similar study using data from an urban HDSS, such as DodaLab in Vietnam, to obtain empirical evidence for or against the general assumption that death registration is more complete in urban than in rural areas and to help identify under-registered groups in urban areas. Such an empirical approach has potential to strengthen the evidence base for population health assessment and policy in developing countries where CR systems are weak.
Finally, our study provides scarce empirical evidence about factors affecting death registration, which has implications for strategies to accelerate death registration in countries with deficient CR systems.
We thank Statistics South Africa and the MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt) for making available data used in this study. We are also grateful to Statistics South Africa for housing the matching exercise at the head office in Pretoria; Ms Ramadimetja Matji, Ms Aletia Barkley, and Ms Kerotse Mmatli for their participation in meetings and contributions to securing the data prior to the matching exercise; Ms Marlanie Moodley for preparing maps of the Agincourt and Bushbuckridge areas; and Ms Ria Laubscher for her assistance during the matching exercise.
The study was supported by the MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), the South African Medical Research Council (MRC), and Statistics South Africa. The study was conducted while the second author held a University of Queensland Research Scholarship and the Endeavour International Postgraduate Research Scholarship at the University of Queensland, Brisbane, Australia. The Agincourt HDSS is funded by the Medical Research Council and University of the Witwatersrand, South Africa, Wellcome Trust, UK (grant no. 058893/Z/99/A, 069683/Z/02/Z, 085477/Z/08/Z), and National Institute on Aging of the NIH (grants 1R24AG032112-01 and 5R24AG032112-03). This paper was first presented at the INDEPTH Scientific Conference, October 2013, and was supported by an INDEPTH travel award. The funders had no role in study design, data collection, analysis, decision to publish, or preparation of the manuscript.
- Lozano R, Naghavi M, Foreman K, Lim S, Shibuya K, Aboyans V, Abraham J, Adair T, Aggarwal R, Ahn SY, AlMazroa MA, Alvarado M, Anderson HR, Anderson LM, Andrews KG, Atkinson C, Baddour LM, Barker-Collo S, Bartels DH, Bell ML, Benjamin EJ, Bennett D, Bhalla K, Bikbov B, Abdulhak AB, Birbeck G, Blyth F, Bolliger I, Boufous S, Bucello C: Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012, 380: 2095-2128. doi:10.1016/S0140-6736(12)61728-0
- Mahapatra P, Shibuya K, Lopez AD, Coullare F, Notzon FC, Rao C, Szreter S: Civil registration systems and vital statistics: successes and missed opportunities. Lancet 2007, 370: 1653-1663. doi:10.1016/S0140-6736(07)61308-7
- Wang H, Dwyer-Lindgren L, Lofgren KT, Rajaratnam JK, Marcus JR, Levin-Rector A, Levitz CE, Lopez AD, Murray CJ: Age-specific and sex-specific mortality in 187 countries, 1970-2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012, 380: 2071-2094. doi:10.1016/S0140-6736(12)61719-X
- Carter KL, Rao C, Lopez AD, Taylor R: Mortality and cause-of-death reporting and analysis systems in seven pacific island countries. BMC Public Health 2012, 12: 436. doi:10.1186/1471-2458-12-436
- Rao C, Osterberger B, Anh TD, MacDonald M, Chúc NTK, Hill PS: Compiling mortality statistics from civil registration systems in Viet Nam: the long road ahead. Bull World Health Organ 2009, 87: 58-65. doi:10.2471/BLT.07.050138
- Africa Programme on Accelerated Improvement of Civil Registration and Vital Statistics (APAI-CRVS). Economic Commission for Africa, United Nations, New York; 2011.
- Setel PW, Macfarlane SB, Szreter S, Mikkelsen L, Jha P, Stout S, AbouZahr C: A scandal of invisibility: making everyone count by counting everyone. Lancet 2007, 370: 1569-1577. doi:10.1016/S0140-6736(07)61307-5
- Bradshaw D, Groenewald P, Laubscher R, Nannan N, Nojilana B, Norman R, Pieterse D, Schneider M, Bourne DE, Timaeus I, Dorrington R, Johnson L: Initial burden of disease estimates for South Africa, 2000. SAMJ 2003, 93: 682-688.
- Mathers CD, Ma Fat D, Inoue M, Rao C, Lopez AD: Counting the dead and what they died from: an assessment of the global status of cause of death data. Bull World Health Organ 2005, 83: 171-177c.
- Bradshaw D, Kielkowski D, Sitas F: New birth and death registration forms - a foundation for the future, a challenge for health workers? SAMJ 1998, 88: 971-974.
- Rao C, Bradshaw D, Mathers CD: Improving death registration and statistics in developing countries: lessons from sub-Saharan Africa. South Afr J Demogr 2004, 9: 81-99.
- Bah S: Multiple forces working in unison: the case of rapid improvement of vital statistics in South Africa post-1996. World Health Popul 2009, 11: 50-59. doi:10.12927/whp.2013.21017
- Rao C, Lopez AD, Yang G, Begg S, Ma J: Evaluating national cause-of-death statistics: principles and application to the case of China. Bull World Health Organ 2005, 83: 618-625.
- Joubert J, Rao C, Bradshaw D, Vos T, Lopez AD: Evaluating the quality of national mortality statistics from civil registration in South Africa, 1997-2007. PLoS ONE 2013, 8: e64592. doi:10.1371/journal.pone.0064592
- Using Longitudinal INDEPTH Data, National Censuses, DHS, and Other National Surveys for Better Health Policy in Africa. Report of Meeting nr 1352. INDEPTH Network, Bellagio, Italy; 2012.
- Sankoh O: Global health estimates: stronger collaboration needed with low-and middle-income countries. PLoS Med 2010, 7: e1001005. doi:10.1371/journal.pmed.1001005
- Ye Y, Wamukoya M, Ezeh A, Emina J, Sankoh O: Health and demographic surveillance systems: a step towards full civil registration and vital statistics system in sub-Sahara Africa? BMC Public Health 2012, 12: 741. doi:10.1186/1471-2458-12-741
- Sankoh O, Byass P: The INDEPTH Network: filling vital gaps in global epidemiology. Int J Epidemiol 2012, 41: 579-588. doi:10.1093/ije/dys081
- Kahn K, Collinson MA, Gómez-Olivé FX, Mokoena O, Twine R, Mee P, Afolabi SA, Clark BD, Kabudula CW, Khosa A, Khoza S, Shabangu MG, Silaule B, Tibane JB, Wagner RG, Garenne ML, Clark SJ, Tollman SM: Profile: Agincourt Health and Socio-demographic Surveillance System. Int J Epidemiol 2012, 41: 988-1001. doi:10.1093/ije/dys115
- Kahn K, Tollman SM, Collinson MA, Clark SJ, Twine R, Clark BD, Shabangu M, Gomez-Olive FX, Mokoena O, Garenne ML: Research into health, population and social transitions in rural South Africa: data and methods of the Agincourt Health and Demographic Surveillance System. Scand J Public Health 2007, 35: 8-20. doi:10.1080/14034950701505031
- Kahn K, Tollman SM, Garenne M, Gear JSS: Validation and application of verbal autopsies in a rural area of South Africa. Trop Med Int Health 2000, 5: 824-831. doi:10.1046/j.1365-3156.2000.00638.x
- United Nations Statistics Division: Principles and Recommendations for a Vital Statistics System. Revision 3. Final Draft. United Nations, New York; 2013.
- Births and Deaths Registration Act, 1992 (No. 51 of 1992). In: Government Gazette No. 13953. Government Printer, Cape Town; 1992.
- Mortality and Causes of Death in South Africa, 2008: Findings from Death Notification. Statistical Release P0309.3. Statistics South Africa, Pretoria; 2010.
- Deaths Certificates. [http://www.home-affairs.gov.za/index.php/death-certificates1]
- Li B, Quan H, Fong A, Lu M: Assessing record linkage between health care and Vital Statistics databases using deterministic methods. BMC Health Serv Res 2006, 6: 48. doi:10.1186/1472-6963-6-48
- Machado CJ: A literature review of record linkage procedures focusing on infant health outcomes. Cadernos de Saúde Pública 2004, 20: 362-371. doi:10.1590/S0102-311X2004000200003
- Maso LD, Braga C, Franceschi S: Methodology used for software for automated linkage in Italy (SALI). Comp Biomed Research 2001, 34: 395.
- Victor TW, Mera RM: Record linkage of health care insurance claims. J Am Med Inform Assoc 2001, 8: 281-288. doi:10.1136/jamia.2001.0080281
- Winkler WE: String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. In Proceedings of the Section on Survey Research Methods. American Statistical Association, Alexandria; 1990:354-359.
- Durham E, Xue Y, Kantarcioglu M, Malin B: Quantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage. Inform Fusion 2012, 13: 245-259. doi:10.1016/j.inffus.2011.04.004
- Jaro MA: Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. J Am Stat Assoc 1989, 84: 414-420. doi:10.1080/01621459.1989.10478785
- Sariyar M, Borg A, Pommerening K: Evaluation of record linkage methods for iterative insertions. Methods Inf Med 2009, 48: 429-437. doi:10.3414/ME9238
- Howe GR: Use of computerized record linkage in cohort studies. Epidemiol Rev 1998, 20: 112-121. doi:10.1093/oxfordjournals.epirev.a017966
- Beauchamp A, Tonkin AM, Kelsall H, Sundararajan V, English DR, Sundaresan L, Wolfe R, Turrell G, Giles GG, Peeters A: Validation of de-identified record linkage to ascertain hospital admissions in a cohort study. BMC Med Res Methodol 2011, 11: 42. doi:10.1186/1471-2288-11-42
- Cook L, Olson L, Dean J: Probabilistic record linkage: relationships between file sizes, identifiers, and match weights. Methods Inf Med 2001, 40: 196-203.
- Jaro MA: Probabilistic linkage of large public health data files. Stat Med 1995, 14: 491-498. 10.1002/sim.4780140510View ArticlePubMedGoogle Scholar
- Nitsch D, Morton S, DeStavola BL, Clark H, Leon DA: How good is probabilistic record linkage to reconstruct reproductive histories? Results from the Aberdeen children of the 1950s study. BMC Med Res Methodol 2006, 6: 15. 10.1186/1471-2288-6-15
- Grannis SJ, Overhage JM, Hui S, McDonald CJ: Analysis of a probabilistic record linkage technique without human review. AMIA Annu Symp Proc 2003, 2003: 259-263.
- Herzog TN, Scheuren F, Winkler WE: Data Quality and Record Linkage Techniques. Springer, Heidelberg; 2007.
- Méray N, Reitsma JB, Ravelli AC, Bonsel GJ: Probabilistic record linkage is a valid and transparent tool to combine databases without a patient identification number. J Clin Epidemiol 2007, 60: 883-e881. 10.1016/j.jclinepi.2006.11.021
- Tromp M, Ravelli AC, Bonsel GJ, Hasman A, Reitsma JB: Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage. J Clin Epidemiol 2011, 64: 565-572. 10.1016/j.jclinepi.2010.05.008
- Filmer D, Pritchett LH: Estimating wealth effects without expenditure data or tears: an application to educational enrollments in states of India. Demography 2001, 38: 115-132.
- SimMetrics. [http://sourceforge.net/projects/simmetrics]
- Roos NP, Black CD, Frohlich N, Decoster C, Cohen MM, Tataryn DJ, Mustard CA, Toll F, Carriere KC, Burchill CA, MacWilliam L, Bogdanovic B: A population-based health information system. Med Care 1995, 33: DS13-DS20. 10.1097/00005650-199533020-00001
- Karmel R, Rosman D: Linkage of health and aged care service events: comparing linkage and event selection methods. BMC Health Serv Res 2008, 8: 149. 10.1186/1472-6963-8-149
- Chandrasekar C, Deming WE: On a method of estimating birth and death rates and the extent of registration. J Am Stat Assoc 1949, 44: 101-115. 10.1080/01621459.1949.10483294
- Hook EB, Regal RR: Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995, 17: 243-264.
- Joubert J, Bradshaw D, Kabudula C, Rao C, Kahn K, Mee P, Tollman S, Lopez AD, Vos T: Record-linkage comparison of verbal autopsy and routine civil registration death certification in rural north-east South Africa: 2006-09. Int J Epidemiol. In press.
- Kabudula CW, Clark BD, Gómez-Olivé FX, Tollman S, Menken J, Reniers G: The promise of record linkage for assessing the uptake of health services in resource constrained settings: a pilot study from South Africa. BMC Med Res Methodol 2014, 14: 71. 10.1186/1471-2288-14-71
- Quantin C, Binquet C, Bourquard K, Pattisina R, Gouyon-Cornet B, Ferdynus C, Gouyon J-B, Allaert F-A: Which are the best identifiers for record linkage? Inf Health and Social Care 2004, 29: 221-227. 10.1080/14639230400005974
- Tran TK, Eriksson B, Nguyen CT, Horby P, Bondjers G, Petzold M: DodaLab: an urban health and demographic surveillance site, the first three years in Hanoi, Vietnam. Scand J Public Health 2012, 40: 765-772. 10.1177/1403494812464444
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.