Developing a comprehensive time series of GDP per capita for 210 countries from 1950 to 2015
© James et al.; licensee BioMed Central Ltd. 2012
Received: 19 August 2011
Accepted: 11 June 2012
Published: 30 July 2012
Income has been extensively studied and utilized as a determinant of health. There are several sources of income expressed as gross domestic product (GDP) per capita, but there are no time series that are complete for the years between 1950 and 2015 for the 210 countries for which data exist. It is in the interest of population health research to establish a global time series that is complete from 1950 to 2015.
We collected GDP per capita estimates expressed in either constant US dollar terms or international dollar terms (corrected for purchasing power parity) from seven sources. We applied several stages of models, including ordinary least-squares regressions and mixed effects models, to complete each of the seven source series from 1950 to 2015. The three US dollar and four international dollar series were each averaged to produce two new GDP per capita series.
Results and discussion
Nine complete series from 1950 to 2015 for 210 countries are available for use. These series can serve various analytical purposes and can illustrate myriad economic trends and features. The derivation of the two new series allows for researchers to avoid any series-specific biases that may exist. The modeling approach used is flexible and will allow for yearly updating as new estimates are produced by the source series.
GDP per capita is a necessary tool in population health research, and our development and implementation of a new method has allowed for the most comprehensive known time series to date.
Keywords: GDP; GDP per capita; Income; Social determinants; Covariate; Indicator
Income per capita is one of the most widely used socioeconomic predictors of health, and the relationship between income and health has been studied extensively. In his seminal work in 1975, Preston framed three ways in which income and health are related, focusing on mortality as a measure of health. These mechanisms, summarized in the Preston curve, suggest that the level of income influences the level of health, the level of income influences the rate of change in health, and the rate of change of income influences the rate of change of health. Further economic and demographic research has also illustrated the depth of this relationship [2–9]. Gross domestic product (GDP) per capita is the most widely used indicator for country-level income and has been used in modeling health outcomes, mortality trends [12, 13], cause-specific mortality estimation, health system performance and finances [13, 14], and several other topics of interest.
Over the years, the implications of these studies cultivated a global focus on improving health through economic policy and growth. The converse relationship, i.e., the effect of health on the economy, has also been studied extensively by macroeconomists [15–18]. In 2000, the World Health Organization (WHO) launched the Commission on Macroeconomics and Health, which studied the dynamics through which health impacts economic integrity. The commission heralded new goals and guidelines, which suggested that health interventions averting 330 million disability-adjusted life years by 2010 would produce savings of up to US$ 180 billion per year by 2015. Later, in 2005, WHO started the Commission on Social Determinants of Health, which sought to develop a more comprehensive framework to describe factors that predict health and to also highlight the critical role of economic well-being in the attainment of better health.
Given the critical relationship between income and health, GDP per capita is one of the most widely used covariates in population health research. It is also one of the most regularly measured economic indicators, with estimates produced quarterly or annually by countries themselves as well as agencies such as the World Bank (WB), the United Nations Statistics Division, and the International Monetary Fund (IMF) and by institutions such as the University of Pennsylvania and the University of Groningen. The currently available data sources suffer from a series of limitations. First, the calculation of GDP varies across sources (though it is generally defined as being the sum of private consumption, gross investment, government spending, and net exports [exports minus imports]). Second, each of the sources for GDP per capita provides estimates for a range of country-years, but no particular source provides a complete dataset for all countries and years, which is often what is needed by health researchers. Third, GDP per capita estimates for any given country-year can vary dramatically depending on the source. This variation is particularly exaggerated in developing countries with low economic infrastructure or unstable economic conditions, which are often the countries of great interest to population health researchers. Finally, geopolitical events causing a state’s acquisition or loss of sovereignty can result in extended time periods without GDP estimates. This is most evidently the case for the former Soviet Union, where no GDP estimates exist for any of the constituent republics prior to the USSR’s dissolution. This combination of issues means that any study involving the use of GDP per capita can be subject to substantial differences in estimates and in completeness depending on the sources used and on the country-time period of interest.
To address some of these limitations, in this paper we propose a method for achieving two goals. Goal 1 was to impute missing country-years for each available series for all countries and years from 1950 to 2015. Goal 2 was to create a new US dollar (USD) series and a new international dollar (ID) (purchasing power parity [PPP]) series based on the completed source series, also covering 210 countries from 1950 to 2015.
Table: Available data sources and time span of GDP estimates. Columns: Constant LCUs (base year); Constant USD (base year); Constant ID (base year). (Table body not recovered.)
The Penn and WB ID series were expressed in 2005 constant ID. The IMF ID series was expressed in “current” or “historical” ID and also in constant local currency units (LCUs), neither of which was immediately comparable to the other series. To convert the series to constant 2005 ID, the current ID value for the year of 2005 was kept for each country, and the growth rate derived from the constant LCU series was applied to this value to chain estimates forward and backward. This created the IMF constant 2005 ID series. The Maddison series offered portions of GDP estimates that were integral to our analytical strategy but also had two main weaknesses. First, the Maddison series was expressed only in constant 1990 ID, and since there was no reliable method for converting these to 2005 ID, we only used this series as a predictor variable in modeling the missing portions of our other series. Since the series is expressed in constant 1990 terms, this limitation should not inhibit our modeling strategy’s ability to make accurate estimations. The second weakness of the Maddison series was that it was based on the most recent set of national accounts data released no later than March 2010. Other GDP estimates were based on more recently updated sets of data. Despite this older data vintage, we opted to include the Maddison series in our analysis because it offered estimates for 8,693 country-years (out of 15,780 country-years possible), many of which were not included in the other GDP sources at our disposal. Consequently, we were able to input more data into our models, particularly for the years before 1980 and for the former Soviet Union. This implementation is described in more detail below.
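The rebasing of the IMF series described above can be illustrated with a minimal sketch. It assumes a complete run of consecutive years in the constant LCU series; the function name `chain_to_constant_base` and the sample numbers are hypothetical and are not taken from the authors' Stata code.

```python
def chain_to_constant_base(base_level, constant_lcu, base_year):
    """Rebase a series: keep the base-year level (e.g. current ID in 2005)
    and chain the constant-LCU growth rates forward and backward from it.

    constant_lcu: dict mapping consecutive years to constant LCU values.
    """
    years = sorted(constant_lcu)
    i = years.index(base_year)
    rebased = {base_year: base_level}

    # Chain forward: apply each year's growth factor to the previous level.
    for prev, cur in zip(years[i:], years[i + 1:]):
        growth = constant_lcu[cur] / constant_lcu[prev]
        rebased[cur] = rebased[prev] * growth

    # Chain backward: divide the following level by the growth factor.
    for cur, nxt in zip(reversed(years[:i]), reversed(years[1:i + 1])):
        growth = constant_lcu[nxt] / constant_lcu[cur]
        rebased[cur] = rebased[nxt] / growth

    return dict(sorted(rebased.items()))


# Hypothetical inputs: constant LCU per capita and a 2005 current-ID level.
lcu = {2003: 90.0, 2004: 95.0, 2005: 100.0, 2006: 110.0}
constant_2005_id = chain_to_constant_base(2000.0, lcu, 2005)
```

With a gap-free LCU series this is equivalent to scaling every year by the ratio of its LCU value to the base-year LCU value; the explicit loop is kept only to mirror the chaining language in the text.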
Table: Time span of available data from each source series for Somalia. Column: Range of available years of data. Rows: IMF international dollar, 2005 base year; Penn international dollar, 2005 base year; World Bank international dollar, 2005 base year; Maddison international dollar, 1990 base year; World Bank US dollar, 2005 base year; IMF US dollar, 2005 base year; UNSTAT US dollar, 2005 base year. (Year ranges not recovered.)
During the assessment of data and development of a modeling method we observed a number of outliers. These data points seemed implausibly high or low for a particular country-series-year in the context of surrounding data points and dramatically altered the predictions from our models when included. Consequently, after confirming on a case-by-case basis that there were no outstanding geopolitical or economic incidents that could explain the anomaly, we removed the points from the series prior to modeling our estimates. Out of a total of 48,781 data points, we identified 111 as outliers. The specific country-series-year data points that were removed are listed in Additional file 2: Annex 2.
We approached Goal 1 of filling in the missing years between 1950 and 2015 for each series and predicting a series if a particular source did not include estimates for a given country through the following steps.
Imputing missing years for existing series
We started the project with 48,670 country-year-series of data (after removing outliers and former countries such as Former Yugoslavia) distributed unevenly across the seven source data series. Some country-series were missing completely or had limited time frames of income estimates. For example, neither the WB series nor the IMF series offer estimates for Somalia. Both IMF series for Afghanistan are available only from 2002 to 2015. Some countries showed extremely sparse data coverage, particularly smaller countries, such as Aruba or Turks and Caicos, which only had estimates from the UN series. As noted above, estimates for the constituent republics of the USSR are completely missing for the years prior to the dissolution of the USSR. In our modeling approach, we sought to complete each time series for each country from 1950 to 2015 for a total of 97,020 country-year-series data points. Put another way, our database was missing roughly 50% of its estimates. From the completed database, we intended to generate the two new Institute for Health Metrics and Evaluation (IHME) GDP per capita series.
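The target count quoted above follows from the dimensions of the database, taking 1950 to 2015 inclusive as 66 years:

```latex
\[
7 \text{ series} \times 210 \text{ countries} \times 66 \text{ years} = 97{,}020,
\qquad
\frac{97{,}020 - 48{,}670}{97{,}020} = \frac{48{,}350}{97{,}020} \approx 0.498 .
\]
```

That is, just under half of the country-year-series cells had no source estimate.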
For an overall example, the exponential growth rate from the UNSTAT estimates was regressed on the growth rates from the IMF ID, IMF USD, Maddison, Penn ID, WB ID, and WB USD series, each as a separate model. The growth rates predicted from these six regressions were then averaged to produce the estimated growth rate for UNSTAT.
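A minimal sketch of this averaging step follows. It assumes simple one-predictor ordinary least-squares fits on the years where both series have growth rates; how the authors pooled observations across countries is not stated here, and the function names (`growth_rates`, `ols_fit`, `averaged_predicted_growth`) and sample numbers are hypothetical.

```python
import math


def growth_rates(levels):
    """Exponential (log-difference) growth rates between consecutive years."""
    return [math.log(b / a) for a, b in zip(levels, levels[1:])]


def ols_fit(x, y):
    """Intercept and slope of a one-predictor least-squares regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return mean_y, 0.0  # constant predictor: fall back to the mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def averaged_predicted_growth(target, predictors):
    """Regress the target growth rate on each predictor series separately,
    then average the predictions available for each year.

    target: list of growth rates, None where missing.
    predictors: dict of series name -> list of growth rates (None if missing).
    """
    predictions = [[] for _ in target]
    for series in predictors.values():
        pairs = [(x, y) for x, y in zip(series, target)
                 if x is not None and y is not None]
        if len(pairs) < 2:
            continue  # not enough overlap to fit a line
        intercept, slope = ols_fit([p[0] for p in pairs], [p[1] for p in pairs])
        for t, x in enumerate(series):
            if x is not None:
                predictions[t].append(intercept + slope * x)
    return [sum(p) / len(p) if p else None for p in predictions]


# Hypothetical growth rates; the target is missing in the third year.
target = [0.02, 0.04, None, 0.06]
predictors = {
    "series_a": [0.01, 0.02, 0.025, 0.03],
    "series_b": [0.02, 0.04, 0.05, 0.06],
}
estimated = averaged_predicted_growth(target, predictors)
```

The simple unweighted mean across regressions is an assumption of the sketch; the text says only that the predicted growth rates were averaged.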
Using this averaged growth rate, we make predictions for country-years that are missing. For example, across the seven series, estimates exist in differing time spans for Somalia from 1950 to 2008, but not all series have estimates for every year in that time span. Using the estimated growth rates we forecast and backcast the missing years in each series. When missing years are flanked by years that do have estimates on both sides (for example, if a series is complete for 1950 to 1970 and 1980 to 1990 but is missing estimates from 1971 to 1979), the chain-forward and chain-backward predictions for this period are averaged. If a series is missing completely for a country, no new GDP per capita levels are predicted at this stage.
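The forecasting, backcasting, and averaging described above can be sketched as follows. The function name `fill_with_growth` and the sample values are hypothetical, and the equal-weight average of the two chained predictions is an assumption, since the text does not specify a weighting.

```python
import math


def fill_with_growth(levels, growth):
    """Fill missing levels by chaining estimated growth rates.

    levels: list of GDP per capita levels, None where missing.
    growth: list where growth[t] = ln(level[t + 1] / level[t]),
            so it has one fewer element than levels.
    """
    n = len(levels)

    # Chain forward from the nearest earlier level (observed or chained).
    forward = list(levels)
    for t in range(1, n):
        if levels[t] is None and forward[t - 1] is not None:
            forward[t] = forward[t - 1] * math.exp(growth[t - 1])

    # Chain backward from the nearest later level (observed or chained).
    backward = list(levels)
    for t in range(n - 2, -1, -1):
        if levels[t] is None and backward[t + 1] is not None:
            backward[t] = backward[t + 1] / math.exp(growth[t])

    filled = []
    for observed, f, b in zip(levels, forward, backward):
        if observed is not None:
            filled.append(observed)        # keep source estimates unchanged
        elif f is not None and b is not None:
            filled.append((f + b) / 2)     # interior gap: average both chains
        else:
            filled.append(f if f is not None else b)  # edge gap or all missing
    return filled


# Hypothetical series with a leading, an interior, and a trailing gap.
levels = [None, 100.0, None, 121.0, None]
flat_growth = [0.0, 0.0, 0.0, 0.0]
filled = fill_with_growth(levels, flat_growth)
```

A series that is entirely missing stays entirely missing, which matches the statement that no new levels are predicted for such series at this stage.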
Making predictions for a missing series
In the second stage of modeling we generate estimates for series that are completely missing for a given country. For example, the WB ID series does not include estimates for Somalia but does have estimates for 166 other countries. In order to conduct this stage of our model, we made the assumption that the series-to-series relationship that exists in country-years where both series exist should exist in other country-years where only one series provides predictions.
In this model, the ln GDP per capita of each series is regressed against that of each of the other series, similar to the growth rates stage above. The model includes a country-nested-in-region random intercept to capture effects that may be intrinsic to a particular region-country. To capture the potential differential relationship between GDP series and countries, we incorporated a country-nested-in-region random slope on the ln GDP per capita of the predictor series. These additional model specifications are based on the assumption that there is an association between a country’s economic patterns and those of other countries in that region. We also conducted a nonnested version of the same model and observed that the model was not sensitive to the nesting based on the ultimate GDP per capita estimates resulting from each approach. Regions were determined by the GBD Study and are based on both geographical location and economic status.
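One way to write a model of this kind is given below. It is a sketch of the structure the paragraph describes, not the authors' exact specification, and all notation is assumed here: $y^{(a)}_{c,t}$ is GDP per capita from the series being completed in country $c$ and year $t$, $y^{(b)}_{c,t}$ is the predictor series, $r(c)$ is the region containing country $c$, $u$ terms are random intercepts, and $v$ terms are random slopes.

```latex
\[
\ln y^{(a)}_{c,t}
  = \beta_0 + \beta_1 \ln y^{(b)}_{c,t}
  + \underbrace{u_{r(c)} + u_{c}}_{\text{nested random intercepts}}
  + \underbrace{\bigl(v_{r(c)} + v_{c}\bigr)\,\ln y^{(b)}_{c,t}}_{\text{nested random slopes}}
  + \varepsilon_{c,t}
\]
```

Under this reading, a prediction for a country with no observations in series $a$ relies on the fixed effects and the region-level terms, which is where the assumption about shared regional economic patterns enters.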
Making out-of-sample predictions
Creating estimates for former USSR republics
One of the myriad complications in estimating and analyzing a comprehensive time series is approaching the changes in countries’ sovereignty status. It is difficult to produce a time series that is both comprehensive and appropriately reflective of geopolitical chronology. The USSR republics posed a unique challenge in our estimation process. None of the data sources had estimates for any of the constituent republics (Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, Uzbekistan) prior to 1990. We attempted to include these republics in the estimation process described above, but the model tended to predict very low or nonexistent growth rates for the period from 1950 to 1990, which did not seem plausible considering the aggressive economic growth of the Soviet Union during certain periods of this era. We concluded that our approach needed slight modification to be used successfully with the USSR republics and conducted the following steps (which roughly follow our general approach and hold similar assumptions as specified above) at an early stage of our overall modeling process.
Following this step, the USSR estimates were added to the global estimates prior to the third stage of the modeling process described above. Thus, all USSR countries were a part of the final mixed effects model that ensured all series and countries were complete from 1950 to 2015.
Cumulatively the steps described above allowed us to achieve Goal 1 of our study.
Derivation of a new GDP per capita series
Exploring the sensitivity of results to the choice of series
One premise of this project was the idea that the use of different income series could affect the outcomes and inferences drawn from a statistical model that uses income as a predictor or covariate. We investigated this question by conducting regressions to model under-5 mortality (5q0) and adult male and female mortality (45q15). Specifically, we conducted a first-differences model with these health outcomes as the dependent variable and income, female education, and HIV seroprevalence (three-year lag) as the independent variables. Separately, we also ran a Beck and Katz model that in addition to the independent variables also uses a one-year lag of the outcome variable as a predictor variable. For the income covariate, we used each of the seven original income series and then each of the nine complete income series (including the two IHME series). All outcome and predictor variables were modeled in log space.
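A first-differences regression of this kind can be sketched in a few lines. This is an illustration only: the authors worked in Stata, the function names and panel layout here are hypothetical, all inputs are assumed to be strictly positive so that logs are defined, the HIV series is assumed to be lagged already, and including an intercept is a choice of the sketch.

```python
import math


def first_differences(values):
    """Year-on-year differences of a series."""
    return [b - a for a, b in zip(values, values[1:])]


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting (assumes a non-singular a)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictor_rows, outcome):
    """Least squares via the normal equations; returns [intercept, b1, ...]."""
    rows = [[1.0] + list(r) for r in predictor_rows]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def first_difference_fit(panel):
    """panel: dict of country -> dict with equal-length yearly lists under the
    keys 'mortality', 'income', 'education', and 'hiv'.

    Returns [intercept, income, education, hiv] coefficients from a pooled
    regression of differenced ln mortality on differenced ln predictors.
    """
    predictor_rows, outcome = [], []
    for series in panel.values():
        diffs = {key: first_differences([math.log(v) for v in values])
                 for key, values in series.items()}
        for t in range(len(diffs["mortality"])):
            predictor_rows.append(
                [diffs["income"][t], diffs["education"][t], diffs["hiv"][t]])
            outcome.append(diffs["mortality"][t])
    return ols(predictor_rows, outcome)
```

Swapping the `income` list for each completed series in turn, and comparing the resulting income coefficient, is the kind of sensitivity check the paragraph describes.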
Stata 11.0 was used for all analysis and data management. All data and code are available from the authors upon request.
GDP per capita estimates for 210 countries from 1950 to 2015 are provided in Additional file 3: Annex 3. The estimates are provided for each of the seven source series used for analysis (IMF ID, Penn ID, World Bank ID, Maddison ID, World Bank USD, IMF USD, and UNSTAT USD) and for the new IHME ID and USD series. The USD estimates are all expressed in constant 2005 USD terms, as are all of the ID (PPP) series except for Maddison, which uses 1990 as a base year. These series are the most comprehensive GDP per capita series currently available and offer researchers diverse options to serve different analytical purposes.
Table: Example of series effect on health outcome modeling. Rows are listed in source order; the row labels identifying each income series were not recovered. Values in parentheses are as given in the source.

Model label not recovered for the first block (the text describes a first-differences model)

Child mortality (5q0)      Adult mortality, female (45q15)   Adult mortality, male (45q15)
-0.480 (-0.495, -0.464)    -0.286 (-0.299, -0.273)           -0.278 (-0.289, -0.266)
-0.389 (-0.402, -0.377)    -0.252 (-0.262, -0.242)           -0.235 (-0.244, -0.226)
-0.490 (-0.505, -0.474)    -0.288 (-0.301, -0.276)           -0.273 (-0.285, -0.261)
-0.331 (-0.342, -0.321)    -0.200 (-0.208, -0.192)           -0.185 (-0.193, -0.178)
-0.378 (-0.390, -0.367)    -0.215 (-0.225, -0.205)           -0.214 (-0.223, -0.205)
-0.306 (-0.315, -0.297)    -0.185 (-0.193, -0.178)           -0.173 (-0.180, -0.166)
-0.390 (-0.402, -0.378)    -0.244 (-0.253, -0.235)           -0.227 (-0.235, -0.219)
-0.371 (-0.383, -0.359)    -0.239 (-0.248, -0.231)           -0.224 (-0.232, -0.216)
-0.377 (-0.389, -0.365)    -0.239 (-0.248, -0.230)           -0.222 (-0.230, -0.214)
-0.324 (-0.333, -0.315)    -0.188 (-0.195, -0.181)           -0.178 (-0.185, -0.172)
-0.324 (-0.333, -0.315)    -0.189 (-0.196, -0.182)           -0.179 (-0.186, -0.173)
-0.320 (-0.330, -0.311)    -0.185 (-0.192, -0.178)           -0.175 (-0.182, -0.169)
-0.386 (-0.398, -0.374)    -0.245 (-0.254, -0.236)           -0.229 (-0.237, -0.221)
-0.324 (-0.334, -0.315)    -0.189 (-0.196, -0.182)           -0.179 (-0.185, -0.172)

Beck and Katz

Child mortality (5q0)      Adult mortality, female (45q15)   Adult mortality, male (45q15)
-0.507 (-0.524, -0.491)    -0.307 (-0.321, -0.293)           -0.287 (-0.300, -0.274)
-0.399 (-0.413, -0.386)    -0.272 (-0.283, -0.262)           -0.244 (-0.254, -0.234)
-0.509 (-0.526, -0.492)    -0.303 (-0.316, -0.289)           -0.282 (-0.295, -0.269)
-0.347 (-0.358, -0.336)    -0.213 (-0.222, -0.204)           -0.194 (-0.203, -0.186)
-0.403 (-0.416, -0.391)    -0.233 (-0.244, -0.222)           -0.223 (-0.233, -0.213)
-0.326 (-0.336, -0.315)    -0.201 (-0.209, -0.192)           -0.183 (-0.191, -0.176)
-0.412 (-0.426, -0.399)    -0.262 (-0.272, -0.252)           -0.239 (-0.249, -0.230)
-0.390 (-0.403, -0.377)    -0.260 (-0.269, -0.250)           -0.238 (-0.247, -0.229)
-0.397 (-0.411, -0.384)    -0.256 (-0.265, -0.246)           -0.234 (-0.243, -0.225)
-0.344 (-0.354, -0.333)    -0.202 (-0.210, -0.194)           -0.189 (-0.196, -0.181)
-0.344 (-0.354, -0.334)    -0.204 (-0.212, -0.196)           -0.190 (-0.197, -0.183)
-0.338 (-0.349, -0.328)    -0.199 (-0.206, -0.191)           -0.185 (-0.192, -0.177)
-0.408 (-0.421, -0.395)    -0.264 (-0.274, -0.255)           -0.242 (-0.251, -0.233)
-0.344 (-0.354, -0.334)    -0.203 (-0.211, -0.195)           -0.189 (-0.196, -0.182)
In this paper we have accomplished our two goals of 1) providing a method for producing a complete time series for GDP per capita for all existing sources and 2) proposing a new series to be used as an alternative to the existing series. By accomplishing these two goals we have produced a resource that will be useful for myriad purposes. Each of the source data series is now usable for 210 countries from 1950 to 2015. The strength of the IHME data series is that they reduce the bias that may result from using one source’s series for analyses. Assumptions, methods, and data availability may differ from source to source [26, 31], and it is not clear whether one source’s methods are superior to those of any other. Some researchers may have a deliberate reason to use one particular series, and they will no longer be limited by data availability.
The first goal of our study was designed to address the fact that population health analyses were previously limited by the spatial or temporal coverage of the existing data. Our method of completing the time series preserves series-specific trends and nuances throughout the estimation process and creates series that extend the existing data to missing countries and years. This should facilitate population health analyses and reduce biases that arise from using series with missing country-years [32, 33]. To illustrate this, we provided an example of the type of bias that can arise from missing time series values by comparing different GDP series in their original versus imputed state as predictor variables for mortality, indicating how bias can be reduced through the use of a more complete time series. We note that the GDP estimates for years after 2010 are driven solely by the growth estimated by the IMF series and that projections of future years’ GDP growth should be interpreted and used cautiously. Similarly, the estimates for GDP per capita for emerging economies are more subject to variation between different series, as is indicated in Figure 1, and to higher uncertainty. This caveat is important in the use and interpretation of any GDP series for emerging economies.
In imputing these series, we sought to retain as much flexibility in the usage of the data series as possible. The inclusion of both constant USD series and PPP-based ID series offers such flexibility. The ID series controls for the idiosyncratic qualities of a country’s economy that affect the cost of goods for consumers. For example, due to economic policy, agriculture, and geography, a particular quantity of food may cost much more in one country than another. The pathways through which income affects health and other social outcomes may relate to the goods and services that can be afforded at different incomes. As such, we recommend the use of the IHME ID series for users who are modeling a health or social outcome affected by individual behavior or opportunity and who wish to control for income or to use income as a covariate.
In contrast, the USD series are derived entirely from the empirical amounts of trade occurring in a country without taking into consideration the cost of goods to consumers. Consequently, the IHME USD series (or any of the other USD series) may be a better option for researchers interested in exploring finance, trade, government spending, or other econometric topics that involve the movement of fungible assets.
Regardless of whether researchers opt to use a USD or ID series, we recommend that researchers test the sensitivity of their findings to using alternative completed income series. This task has also been made much more accessible since each series is provided in exactly the same format, whereas previously it was difficult to switch from one series to another due to formatting and naming nuances.
As additional analysis beyond our two research goals in this paper, we explored the effects that using different income time series could have on a statistical model of a health outcome. Using under-5 and adult mortality as exemplary health outcomes, we found that completed income time series could be interchanged without significantly affecting the regression’s income coefficients. Nevertheless, researchers should test the sensitivity of any income-dependent analysis using each of the different completed income series.
The role of income as a driver of health means that it can serve a wide array of analytical purposes. Previous analyses and studies that invoked econometrics may have been limited by the availability of income estimates for different countries and years. Thus, the completion of existing GDP per capita series and the development of the new IHME GDP per capita series provide useful resources for economic, demographic, and population health research. We have included all existing major data sources in this project, and our proposed modeling framework allows for easy updating of estimates when each of the sources updates its series.
GDP: Gross domestic product
IHME: Institute for Health Metrics and Evaluation
ID: International dollars converted to purchasing power parity terms
LCU: Local currency units
USD: United States dollars
PPP: Purchasing power parity
Penn: University of Pennsylvania Center for International Comparisons of Production
Maddison: Angus Maddison’s research homepage at the University of Groningen Department of Economics
IMF: International Monetary Fund; abbreviation refers to World Economic Outlook report
UNSTAT: United Nations Statistics Division; abbreviation refers to National Accounts Main Aggregates Database
WB: World Bank; abbreviation refers to World Development Indicators.
This paper is dedicated to the work of Angus Maddison (1926-2010).
- Preston SH: The Changing Relation between Mortality and Level of Economic Development. Population Studies 1975, 29: 231-248.
- Carstairs V: Health Inequalities in European Countries. J Epidemiol Community Health 1991, 45: 86-86.
- Illsley R, Svensson PG: Health inequities in Europe. Soc Sci Med 1990, 1990: 223-420.
- Stronks K, van de Mheen H, van den Bos J, Mackenbach JP: The interrelationship between income, health and employment status. International Journal of Epidemiology 1997, 26: 592-600. 10.1093/ije/26.3.592
- Mackenbach JP, Martikainen P, Looman CW, Dalstra JA, Kunst AE, Lahelma E, SEdHA working group: The shape of the relationship between income and self-assessed health: an international study. International Journal of Epidemiology 2005, 34: 286-293. 10.1093/ije/dyh338
- Blaxter M: Health and lifestyles. Psychology Press; 1990.
- Subramanian SV, Kawachi I: Being well and doing well: on the importance of income for health. International Journal of Social Welfare 2006, 15: S13-S22. 10.1111/j.1468-2397.2006.00440.x
- Ecob R, Davey Smith G: Income and health: what is the nature of the relationship? Social Science & Medicine 1999, 48: 693-705. 10.1016/S0277-9536(98)00385-2
- Ettner SL: New evidence on the relationship between income and health. Journal of Health Economics 1996, 15: 67-85. 10.1016/0167-6296(95)00032-1
- Economist Debates: GDP: Statements. [http://www.economist.com/debate/days/view/501#con_statement_anchor]
- Robalino DA, Picazo O, Voetberg A: Does Fiscal Decentralization Improve Health Outcomes? Evidence from a Cross-Country Analysis. SSRN eLibrary 2001.
- Murray CJ, Lopez AD: Alternative projections of mortality and disability by cause 1990-2020: Global Burden of Disease Study. The Lancet 1997, 349: 1498-1504. 10.1016/S0140-6736(96)07492-2
- Berger MC, Messer J: Public financing of health expenditures, insurance, and health outcomes. Applied Economics 2002, 34: 2105. 10.1080/00036840210135665
- Macinko J, Starfield B, Shi L: The Contribution of Primary Care Systems to Health Outcomes within Organization for Economic Cooperation and Development (OECD) Countries, 1970–1998. Health Services Research 2003, 38: 831-865. 10.1111/1475-6773.00149
- Bloom DE, Canning D: The Health and Wealth of Nations. Science 2000, 287: 1207-1209. 10.1126/science.287.5456.1207
- Bloom DE, Canning D, Sevilla J: The Effect of Health on Economic Growth: Theory and Evidence. National Bureau of Economic Research Working Paper Series 2001. No. 8587
- Bloom DE, Canning D, Sevilla J: The Effect of Health on Economic Growth: A Production Function Approach. World Development 2004, 32: 1-13. 10.1016/j.worlddev.2003.07.002
- Spence M, Lewis MA: Health and Growth. World Bank Publications; 2009.
- World Health Organization: Macroeconomics and health: investing in health for economic development: report of the Commission on Macroeconomics and Health. World Health Organization (WHO); 2002.
- Marmot M, Friel S, Bell R, Houweling TA, Taylor S: Closing the gap in a generation: health equity through action on the social determinants of health. The Lancet 2008, 372: 1661-1669. 10.1016/S0140-6736(08)61690-6
- World Development Indicators Data. [http://data.worldbank.org/data-catalog/world-development-indicators]
- United Nations Statistics Division - National Accounts. [http://unstats.un.org/unsd/snaama/dnllist.asp]
- IMF World Economic Outlook Database List. [http://www.imf.org/external/ns/cs.aspx?id=28]
- Penn World Table, index. [http://pwt.econ.upenn.edu/php_site/pwt_index.php]
- Home Maddison. [http://www.ggdc.net/MADDISON/oriindex.htm]
- Maddison A: Explanatory Note on Historical Statistics. 2010.
- McCulla SH, Smith S: Measuring the Economy: A Primer on GDP and the National Income and Product Accounts. 2007.
- Agénor P-R, McDermott CJ, Prasad ES: Macroeconomic Fluctuations in Developing Countries: Some Stylized Facts. The World Bank Economic Review 2000, 14: 251-285. 10.1093/wber/14.2.251
- Global Burden of Disease Project. [http://www.globalburden.org/]
- Basu D: Review: [untitled]. The Economic Journal 1995, 105: 1666-1668. 10.2307/2235133
- World Economic Outlook Database - Assumptions and Data Conventions. [http://www.imf.org/external/pubs/ft/weo/data/assump.htm]
- King G, Honaker J, Joseph A, Scheve K: List-wise deletion is evil: what to do about missing data in political science. Boston: In Annual Meeting of the American Political Science Association; 1998.
- Honaker J, King G: What to Do about Missing Values in Time-Series Cross-Section Data. American Journal of Political Science 2010, 54: 561-581. 10.1111/j.1540-5907.2010.00447.x
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.