The assessment of progress toward meeting the MDGs will be measured through national-level indicators that can mask substantial inequalities within nations [12, 18, 56]. The development of cartographic approaches to transforming georeferenced data on health and development metrics into valuable spatial datasets is opening opportunities for quantitative assessments of these inequalities, the targeting of interventions, and the measurement of progress toward the MDGs. However, the demographic spatial datasets needed to support such efforts remain reliant on coarse and outdated input data for accurately locating risk groups.
While high-resolution spatial data on population distributions in resource-poor areas are now becoming available (e.g., http://www.afripop.org, http://www.asiapop.org, http://www.census.gov/population/international/data/mapping/demobase.html), comprehensive and contemporary subnational information on the demographic attributes of these populations remains scattered across national statistics office reports and household surveys. Here, approaches to combining these disparate, publicly available datasets are presented, enabling the production of Africa-wide datasets depicting age and sex compositions at subnational scales. The datasets and analyses highlight the importance of accounting for subnational demographic variations in deriving health and development metrics. Both the large subnational variations in age and sex population structures that are evident (Figures 3) and the resulting impacts that these have on metric derivation (Figures 4) underline the need to obtain and utilize the most spatially refined data available.
The ranges in the proportion of the population under 5 years old seen when comparing the subnational with the national-level estimates (Figure 3) highlight the need for more spatially detailed demographic data to better capture these variations. Differences of +/-5% in the proportions are common, and the spatial configuration of those areas where proportions are substantially greater or less than the UN estimates, in relation to the spatial distribution of disease risks or access, as seen in Figures 4, can have major implications for the derivation of indicators. Whilst the distributions of predicted malaria risk or travel times are mapped as continuous variables at 1 km spatial resolution, if the population distribution data used to derive numbers at risk are based upon an assumption of age and sex structure homogeneity through national-level estimates, this can result in significant inaccuracies that consistently remain unacknowledged. Clear urban and rural differences (Additional file 1: Protocol S1) also highlight the need to account for such variations, and when indicators such as malaria risk or access to health facilities, which vary substantially across the urban-rural divide, are being estimated, the large effects of this are evident (Figures 4). For example, in Kenya some of the most rural areas have the highest malaria transmission, the longest travel times to health facilities, and the highest proportions of children under 5 and lowest proportions of women of childbearing age. Thus, accounting for all three of these factors subnationally, rather than assuming a homogeneous demographic structure, results in substantial differences in outcome metrics (Figures 4). As funding for health and poverty-related mapping and the number of new cartography projects (e.g. [57–60]) continue to grow, the need for accurate spatial population distribution data will also grow if denominator-reliant metrics are required.
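The effect of the homogeneity assumption on a denominator-reliant metric can be illustrated with a small worked example. The sketch below is not the procedure used to produce the datasets described here; it is a minimal illustration in Python, and every value in it (the three grid cells, their populations, the at-risk flags, and the under-5 proportions) is invented for the purpose. It compares the number of children under 5 at risk obtained by applying a single national proportion to every cell with the number obtained by applying the proportion of each cell's own subnational unit.

```python
# Illustrative sketch only: every number below is invented, not taken from the study.
# Each grid cell carries a total population, an at-risk flag (as might come from
# thresholding a risk surface), and the under-5 proportion of its subnational unit.
CELLS = [
    {"pop": 10_000, "at_risk": True, "u5_prop": 0.20},   # rural, high transmission
    {"pop": 10_000, "at_risk": True, "u5_prop": 0.20},   # rural, high transmission
    {"pop": 30_000, "at_risk": False, "u5_prop": 0.12},  # urban, low transmission
]


def national_u5_proportion(cells):
    """Population-weighted under-5 proportion for the whole country."""
    total_pop = sum(c["pop"] for c in cells)
    total_u5 = sum(c["pop"] * c["u5_prop"] for c in cells)
    return total_u5 / total_pop


def under5_at_risk(cells, use_subnational):
    """Children under 5 living in at-risk cells.

    With use_subnational=False, a single national proportion is applied to
    every cell (the homogeneity assumption); otherwise each cell uses the
    proportion of its own subnational unit.
    """
    national = national_u5_proportion(cells)
    return sum(
        c["pop"] * (c["u5_prop"] if use_subnational else national)
        for c in cells
        if c["at_risk"]
    )


homogeneous = under5_at_risk(CELLS, use_subnational=False)
subnational = under5_at_risk(CELLS, use_subnational=True)
shortfall = 1 - homogeneous / subnational

print(f"national proportion: {national_u5_proportion(CELLS):.3f}")
print(f"under 5 at risk, homogeneous: {homogeneous:.0f}")
print(f"under 5 at risk, subnational: {subnational:.0f}")
print(f"relative underestimate: {shortfall:.0%}")
```

In this invented case the national proportion is about 0.152, so the homogeneous assumption yields roughly 3,040 children under 5 at risk against 4,000 when subnational proportions are used, an underestimate of about 24%. The direction and size of the discrepancy depend entirely on how risk and age structure co-vary in space, which is why the example places the youngest populations in the at-risk cells.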
While accounting for subnational heterogeneity in population attributes likely results in significant improvements in the accuracy of health metrics, it is clear that many sources of uncertainty and error remain. All of the census and survey-based data used here are subject to various sources of error and bias, many of which have been well documented. Indigenous groups, informal settlements, places experiencing civil unrest, and refugees are often entirely unsampled, whether because of political biases, missing sampling frames, or prohibitive difficulties in carrying out a survey. Uncertainties also arise over comparisons being made between primarily census-based national estimates of age/sex proportions from the UN Population Prospects and the household survey-derived subnational age/sex proportions used here for some countries. Differences in the way these proportions were measured contribute to uncertainties in comparisons between outcome health metrics, though strong correlations between the household survey-derived age structures and those derived from census data suggest that such differences may be small (Additional file 1: Protocol S1). Further, the underlying AfriPop population datasets contain uncertainties, while for some countries the input data used here remain outdated and coarse (Figure 1, Additional file 1: Protocol S1). Like most other population parameters reported for administrative polygons, the age and sex proportions are also subject to the modifiable areal unit problem. Discretising (by gridding) a phenomenon that is continuous (or in this case, varying at a far higher resolution) is an arbitrary process. In the case of the datasets presented here, whilst the precision with which heterogeneities in vulnerable population distributions are mapped is improved over simple national adjustments, we are still faced with a dataset containing one set of values for Libya and thousands for Tanzania. There is therefore a need to more rigorously quantify the uncertainties inherent in spatial demographic datasets. The advancement of theory, increasing availability of computation, and growing recognition of the importance of robust handling of uncertainty have all contributed to the emergence in recent years of new, sophisticated Bayesian approaches to the large-scale modeling and mapping of disease [4, 7, 25], but such methods have yet to cross over to the spatial demographic databases with which such maps are used. The regular availability of new national household surveys means that more contemporary data are continually becoming available to aid in updating and improving the accuracy of the datasets presented here, potentially through automated systems that can rapidly adapt to new incoming data and integrate them into the output spatial datasets, alongside robust methods to account for temporal differences.
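The modifiable areal unit problem noted above can be made concrete with a toy case. The following Python sketch is purely illustrative: the four cells, their populations and under-5 counts, and the two alternative zonings are all invented and bear no relation to the administrative units used in this study. It aggregates the same underlying cells into two different sets of reporting units and computes the under-5 proportion of each unit.

```python
# Illustrative sketch only: cell values and zonings are invented.
# Four cells along a transect, each as (total population, children under 5).
TRANSECT = [(1000, 250), (1000, 150), (1000, 150), (1000, 250)]


def zone_proportions(cells, zoning):
    """Under-5 proportion for each zone; a zone is a list of cell indices."""
    proportions = []
    for zone in zoning:
        pop = sum(cells[i][0] for i in zone)
        u5 = sum(cells[i][1] for i in zone)
        proportions.append(u5 / pop)
    return proportions


# Two alternative ways of drawing reporting units over the same cells.
ZONING_A = [[0, 1], [2, 3]]    # two units of two cells each
ZONING_B = [[0], [1, 2], [3]]  # three units with shifted boundaries

props_a = zone_proportions(TRANSECT, ZONING_A)
props_b = zone_proportions(TRANSECT, ZONING_B)

print("zoning A:", props_a)
print("zoning B:", props_b)
```

Under the first zoning both units report an under-5 proportion of 0.20 and the population appears homogeneous; under the second, the same cells yield proportions of 0.25, 0.15, and 0.25. The underlying population is identical in both cases, so the apparent heterogeneity is a product of where the boundaries fall.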
The international focus on health-related goals, coupled with a growing trend in research and funding for cartographic approaches to deriving metrics, is increasing the need for spatial demographic data of similar scope for use in estimating denominator sizes and characteristics of populations at risk. The importance of accounting for subnational demographic variations in deriving health metrics is clear, and the difference between metrics derived by ignoring subnational variations in age and sex structures and those derived by accounting for them is large enough to make the difference between success and failure in meeting an MDG. Here we have shown that sufficient data exist to produce a continent-wide subnational picture of demographic attributes and to map the distributions of key risk groups. Gridded age-structured datasets for 2000, 2005, 2010, and 2015 are freely available to download from the AfriPop project website (http://www.afripop.org) and will be regularly updated as new data become available. Similar datasets for Asia and Latin America will soon be made available through the AsiaPop (http://www.asiapop.org) and AmeriPop (http://www.ameripop.org) projects.