Application of space-time disease clustering by administrative databases in Italy: Adverse Reproductive Outcomes (AROs) and residential exposure

Background The aim of this study was to assess the existence of clusters of AROs in the municipalities of the Marches Region (Central Italy) after complaints from residents living near an abandoned landfill site. Methods Cases of AROs (i.e., congenital malformations, chromosomal abnormalities, and low birth weight) were retrieved from hospital discharge data. SaTScan and GeoDa were used to check for the presence of clusters at a regional and a small area level. Moreover, at a small area/neighborhood level, smoothed rates were calculated, and a case–control approach was used to assess residence in proximity to the abandoned landfill as an independent risk factor for AROs. Results AROs were associated with the price per square meter of the accommodations in the area of residence (OR 2.53, 95 % CI 2.06-3.10). On the other hand, residence within one kilometer of the landfill (OR 0.04, 95 % CI 0.01-0.23) and maternal age greater than 35 years (OR 0.96, 95 % CI 0.92-0.99) were protective. Conclusions Residence in proximity to the abandoned landfill was not a risk factor for the occurrence of AROs. The results show that basic information, such as the price of accommodations in different neighborhoods, could be of interest in order to target training programs for women living in difficult conditions, and they highlight the potential role of the built environment in perinatal health. However, we note that, despite the data provided by Geographic Information Systems in public health, collection of the patient’s residential address was unreliable for selected conditions. Future efforts should emphasize the patient’s residential address as information important for evaluating the health of individuals rather than as merely administrative data.


Introduction
In autumn 2006, the population living in Fano (a town located on the Adriatic coast of the Marches Region in Italy) began to complain about a presumed high frequency of congenital malformations and other adverse birth outcomes in an area surrounding an abandoned landfill site. As a consequence of these complaints, a cluster investigation was carried out. The study combined a regional and a small area approach. For the regional approach, the aims were to estimate the Adverse Reproductive Outcomes (ARO) incidence rate at a municipality level and to evaluate the existence of clustering of AROs in the municipalities of the region. At a small area level, in the town of Fano, the incidence of AROs was evaluated by neighborhood, a cluster analysis was performed, and the proximity of the residence to the landfill site was assessed as an independent risk factor for ARO occurrence.
The analysis of epidemiologic data at a small area level has been increasingly used to measure the need for targeted interventions and to evaluate the impact of local health policies [1]. Small-area studies investigate the role of the neighborhood level in population health, and most of them have found detrimental health impacts on residents of deprived neighborhoods [2,3]. The specific value of small-area analysis is that it permits the examination of data for populations that tend to be more homogeneous in character and environmental circumstances than larger and more widely spread populations [3]. Besides their capability to integrate information from multiple levels of interest (social, personal, etc.), these methods are also useful for assessing human exposure to environmental pollution, especially in large studies, where an approach based on individual exposure is often demanding for participants and requires extensive resources that are often unavailable. The ability to analyze a spatial gradient is therefore crucial in environmental epidemiology. Multiple approaches have been used to estimate the exposure of individuals using the area of residence as a proxy [4,5]. The smallest territorial unit that can be used in small area studies depends on data availability, which may vary between countries. For example, in Italy small area studies can be carried out at the census tract (average of 200 residents) and municipality levels [6].
Administrative databases represent a useful source of information to study the epidemiology of diseases and to analyze trends of various health conditions [7]. They offer important benefits from a practical point of view, and in recent years they have become a basic source of data for disease surveillance, evaluation of health resource use, and assessment of healthcare outcomes [8,9]. Recently, many studies have explored spatio-temporal patterns of disease incidence in order to identify areas of significantly elevated or decreased risk and to suggest potential causes [10,11]. An exploration of the spatiotemporal clustering of the incidence of a given disease can provide useful information for policymakers and health planners in their studies of possible ecological or group-level risk factors and in focusing the administration of specific public health initiatives [11].
Among the major concerns of public health importance is the possible impact of environmental pollutants on the developing fetus. The linkage between built environment and socioeconomic conditions has been increasingly identified in recent years [12], and the effects of the built environment on perinatal health at the neighborhood level may be mediated by different mechanisms [13][14][15][16]. In this context, it has been reported that the linkage between characteristics of the built environment of mothers and their socioeconomic position may partially confound some of the long-observed association between exposure to environmental factors and adverse health outcomes [17][18][19].

Study area
The Marche region is located on the Adriatic Sea, in Central Italy, and the town of Fano is located in the northern part of it (Fig. 1). The region had a total population of 1,450,000 inhabitants at the time of the events, while the town of Fano had about 40,000 inhabitants. Soil samples from the abandoned landfill site were analyzed by the local Environmental Health Authority, which did not find any particular contamination by known toxic substances. The landfill site had been in use until 2003 and then abandoned.

Case definition
Since the Marches Region did not have a congenital malformations (CM) registry at the time of the events [20], the integration of different existing healthcare information systems was used to evaluate the phenomenon. An Adverse Reproductive Outcome (ARO)-associated hospitalization was defined as hospitalization occurring at birth, or during the first year of life, for which one of the ICD-9-CM codes for CM or low birth weight (LBW) was listed in any of the discharge diagnosis fields. AROs were analyzed as a whole and according to the following classes: malformations of the central nervous system, cardiovascular malformations, orofacial malformations, ear malformations, gastrointestinal malformations, genitourinary malformations, and musculoskeletal malformations. Infants with more than one malformation were counted in each relevant category; individual cases were traced through the analysis of hospital discharge records (HDR), and repeated hospitalizations were excluded from the analysis. The one-year term of observation from birth was used to enhance the sensitivity of case identification. The analysis of HDR was also used to detect cases of voluntary termination of pregnancy (ICD-9 635) with a secondary diagnosis of malformation of the CNS (ICD-9 655.1), as they were likely to be due to cases of CM identified through prenatal diagnosis. Similarly, cases related to the prenatal diagnosis of chromosomal abnormalities (CA) were identified and included in the ARO definition.

Regional-level analysis
The first step of this analysis was done by carrying out a retrospective cohort study in order to assess the risk of AROs in the resident population of the town compared with that of the resident population in the same region.
Rates were calculated using denominators derived from the Italian National Institute of Statistics (http://demo.istat.it/) and were expressed as the estimated number of cases per 1,000 infants. According to the EUROCAT methodology [21] for calculation of rates, a newborn with several anomalies is counted once within each class of anomaly. Therefore, the number of cases in different classes cannot be added to reach a total number of single cases. A baby is counted once in the calculation of ARO rates, even if affected by multiple AROs. Confidence intervals were calculated using the Poisson approximation. A multilevel mixed-effects linear regression modeling approach was used to evaluate the variables related to AROs. The model included local health authority, the period of study, mean age of mothers at delivery, and deprivation index by municipality.
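The rate calculation described above can be sketched as follows. The counts (213 AROs in 3,409 infants) are the Fano figures reported in this article. The text does not say which Poisson approximation was used, so Byar's approximation is an assumption here, and the resulting interval need not match the published one to the second decimal. The function names are illustrative only.

```python
import math


def rate_per_1000(cases, population):
    """Crude incidence rate expressed per 1,000 infants."""
    return 1000.0 * cases / population


def byar_poisson_ci(cases, population, z=1.96):
    """Approximate CI for a Poisson rate per 1,000 (Byar's approximation).

    Assumption: the article only says "Poisson approximation"; Byar's
    formula is one common choice, not necessarily the authors' choice.
    """
    if cases <= 0:
        raise ValueError("Byar's approximation needs at least one case")
    lower_count = cases * (1 - 1 / (9 * cases) - z / (3 * math.sqrt(cases))) ** 3
    c1 = cases + 1
    upper_count = c1 * (1 - 1 / (9 * c1) + z / (3 * math.sqrt(c1))) ** 3
    return 1000.0 * lower_count / population, 1000.0 * upper_count / population


# Counts reported for the town of Fano in this article.
rate = rate_per_1000(213, 3409)
low, high = byar_poisson_ci(213, 3409)
print(f"{rate:.2f} per 1,000 (95% CI {low:.2f}-{high:.2f})")
```

Because every infant is counted once in the overall ARO rate, the numerator here is the number of affected infants, not the number of diagnoses.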
The model design followed the general format of generalized linear mixed-effects models. In particular, the ARO incidence rate at the regional level (y_ij) was assumed to be associated with factors at two levels: the municipality level (i) and the local health authority level (j). Three fixed coefficients were introduced at the municipality level (the mean age of mothers, the year of events, and the deprivation index), while the local health authority of residence was considered as the second-level factor.
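The model equation itself is not available in this text. One plausible form consistent with the description, assuming a random intercept for the local health authority and normally distributed errors (both assumptions, as are the symbol names), would be:

```latex
y_{ij} = \beta_0
       + \beta_1\,\mathrm{age}_{ij}
       + \beta_2\,\mathrm{year}_{ij}
       + \beta_3\,\mathrm{deprivation}_{ij}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim N(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
```

Here y_ij is the ARO incidence rate of municipality i within local health authority j, the three beta coefficients are the municipality-level fixed effects named in the text, and u_j is the assumed authority-level random effect.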
Among the possible confounding variables available through the current health information system, mean maternal age at delivery at a municipality level and an index of local deprivation were selected for evaluation. The deprivation index included in the regional analysis at a municipality level was calculated according to the index validated by Cadum and included the following variables: percentage of single-parent families, unemployment rate, percentage of people with a primary education, the percentage of homes without a bathroom in the house, and the percentage of households that rent [22].
The SaTScan™ software was used for analysis of clusters of AROs in the different municipalities of the region [22,23]. The SaTScan scan statistic identifies the most likely (unusual) cluster [24][25][26][27]. A Poisson model was used during the analysis. Clustering was assessed using purely spatial, temporal, and spatial-temporal scenarios, separately. The maximum cluster size was set at different levels (i.e., 20 % and 50 % of the total population at risk), and for temporal analysis a one-year interval was chosen. For the analysis of clusters, the distribution of cases by municipality was assessed; the center of each of the 246 towns belonging to the Marches Region was calculated, and the coordinates were used as reference positions for each geographic entity.
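To make the purely spatial Poisson scan concrete, the following is a minimal sketch of the underlying idea rather than SaTScan's implementation. It grows circular windows around each centroid up to a maximum share of the population and keeps the window with the largest log-likelihood ratio. The six areas, their coordinates, and their counts are hypothetical, and the Monte Carlo step that SaTScan uses to obtain a p-value is omitted.

```python
import math


def poisson_llr(c, e, total):
    """Log-likelihood ratio of the Poisson scan statistic for one window.

    c: observed cases inside, e: expected cases inside, total: all cases.
    """
    if c <= e:
        return 0.0  # only windows with an excess of cases are of interest
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def most_likely_cluster(areas, max_pop_fraction=0.5):
    """areas: list of (name, x, y, cases, population) tuples.

    Returns (llr, member_names) for the window with the highest LLR.
    """
    total_cases = sum(a[3] for a in areas)
    total_pop = sum(a[4] for a in areas)
    best = (0.0, [])
    for centre in areas:
        by_distance = sorted(
            areas, key=lambda a: math.hypot(a[1] - centre[1], a[2] - centre[2])
        )
        cases = pop = 0
        members = []
        for a in by_distance:
            if pop + a[4] > max_pop_fraction * total_pop:
                break  # window would exceed the maximum cluster size
            cases += a[3]
            pop += a[4]
            members.append(a[0])
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best[0]:
                best = (llr, list(members))
    return best


# Hypothetical municipalities: (name, x_km, y_km, cases, infants at risk).
toy_areas = [
    ("A", 0, 0, 30, 1000), ("B", 1, 0, 28, 1000), ("C", 5, 5, 10, 1000),
    ("D", 6, 5, 12, 1000), ("E", 10, 0, 9, 1000), ("F", 10, 1, 11, 1000),
]
best_llr, best_members = most_likely_cluster(toy_areas)
print(best_members, round(best_llr, 2))
```

Changing `max_pop_fraction` from 0.5 to 0.2 mirrors the two maximum cluster sizes tried in the study.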
Although the spatial scan statistic used in SaTScan is widely accepted, there are acknowledged sensitivities of results depending on input parameters. For instance, Tango [28] pointed out that when using SaTScan, often the most likely cluster is very large and "swallows" neighboring regions that have non-elevated risk. In conjunction with SaTScan analysis, cluster detection was carried out with the help of the GeoDa software, calculation of smoothed rates, and computation of Local Moran's I [29]. Moreover, local indicators of spatial association (LISA) maps generated by a Local Moran statistical test were used to visualize clusters. Significance of clusters was assessed using Monte Carlo simulations with 999 permutations.
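The Local Moran's I computation with permutation-based significance can be illustrated with a small standard-library sketch. It assumes row-standardised binary contiguity weights and a two-sided pseudo p-value from conditional permutations; GeoDa's exact conventions may differ. The six rates and the neighbour lists are hypothetical.

```python
import random


def local_morans_i(values, neighbours):
    """Local Moran's I with row-standardised binary weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    stats = []
    for i in range(n):
        lag = sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        stats.append(z[i] * lag / m2)
    return stats


def pseudo_p_values(values, neighbours, n_perm=999, seed=0):
    """Conditional permutation test: area i is held fixed while the
    remaining values are randomly reassigned to its neighbours."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    observed = local_morans_i(values, neighbours)
    p_values = []
    for i in range(n):
        others = z[:i] + z[i + 1:]
        k = len(neighbours[i])
        extreme = 0
        for _ in range(n_perm):
            sample = rng.sample(others, k)
            stat = z[i] * (sum(sample) / k) / m2
            if abs(stat) >= abs(observed[i]):
                extreme += 1
        p_values.append((extreme + 1) / (n_perm + 1))
    return observed, p_values


# Hypothetical rates for six areas arranged along a line.
toy_rates = [9.0, 8.5, 9.2, 2.0, 1.5, 1.8]
toy_neighbours = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
local_i, p_vals = pseudo_p_values(toy_rates, toy_neighbours)
for i, (stat, p) in enumerate(zip(local_i, p_vals)):
    print(i, round(stat, 3), p)
```

Positive values of the statistic mark areas whose rate resembles that of their neighbours (high-high or low-low), which is what a LISA map colours as clusters.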

Small area-level analysis, Fano municipality
Through the linkage of data provided by the georeferencing service active in the municipality of interest, the census office of the same municipality, and the data-processing center of the local health authority, it was possible to georeference individual cases of AROs occurring in Fano. The cases and controls for which it was not possible to associate a single address at the time of pregnancy and delivery were excluded from subsequent analysis. Moreover, the municipality's georeferencing service provided the exact coordinates of the former landfill site. In particular, HDR and personal data were linked by the local health authority's data-processing center to obtain the full names of newborns. Personnel from the data-processing center did not know the study's objective or the health status of newborns. The names, ages, and addresses of mothers at the time of delivery were ascertained through the linkage of data provided by Fano's census office. Cases of AROs were then grouped at a neighborhood level, and rates underwent Bayesian smoothing with the GeoDa software. LISA maps generated by a Local Moran statistical test were used to visualize clusters; significance of clusters was assessed using Monte Carlo simulations with 999 permutations. Cluster analysis was performed with SaTScan both on data aggregated by electoral arrangement and by means of a case-control approach, using newborns' addresses at birth. The maximum cluster size was set at different levels (20 % and 50 % of the total population at risk), and for temporal analysis a one-year interval was chosen.
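The effect of smoothing neighborhood rates can be sketched as follows. The text only says that rates underwent Bayesian smoothing in GeoDa, so the choice of a global empirical Bayes estimator with method-of-moments prior (Marshall's approach) is an assumption, and the four neighborhoods are hypothetical.

```python
def empirical_bayes_smooth(cases, births):
    """Global empirical Bayes rate smoothing (method-of-moments prior).

    Assumption: this is one common estimator, not necessarily the variant
    the authors selected in GeoDa.
    """
    total_cases = sum(cases)
    total_births = sum(births)
    prior_mean = total_cases / total_births
    raw = [c / b for c, b in zip(cases, births)]
    n = len(cases)
    between = sum(b * (r - prior_mean) ** 2 for r, b in zip(raw, births)) / total_births
    # The moment estimate of the prior variance can be negative; clip at zero.
    prior_var = max(between - prior_mean / (total_births / n), 0.0)
    smoothed = []
    for r, b in zip(raw, births):
        denom = prior_var + prior_mean / b
        weight = prior_var / denom if denom > 0 else 0.0
        smoothed.append(weight * r + (1 - weight) * prior_mean)
    return raw, smoothed


# Hypothetical neighborhoods: ARO cases and births in each.
toy_cases = [4, 10, 60, 2]
toy_births = [20, 200, 400, 100]
raw, smoothed = empirical_bayes_smooth(toy_cases, toy_births)
for r, s, b in zip(raw, smoothed, toy_births):
    print(b, round(1000 * r, 1), round(1000 * s, 1))
```

Rates from neighborhoods with few births are pulled strongly toward the overall mean, while rates from large neighborhoods barely move, which is why smoothed maps are less dominated by small-number noise.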
Finally, a case-control study was carried out to assess the risk of AROs in infants born in the index town during the period 2001 to 2006. At least two controls were initially selected for each case from hospital discharge data, with a random sample stratified by year of birth. After record linkage with municipality records, exact address at the time of birth was available in 97.5 % of cases and 62.4 % of potential controls previously identified on the basis of hospital discharge data. The final number of controls was 331, with a ratio of 1.6 controls for each case.
Cases and controls were stratified by risk variables such as proximity to the former landfill site, maternal age at birth greater than 35 years, presence of comorbidities during pregnancy (i.e., diabetes, mood disorders requiring hospitalization, alcohol abuse, and use of toxic substances requiring hospitalization), and the value of the house of residence. The latter was classified into five levels according to the assessed price per square meter of the accommodations in the area of residence from the Italian Revenue Agency [30]. The incidence of AROs was calculated for each electoral arrangement of the town, together with the incidence of CMs by organ system, of CMs overall, and of LBW.
Multilevel logistic regression models were developed to adjust for confounding and to evaluate which factors were independently associated with an ARO. Variables with a p-value lower than 0.20 in the bivariate analysis were candidates for entry into the model, using the stepwise method. Independent variables were coded as follows: residence in the vicinity of the former landfill site (more than three kilometers (km) = 1, less than one km = 2, between one and two km = 3, between two and three km = 4), maternal age of 35 years or more at the time of delivery (yes = 1, no = 0), and class of price per square meter of the accommodations in the area of residence (categorized into quintiles: 1 = higher, up to 5 = lower). The significance level was set at p < 0.05.

Results
In Fano, the incidence rate of newborns with AROs was lower than that registered in the other municipalities of the region (with 213 AROs detected in 3,409 infants, an incidence rate of 62.48 per 1,000 infants, 95 % CI 54.59-71.13 per 1,000). In particular, while the incidence of CM and CA was similar to that registered in the whole region, LBW was less common in the index city. The detailing of cases by type of ARO is reported in Table 1. The distribution of Bayesian-smoothed ARO rates in the Marches Region is represented in Fig. 2. The SaTScan procedure, whether using 20 % or 50 % of the total population at risk as the maximum cluster size, identified a most likely cluster that included 19 municipalities (cluster radius 14.92 km, RR 1.68, p < 0.001). The space-time permutation found the same cluster in the 2003 to 2005 period, while the temporal approach found a non-significant cluster in the 2002 to 2004 period. Moreover, four secondary clusters were identified (Fig. 3). The Local Moran's I, represented by the LISA map, highlighted different municipalities showing high rates (Fig. 3). However, none of these clusters involved the town of Fano.

Regional-level analysis
The multilevel mixed-effect linear regression model showed that ARO rates were related to deprivation index in the area of residence and were more frequent in the second period of observation; mean mothers' age at delivery was not independently related to the occurrence of AROs (Table 2).

Small area study
Rates calculated at the census tract level are represented in Fig. 4 (raw) and Fig. 5 (after Bayesian smoothing with the GeoDa software). Moreover, Fig. 4 details the georeferenced cases and controls, as well as the position of the abandoned landfill site. Cluster analysis, performed by SaTScan both on data aggregated by electoral arrangement and by using the coordinates of residence available for cases and controls, allowed the identification of a single cluster of AROs at a distance of about six kilometers from the landfill site in the 2001 to 2003 period. The cluster was also detected by the use of Local Moran's I statistic, and visualized by the Moran's I and LISA (Fig. 6).
The final sample for the small area study included 207 cases and 331 controls. The study of variables associated with congenital malformations has highlighted the role of price per square meter of the accommodation of newborns (OR 2.53, 95 % CI 2.06-3.10) (Table 3). Conversely, increase in maternal age was protective (OR 0.96, 95 % CI 0.92-0.99).
Residence within one kilometer of the landfill reduced the odds of AROs by 96 % (OR 0.04, 95 % CI 0.01-0.23); although the gradient was not statistically significant, the risk appeared to rise with increasing distance from the former landfill site, increasing beyond three kilometers.
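The arithmetic behind these figures can be sketched as below. The odds ratios in this study come from an adjusted multilevel logistic model, so the crude 2x2 calculation with a Woolf (log-based) confidence interval shown here is only a simplified illustration, and the table counts are hypothetical. The conversion of OR 0.04 into a 96 % reduction in odds uses the value reported above.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def percent_change_in_odds(odds_ratio):
    """Relative change in odds implied by an odds ratio, in percent."""
    return (odds_ratio - 1) * 100


# Hypothetical 2x2 table, not taken from the study.
crude_or, crude_low, crude_high = odds_ratio_ci(5, 50, 200, 280)
print(round(crude_or, 2), round(crude_low, 2), round(crude_high, 2))

# OR reported in this article for residence within one kilometer.
change = percent_change_in_odds(0.04)
print(round(change))  # about -96, i.e. a 96 % reduction in odds
```

An odds ratio approximates a risk ratio only when the outcome is rare, which is why the reduction is expressed in terms of odds.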

Discussion
The results reveal a lack of association between living near the former landfill site and any ARO. The finding of a reduced risk of ARO in proximity to the landfill site may seem in contrast with the general hypothesis linking environmental exposure to fetal adverse events. However, we must underline the absence of any known toxic substance from the landfill in this case. Moreover, other evidence from the literature has similarly found mixed [31] or negative [32] results.
Both the analysis performed at a regional level and that performed at a small area level highlight that the only independent factor significantly associated with an adverse outcome was a low price per square meter of the accommodations in the area of residence.
If we consider the low price per square meter of houses as a good proxy of the socioeconomic position of [33]. Timmermans suggests that among women who live in situations of distress, several variables that adversely affect the products of conception (such as smoking, abuse of alcohol and drugs during pregnancy, poor eating habits, lack of use of multivitamin supplements, and lack of interest in health and hygiene standards in general) may be simultaneously present [17]. On the other hand, older mothers seem to be protected from AROs, probably because of the availability of free prenatal screening. Moreover, the above studies have been confirmed by a recent meta-analysis concluding the detrimental role deprived areas have on perinatal outcomes [18]. Our data, although based on a limited number of observations, suggest that the people who live in conditions of socioeconomic disadvantage may have a higher risk of giving birth to infants with congenital malformations. These results confirm the association between poor neighborhoods and adverse pregnancy outcomes, and, in particular, that a meaningful relationship between the quality of the residential built environment and birth outcomes may be of interest as a good measure of general community health [2,6,12,34,35]. A wide range of maternal, socioeconomic, and environmental factors may mediate the impact of socioeconomic status on the prevalence of congenital anomalies, including personal factors (i.e., nutritional factors, parity and maternal age, maternal distress, and ethnic origin), as well as lifestyle factors (e.g., smoking habits), environmental and occupational exposures, and access to and use of health care services [13,[36][37][38][39][40]. In this context, information bias has been highlighted as one of the main findings of the paper; given a multifactorial condition, such as the AROs, the availability of a pathology registry is of crucial importance.
We may discuss our results in light of a general lack of clusters; however, we were not able to inspect some well-known risk factors for congenital anomalies at an individual level. Spatial clustering maintains its importance in order to highlight possible foci; however, its integration with personal, socioeconomic, organizational, and political aspects is crucial, especially when dealing with complex events.
Moreover, when analyzing individual outcomes at an area level we should not forget the ecological fallacy, which arises when correlations observed at the group level are attributed to individuals. Ecological studies can provide useful exploratory information, but conclusions about individuals may be only weakly supported by data on groups. In light of those limitations, we think the most important result of the study was the lack of availability of reliable health care data, which affected the ability to correctly verify the existence of clusters and the possible association with the various risk factors using the current information systems. In fact, data collection was carried out through a complex linkage between hospital, administrative, and geospatial data, rather than through pathology registries, leading to a loss of information that was both quantitative (it was possible to retrieve more specific data at an individual level, such as address of residence at the time of the events and mother's name, for only 59 % of cases and 44 % of controls selected from the regional database) and qualitative (lack of personal risk factors). The use of hospital admission data has intrinsic and relevant limits linked to the coding of diagnoses, which may be relatively unreliable both for reasons of expediency on the part of providers and because of the inevitable errors of accuracy, precision, and reproducibility in encoding clinical data.
Moreover, we think that the above errors may be even more frequent when dealing with AROs, since we needed to link health data belonging to two different persons, the mother and the newborn, who usually have a different family name and sometimes a different municipality of residence. These difficulties are even more important when reviewing rare events over time, since the retrieval of information about the actual address at the time of events may be challenging. We should not forget the important role played by small numbers when dealing with rare events in small populations [41].
Although the study indicates some feasibility of cluster analysis at a municipality level with the current health information system, the lack of a congenital anomalies registry is a crucial obstacle to adequately assessing confounding factors. This highlights, once more, the need for careful design and analysis to better characterize environmental effects on human reproduction [42]. Registries are one of the most accurate instruments for case ascertainment and surveillance of health events, assuring reliability in estimating rates of AROs [43]. Nevertheless, limited resources in birth defect surveillance programs sometimes require the use of electronic health archives (already available and designed for administrative purposes) for surveillance by public health researchers [44,45]. In this context, hospital discharge data are increasingly used to estimate the occurrence of a wide range of diseases and, more recently, to estimate neonatal morbidity and birth defects at a national level in Italy [46].
These considerations point out the need for effective and up-to-date surveillance systems as active companions in monitoring healthcare phenomena. Despite the ongoing advancement and crucial importance of Geographic Information Systems in public health, at least in this region the routine collection of the patient's residential address appears to be outdated and unreliable. In cancer epidemiology as well, the challenges associated with successfully identifying community clusters of disease are a real current problem [47]. Future efforts should emphasize the importance of the residential address as information strictly associated with the health of individuals instead of being merely administrative data. The accuracy of the home address is all the more important considering that one of the strengths of this study was the use of the price per square meter of accommodations in the area of residence as a proxy for socioeconomic status. In fact, these data could be quite easy to collect on an ongoing basis and to link to the socioeconomic status of the family. Moreover, this basic information could be of interest in order to target training programs for women living in difficult conditions, at a local level, with the ultimate aim of bridging the knowledge gap and addressing the obstacles that prevent or discourage a change in lifestyle or access to healthcare services.

Competing interests
We have no conflicts of interest to declare. The local health authority provided funding for the investigation. The study was performed after complaints were raised by the resident population; its primary objective was therefore health protection rather than research, and approval by an ethics committee was not necessary.

Authors' contributions
PB made substantial contributions to conception and design, acquisition of data, analysis and interpretation of data, and drafting and revising the article; MA and FF made contributions to conception of the study, acquisition of data, and revising the article; SG contributed to drafting the article and revising it critically for important intellectual content; MMDE and FDS contributed to drafting the article and revising it critically for important intellectual content; EP made contributions to conception of the study, acquisition of data, drafting the article, and revising it critically for important intellectual content. All authors read and approved the final manuscript.