A Bayesian spatio-temporal analysis of mortality rates in Spain: application to the COVID-19 2020 outbreak

Background The number of deaths attributable to COVID-19 in Spain has been highly controversial since it is problematic to tell apart deaths having COVID as the main cause from those provoked by the aggravation by the viral infection of other underlying health problems. In addition, overburdening of health system led to an increase in mortality due to the scarcity of adequate medical care, at the same time confinement measures could have contributed to the decrease in mortality from certain causes. Our aim is to compare the number of deaths observed in 2020 with the projection for the same period obtained from a sequence of previous years. Thus, this computed mortality excess could be considered as the real impact of the COVID-19 on the mortality rates. Methods The population was split into four age groups, namely: (< 50; 50–64; 65–74; 75 and over). For each one, a projection of the death numbers for the year 2020, based on the interval 2008–2020, was estimated using a Bayesian spatio-temporal model. In each one, spatial, sex, and year effects were included. In addition, a specific effect of the year 2020 was added ("outbreak"). Finally, the excess deaths in year 2020 were estimated as the count of observed deaths minus those projected. Results The projected death number for 2020 was 426,970 people, the actual count being 499,104; thus, the total excess of deaths was 72,134. However, this increase was very unequally distributed over the Spanish regions. Conclusion Bayesian spatio-temporal models have proved to be a useful tool for estimating the impact of COVID-19 on mortality in Spain in 2020, making it possible to assess how the disease has affected different age groups accounting for effects of sex, spatial variation between regions and time trend over the last few years.


Background
Data on mortality attributed to coronavirus disease 2019  in Spain-one of the most affected western European countries-have been very controversial. Disagreements between data-providing sources arise from the difficulty of telling apart patients who died "from COVID-19" as the main cause from those suffering other underlying health problems that together with the virus caused death, i.e., "having contracted COVID- 19." In addition to that, the highly decentralized Spanish Health System has been a serious handicap for a sound data collection, thus leading Spanish health authorities to reconsider the published data: even today, criteria adopted for this count are being discussed. Uncertainties in the approach to the correct diagnosis about excess mortality produced by COVID- 19 have also been suggested in other places such as England and Wales [1].
The real impact of the epidemic goes far beyond determining how many people died from COVID-19 or suffered COVID-19. In fact, as indicated in [2], there are people who have died indirectly from COVID (for example, because they had not received adequate medical attention for other problems) and should be considered as side effects of the epidemic. There are also people who actually died from COVID-19, especially in the case of the elderly or seriously ill, but perhaps the disease only accelerated the moment of death, which would have happened anyway. In addition, the lockdown measures that were taken in the country during the first wave of the pandemic also had the effect of reducing mortality from other causes: there were fewer deaths in road accidents; it is even possible that isolation facilitated the reduction of infections due to influenza viruses, also reducing mortality from this cause. Therefore, any assessment of the impact of COVID on mortality in Spain during 2020 will include a balance between all these effects since it is not possible to estimate them separately.
An attractive way to assess the impact of the epidemic on mortality is to compare the deaths observed in 2020 with those projected from a sequence of previous years. This has been understood by the authors of [3,4], who have evaluated the total excess of deaths during periods of the epidemic in Italy, by geographical areas, sex, and age, with respect to what was expected from 2015 to 2019 [3], or from previous months [4].
These estimates have sometimes focused on identifying or predicting the trend in mortality rates during a given period of the pandemic ( [5] during lockdown in Spain [6]; during a fortnight in India [7]; for a quarter in Italy, Spain, and France [8]; for a 4-month period in Iran). Other authors have focused on the investigation of the relationship of spatial patterns with say ecological factors in the USA [9].
The application of spatio-temporal study methods has proven to be ideal to optimize the observation of epidemics behavior, using an information exchange based on the influence of location proximity and time closeness [10]. These methods have been deemed essential to describe the spread and to make decisions about the mitigation of the pandemic [11,12].
The purpose of this study is to compare the mortality rates observed in the year 2020 with the trend projected for this year based on the sequence of mortality rates in years 2008-2020 according to age group, sex, and autonomous communities (administratively, Spain is divided into seventeen autonomous communities). Obviously, mortality rates over time cannot be assumed to be generated by a stationary process, which would imply that rates obtained from averages of death counts in past years (e.g., 2015-2019, using a 5-year period as is usually done in mortality studies) do not estimate any characteristic parameters of the process. Therefore, this assumption does not offer an optimal way to make a projection for the year 2020. Since mortality patterns showed differences between age groups, a spatio-temporal model for the death counts was estimated for each group. In each one the effects of sex, autonomous community and year (sequence 2008-2020) were included. In addition, a specific effect for year 2020 ("outbreak") was added. Models were estimated in the Bayesian framework via the integrated nested Laplace approximation (INLA). Once the projected deaths for 2020 were estimated, the excess mortality was obtained as the difference between observed and projected deaths.

Data source
All data have been obtained from the Spanish official Institute of National Statistics (INE). Population data can be found in [13], mortality data by autonomous community, sex, and age until 2019 have been downloaded from [14], and mortality data for 2020 are located in [15].
For each one of the age cohorts (< 50, 50-64, 65-74, and 75 over), the dataset has the form: where Pop s, a, t and Death s, a, t denote the population size and the number of deaths during 2020, respectively in the cohorts determined by sex (s), autonomous community (a), and year (t). Values of (a) correspond to the INE codes of communities.
Observed mortality rates by sex, community, and time The mortality rate (deaths by 100,000 persons-year) in cohort {s, a, t} was estimated as R s;a;t ¼ 10 5 Â Death s;a;t Pop s;a;t Expected number of deaths assuming uniformity of the rates by autonomous communities, years, and sex Under the hypothesis of uniformity of the rates, the expected number of deaths is defined as where r = ∑ s, a, t Death s, a, t /∑ s, a, t Pop s, a, t (total proportion of deaths) Spatio-temporal model to project mortality from 2008-2019 to 2020 In each of the four age groups, we assume that the number of deaths in the {s, a, t} cohort is a random variable with negative binomial probability distribution with mean λ s, a, t where: Here, α is the intercept, β s, a denotes the interaction sex-autonomous community (for all a = 1, …, 17, β female,a = β 0 and β male,a = β a ), u a the spatially structured community effect modeling spatial adjacency, v a the community-specific (unstructured) effect, γ t the temporally structured effect and d a a random slope that modulates the trend in each autonomous community. In addition, f a · I t (2020) models the specific effect of COVID-19 on the excess mortality in 2020 (outbreak), being I t (2020) = 1 for t = 2020 and zero for t < 2020, and f a , the different effects of the epidemic for the autonomous community a.

Estimation
Model parameters were estimated using the Bayesian framework. Briefly, Bayesian methods assume a certain a priori knowledge of the parameters, which is formulated as a probability distribution (prior distribution). This distribution, together with the available data, leads to the socalled posterior distribution, whose characteristic values constitute the parameter estimates. So, the 2.5th and 97.5th percentiles of the posterior distributions yield a credibility interval (CI) for the corresponding parameter. Therefore, the estimation of each parameter of the model is presented via the mean, median, 2.5th and 97.5th percentiles of its posterior probability distribution.
For the four spatio-temporal models, one for each age cohort, we use the following prior distributions, namely: for α, we use a centered Gaussian distribution with variance 10 4 (α~N(0; 10 4 )), whereas for the sex-autonomous community interactions β s, a we assume priors centered, independent and normally distributed random variables IIDN(0; 10 4 ).
The probability distribution of the deaths count was based on the deviance information criterion [19]. It is a generalization of the Akaike information criterion (AIC), developed for the comparison of two models in the Bayesian framework. According to this criterion, a value of DIC is associated to each model. Then, given two models, the one with the lower DIC will be preferred.
From this model, the projection of deaths for 2020 in the cohort {s, a}, based on the sequence 2008-2019 is And the rate-ratio between the real rate and project for the (a)-community is All models were estimated using the procedure known as integrated nested Laplace approximation (INLA), through the R-interface [20,21]. Data were analyzed using the R language and environment, version 3.6.1 [22]. Figure 1 shows the annual evolution of the mortality rates observed from 2008 to 2020 for entire Spain and per Autonomous Community, according to age group and sex. Possible probability distributions for the death counts are the Poisson and negative binomial distributions. According to the deviance information criterion (see Table 1), the negative binomial distribution led to better fits than the Poisson distribution for the models corresponding to the age groups 50-64 years and 75 and over. For the other age groups, both distributions led to similar fits. Figure 2 displays the fitted versus observed death counts for each of the four models, when using the negative binomial distribution. As can be seen, all models present a very good fit. Table 2 shows the observed and projected counts of deaths according to age group. The total number of deaths in the 17 Spanish autonomous communities during 2020 was 499,104, while the number projected by the spatio-temporal models for the same period was 426,970, the difference being 72,134 deaths. However, this increase was very unequally distributed among the communities. Table 3 shows the fitted mortality rates for 2020 (projected and actual) and the mortality rates ratios (real/ projected) are also displayed in Fig. 3. In the group of age less than 50 years, there was no significant increase in mortality rates. In the other age groups, Madrid was the community that experienced the highest growth rates (24.8%, 42.8%, and 39.4%) in the groups of 50-64 years, 65-64 years, and 75 and over, respectively). Increases in mortality rates were also high in the two communities adjacent to Madrid (Castilla-La Mancha and Castilla y León) and decreased in all directions but to the northeast. Catalonia also showed growth rates greater than 20% in groups aged 50 and over. In the island communities (the Canary and the Balearic Islands), the mortality rates estimated for 2020 did not show significant increases in relation to those projected in any of the age groups. The relationship between the mortality rates between the sex groups also showed spatial differences. For example, in the Basque Country and in the group aged 75 and over, the mortality rate was 31% higher in men than in women, while in Murcia it was only 20.9% higher. Figure 4 displays the evolution of the mortality rates for Canary Islands and Madrid for the age group 75 and over. These autonomous communities exhibited the greatest differences in the evolution pattern of mortality rates. The increase in the mortality rate in 2020 was not statistically significant in the Canary Islands (RR a = 1.01; 95% CI = 0.89; 1.13), while Madrid had the highest in Spain (RR a = 1.39; 95% CI = 1.23; 1.56).

Discussion
This study aims to assess the spatial distribution of excess mortality in Spain in 2020 in relation to the projection of mortality for that year based on the 2008-2020 sequence. For this purpose, four spatio-temporal models for the evolution of mortality rates through the 2008-2020 period, including the effects of autonomous  community and sex-and the appropriate interactionswere estimated. The impact of COVID-19 in 2020 as the joint result of all effects converging on mortality-as cited in [2]-was evaluated by introducing a specific effect for 2020 (community-year interaction). The modeling of the temporal effect in two components must be highlighted, namely: a general component (γ t ) dynamically modeled by means of a random walk of order two plus a specific linear trend for each community (d a ). Through these slopes, a significant decreasing trend was detected in mortality rates for the Canary Islands (d Canary = − 0.0077; 95% CI = − 0.0113; − 0.0042) and an increasing trend for Castilla La Mancha (d La Mancha = 0.0078; 95% CI = 0.0045; 0.0112). It should be noted that a closer study of the model revealed that the effects of sex were not uniform for all the autonomous communities, and hence a sex-community interaction was introduced through the parameters β s, a . Therefore, it is reasonable to think that the model is an adequate tool for estimating the spatial effects of an epidemic during a given period.
Our spatio-temporal model is estimated in the Bayesian framework by using the INLA procedure. INLA does not require iterative computations to obtain an approximation to posterior distributions, thus making it computationally efficient. Alternatively, Markov chain Monte Carlo (MCMC) methods (based on simulations and so more time and computing resources consuming) could be used, but many studies show that both perform similarly in a wide range of situations [21,23].
The difference between the observed deaths and those projected for the year 2020 found by our models was 72,    [25]. The INE has published a press release reporting that during the five first months of 2020 (the first and deadliest wave of the epidemic) the death toll attributable to COVID-19 (directly or because this disease acted as comorbidity) was 45,684 (51.5% men, 48.9% women, 87.1% older than 70 years) [26], but the assessment for the second half of the year has yet to be made. All these organizations focus on estimating the direct mortality due to COVID-19, but do not intend in any way to evaluate the global impact considering either indirect deaths or the effects of lockdown period.  Fig. 3 Ratio rates (RR) between observed and projected mortality rates for 2020 in each autonomous community. Projected mortality rates were obtained from the mortality data corresponding to years 2008-2019. Autonomous communities have been numbered as in Table 3 Fig . 4 Evolution of mortality rates for the age group of 75 year or over in Canary Islands and Madrid: observed and fitted rates according sex. The temporally effect was modeled dynamically by means of a random walk of order 2. Bands corresponding to the 95% credibility intervals All evaluations carried out by the organizations above cited agree on the fact that the effect on mortality has been greater in men and in the older age group. It has also been observed in European countries that the effect of sex on the number of deceased was higher in men that in women (relative risk = 1.60; CI 1.53-1.68) [27]. In another study with data from the first quarter 2020 in Wuhan [28], the authors observed a significantly higher mortality in men compared to women (odd ratio = 3.4; 95% CI 1.2-9.1).
The excess mortality estimates provided by our model, which includes the balance between the different components of excess mortality during the full year 2020, are consistent, albeit different, with those calculated by INE, ISCIII, and MoMo. While INE and ISCIII only show the number of deaths directly attributable to COVID19 (the INE only during the period covering the first wave of the pandemic), MoMo calculates deaths from all causes, but only considering the estimated periods of excess mortality.
Always taking as reference the projected mortality rates for 2020, Madrid was the Autonomous Community that showed the highest increase in the mortality rate, followed by the neighboring community of Castilla-La Mancha. This spread seems to be related to mobility between them in the period prior to lockdown. Similarly, Castilla y León, also adjacent to Madrid, experienced one of the largest increases in mortality. Madrid, Castilla y León and Castilla La Mancha form the central plateau of the Iberian Peninsula, and a very high percentage of Madrid's population has their roots and strong familiar relationships in those adjacent communities. Madrid was also the European city where the highest excess mortality was recorded during the pandemic, comparing deaths from all causes in Europe in the first half of the year according to the UK Office for National Statistics [29]. In a study on the impact of COVID-19 in metropolitan counties in the USA, it was found that larger metropolitan areas (measured in terms of population) presented higher mortality rates [30]. However, in the present study the effect of territorial contiguity was more influential than the size of the population, since in large cities such as Valencia (Valencian Community) or Seville (Andalusia), the impact of mortality was lower than in cities and towns of Castilla la Mancha adjacent to Madrid.
Catalonia also showed an increase in mortality rates in all age groups, which could be partly attributed to the high occupation rates in the railway and air corridors between Madrid and Barcelona. Catalonia presented the highest daily percentage increase in mortality in Spain (up to 33.96% of daily deaths) in the first 50 days after lockdown [5]. However, in those communities away from Madrid (Galicia, Murcia, Andalusia, and the Balearic, and Canary Islands), mortality rates did not show statistically significant variations in any of the age groups. This could be explained by the fact that when the lockdown began, the number of cases that had reached these communities, especially the island communities, was not large enough to cause an alteration in mortality rates. Other authors reached this same conclusion, based on the low mortality rates observed in the two Spanish autonomous cities in North Africa during lockdown [5]. The lockdown itself could act as a protective factor of mortality for other infectious diseases and thus compensate for the deaths caused by COVID-19. Something similar happened in Italy, where the impact of COVID-19 was much lower in the islands of Sardinia and Sicily [3]. Also, the fact that the mortality rate in Denmark did not increase during the COVID-19 pandemic compared to the mortality rates in the same period during 2015-2019 has been attributed to the protective effect of lockdown measures [31].

Conclusions
The excess of deaths in Spain in 2020 in relation to projected deaths for that period using the mortality data of the 2008-2019 sequence was 72,146 people, but their spatial distribution was very uneven. In central Spain, the greatest increase in mortality was detected in the adjacent communities of Madrid and Castilla La Mancha. On the contrary, the smallest increments were detected in autonomous communities more distant from Madrid, including the islands, with the only exception of Catalonia. All four models showed good fits (see Fig. 3) and have proved to be a useful tool for estimating the impact of COVID-19 on mortality in Spain in 2020, making it possible to assess how it affected different age groups while accounting for effects of sex, spatial variation between regions and time trend over the last few years.