Quantifying the magnitude of the general contextual effect in a multilevel study of SARS-CoV-2 infection in Ontario, Canada: application of the median rate ratio in population health research

Abstract

Background

Regional variations in SARS-CoV-2 infection were observed in Canada and other countries. Studies have used multilevel analyses to examine how a context, such as a neighbourhood, can affect the SARS-CoV-2 infection rates of the people within it. However, few multilevel studies have quantified the magnitude of the general contextual effect (GCE) in SARS-CoV-2 infection rates and assessed how it may be associated with individual- and area-level characteristics. To address this gap, we will illustrate the application of the median rate ratio (MRR) in a multilevel Poisson analysis for quantifying the GCE in SARS-CoV-2 infection rates in Ontario, Canada.

Methods

We conducted a population-based, two-level multilevel observational study where individuals were nested into regions (i.e., forward sortation areas [FSAs]). The study population included community-dwelling adults in Ontario, Canada, between March 1, 2020, and May 1, 2021. The model included seven individual-level variables (age, sex, asthma, diabetes, hypertension, congestive heart failure, and chronic obstructive pulmonary disease) and four FSA census-based variables (household size, household income, employment, and driving to work). The MRR is a median value of the rate ratios comparing two patients with identical characteristics randomly selected from two different regions ordered by rate. We examined the attenuation of the MRR after including individual-level and FSA census-based variables to assess their role in explaining the variation in rates between regions.

Results

Of the 11 789 128 Ontario adult community-dwelling residents, 343 787 had at least one SARS-CoV-2 infection during the study period. After adjusting for individual-level and FSA census-based variables, the MRR was attenuated to 1.67 (39% reduction from unadjusted MRR). The strongest FSA census-based associations were household size (RR = 1.83, 95% CI: 1.71–1.97) and driving to work (RR = 0.68, 95% CI: 0.65–0.71).

Conclusions

The individual- and area-level characteristics in our study accounted for approximately 40% of the between-region variation in SARS-CoV-2 infection rates measured by MRR in Ontario, Canada. These findings suggest that population-based policies to address social determinants of health that attenuate the MRR may reduce the observed between-region heterogeneity in SARS-CoV-2 infection rates.

Background

The coronavirus disease 2019 (COVID-19) pandemic in Canada and other countries was marked by regional variations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection [1,2,3]. An extensive evidence base has reported socioeconomic inequalities over time in SARS-CoV-2 infection [4]. Socioeconomic inequalities in SARS-CoV-2 infection have also been reported between regions; that is, regions with the highest infection rates often coincided with a high proportion of socially disadvantaged population groups [5,6,7,8,9]. The extent to which the heterogeneity of SARS-CoV-2 infection between regions is associated with individual-level factors (e.g., age, sex, income, employment) and area-level factors (e.g., population density) is challenging to parse. For instance, a systematic review focused on socioeconomic inequalities in SARS-CoV-2 infection reported that people living in areas with high population density were more likely to come into close contact with others and have a higher risk of SARS-CoV-2 infection [10]. Measuring how individual- and area-level factors may explain heterogeneity in SARS-CoV-2 infection requires relevant multilevel data and the application of multilevel methods while considering potential sources of biases in the multilevel study design [11,12,13].

Key features of multilevel analyses are the ability to model the relationship, measure heterogeneity, and partition variance between individual-level factors and area-level factors on individual outcomes [14, 15]. These methods are conceptually consistent with our current understanding that the risk of SARS-CoV-2 infection is influenced by the socioecological contexts (e.g., home, workplace, neighbourhood) in which people live. In the multilevel analysis literature, the general contextual effect (GCE) describes how an individual’s context influences the individual outcomes [16]. Studies have used multilevel analyses to examine how a context, such as a neighbourhood, can affect the SARS-CoV-2 infection rates of the people within it [17,18,19]. However, few multilevel studies have quantified the magnitude of the GCE in SARS-CoV-2 infection rates and assessed how it may be associated with individual- and area-level characteristics [20, 21].

The median rate ratio (MRR) is a summary measure that quantifies the magnitude of GCE in multilevel Poisson regression [22]. There are also similar measures for different multilevel analyses. The median odds ratio (MOR) is used in multilevel logistic regression [22], and the median hazard ratio is used in multilevel survival analyses [23]. The MRR is the median value of the rate ratio comparing two patients with identical measured characteristics randomly selected from two different areas, where the higher rate is the numerator, and the lower rate is the denominator. If we were to compute the rate ratios across all possible randomly selected pairs of individuals with identical measured characteristics from different areas, we would produce a distribution of rate ratios that are always greater than or equal to 1. The MRR is the median of this distribution of rate ratios. The smallest possible value of the MRR is 1, which means there is no heterogeneity in the outcome between geographic areas. MRR values greater than 1 indicate heterogeneity in the outcome between geographic areas. The attenuation of the MRR after including individual- and area-level characteristics can identify potential characteristics associated with the variation between geographic areas.
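To make this definition concrete, the following Python sketch compares the closed-form expression commonly given for random-intercept Poisson models, MRR = exp(sqrt(2 × variance) × 0.6745), with a brute-force simulation of the pairwise rate ratios described above. It assumes normally distributed area-level random intercepts on the log-rate scale. The variance of 0.6 and the function names are illustrative assumptions, not estimates or code from this study.

```python
import math
import random
import statistics

def mrr_closed_form(variance):
    """MRR implied by a normal random-intercept variance on the log-rate scale."""
    z75 = statistics.NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2.0 * variance) * z75)

def mrr_by_simulation(variance, n_pairs=200_000, seed=1):
    """Median of the ordered rate ratios across randomly paired areas."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    ratios = []
    for _ in range(n_pairs):
        u1 = rng.gauss(0.0, sd)  # log-rate shift for the first area
        u2 = rng.gauss(0.0, sd)  # log-rate shift for the second area
        # Identical covariates cancel, so the ratio depends only on u1 - u2;
        # abs() puts the higher rate in the numerator, so the ratio is >= 1.
        ratios.append(math.exp(abs(u1 - u2)))
    return statistics.median(ratios)

variance = 0.6  # hypothetical between-area variance
print(round(mrr_closed_form(variance), 2))
print(round(mrr_by_simulation(variance), 2))
```

The two printed values should agree closely, and a variance of zero returns an MRR of exactly 1, matching the lower bound described above.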

In this paper, we first illustrate the application of the MRR in a multilevel Poisson regression to quantify the magnitude of GCE in SARS-CoV-2 infection using data from Ontario, Canada. Second, we assess how the heterogeneity in SARS-CoV-2 infection between areas measured by the MRR is associated with individual- and area-level characteristics. Finally, we aim to inform investigators about the potential opportunities and challenges in applying the MRR (and similar measures) in future multilevel studies in population health research.

Methods

Study design, setting, and population

We conducted a population-based, multilevel observational study using census, laboratory, and health administrative data. The study population included community-dwelling adults in Ontario, Canada. Ontario residents are covered by a universal, publicly funded health care plan. Ontario had substantial geographic variation in SARS-CoV-2 infection rates [6]. The study period included SARS-CoV-2 infections from March 1, 2020, to May 1, 2021. This study period encompasses Ontario’s first to third COVID-19 waves before the general adult population was eligible for COVID-19 vaccination [24]. Our multilevel study included two levels: individuals nested into areas. The area level was the census geographic unit of the forward sortation area (FSA), representing the first three characters of the six-character Canadian postal code [25].

Data sources, linkages, and inclusion criteria

We captured laboratory-identified SARS-CoV-2 infections using Ontario Laboratories Information System (OLIS) data and linked this information to relevant health administrative and census data. These datasets were linked using unique encoded identifiers and analyzed at ICES (formerly the Institute for Clinical Evaluative Sciences) [26]. The OLIS captured approximately 90% of all laboratory-identified SARS-CoV-2 infections reported in Ontario [6].

We obtained individual-level data from the Registered Persons database, the Ontario Health Insurance Plan, the Canadian Institute for Health Information Discharge Abstract Database, the National Ambulatory Care Reporting System, the Continuing Care Reporting System, and the Ontario Drug Benefit claims database. We used validated algorithms to identify chronic disease conditions in the administrative data [27,28,29,30,31]. We obtained area-level information at the FSA using the 2016 Canadian Census data linked using the Postal Code Conversion File (PCCF + 2016, Version 7B) [32]. The 2016 Canadian census area profiles contain 513 FSAs for Ontario, with a median population of 22 260 [33]. The 2016 census FSAs for Ontario are mapped in Additional file 1: Figure S1.

If a person had more than one positive SARS-CoV-2 test during the study period, only the first positive test result was used. The SARS-CoV-2 infection cases included Ontario adults, 20 to 114 years old, with a laboratory-confirmed SARS-CoV-2 infection who were alive at the start of the study period. Individuals were excluded if they were missing age and postal code information, were not eligible for Ontario Health Insurance, or were residing in a long-term care facility in the 90 days before March 1, 2020. The Ontario population used as the offset variable for rates in this study included Ontario adults from the Ontario register data, age 20 to 114 years old, alive at the start of the study period. Individuals were excluded if they were missing postal code information, were not eligible for Ontario Health Insurance, or were residing in a long-term care facility in the 90 days before March 1, 2020.

Measures

For the study outcome, we investigated test-positive SARS-CoV-2 infection rates per 1000 people during the study period [34]. The study outcome is interpreted as a rate because we included the Ontario population as an offset (or exposure) variable to account for differences in the size of the population at risk [22].
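One way to write the role of the offset is as a random-intercept Poisson model for counts. The notation below is illustrative shorthand assumed here, not taken from the paper: y_{ij} is the number of infections in stratum i of area j, n_{ij} is the corresponding population at risk, x_{ij} are covariates, and u_j is the area-level random intercept with variance σ².

```latex
y_{ij} \mid u_j \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \log n_{ij} + \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j, \qquad
u_j \sim \mathcal{N}(0, \sigma^2)
```

Because the coefficient on \log n_{ij} is fixed at 1, \exp(\beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j) is a rate per person, and the exponentiated coefficients are rate ratios.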

We selected individual-level variables previously shown to be associated with SARS-CoV-2 test positivity in Ontario [6]. The individual-level variables included age (20–34, 35–49, 50–64, 65–114), sex (male, female), history of asthma (yes, no), history of diabetes (yes, no), history of hypertension (yes, no), history of congestive heart failure (CHF) (yes, no), and history of chronic obstructive pulmonary disease (COPD) (yes, no). Several area-level characteristics have established associations with geographic variation in SARS-CoV-2 rates [35,36,37]. We first created a comprehensive list of potential census variables for study inclusion using eight broad domains: age, ethnicity, family characteristics, immigration, income, labour, language, and education. Because census variables are often strongly correlated [38], we used the SAS VARCLUS procedure to conduct a hierarchical cluster analysis of the census variables to inform FSA census-based variable selection and reduce multicollinearity [39]. The FSA census-based variables included were household size, median after-tax household income, the proportion of employed people in sales/service jobs, and the proportion of people who primarily drive to work. Further details about the definitions of each census variable used in the study are included in Additional file 1: Table S1. To aid comparison between the area-level variables and to further reduce potential multicollinearity in our study, the area-level covariates were standardized to have a mean of zero and a standard deviation of 1.
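As a minimal sketch of the standardization step, the Python snippet below converts a covariate to z-scores. The household-size values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption, see lead-in
    return [(v - mean) / sd for v in values]

# Hypothetical average household sizes for five FSAs (not study data).
household_size = [2.1, 2.4, 2.6, 3.0, 3.4]
z = standardize(household_size)
print([round(v, 2) for v in z])
```

After this rescaling, a rate ratio for an area-level covariate is read as the change in rate per one standard deviation increase in that covariate.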

Statistical analyses

Before fitting these regression models, we aggregated the person-level data by summing the number of test-positive SARS-CoV-2 infections and the number of people in the Ontario population at risk separately across the different covariate combinations of individual and area-level variables. This meant that each row of the aggregated data represented all the cases and the Ontario population who shared the same individual-level and area-level characteristics. Aggregating the data increases the computational efficiency of the statistical analysis when the data is large. We include a schematic diagram of the aggregated data structure in the Additional file 1: Figure S2. Due to rate instability concerns, we excluded rows with a population size of less than 20 people. We also excluded rows with missing census variables.
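The aggregation step can be sketched in Python as follows. This is not the SAS code used in the study; the record layout, the FSA codes, and the function name are hypothetical, and only the minimum row size of 20 comes from the text.

```python
from collections import defaultdict

MIN_POPULATION = 20  # rows below this size were excluded in the study

def aggregate(people):
    """Collapse person-level records into one row per covariate pattern.

    Each person is a (covariate_pattern, infected) pair, where
    covariate_pattern is a hashable tuple such as (fsa, age_group, sex).
    """
    totals = defaultdict(lambda: {"cases": 0, "population": 0})
    for pattern, infected in people:
        totals[pattern]["population"] += 1
        totals[pattern]["cases"] += int(infected)
    # Drop small rows because their rates are unstable.
    return {k: v for k, v in totals.items() if v["population"] >= MIN_POPULATION}

# Hypothetical records: 25 people in one pattern (3 infected), 5 in another.
people = (
    [(("M5V", "20-34", "female"), i < 3) for i in range(25)]
    + [(("K0A", "65-114", "male"), False) for _ in range(5)]
)
rows = aggregate(people)
for pattern, row in rows.items():
    rate_per_1000 = 1000 * row["cases"] / row["population"]
    print(pattern, row, rate_per_1000)
```

Each retained row carries a case count (the outcome) and a population count (the offset), so the regression sees one record per covariate pattern instead of one per person.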

Because the study data were hierarchical (individuals nested within FSAs) and the outcome was a rate (i.e., the rate of test-positive SARS-CoV-2 infection per 1000 people), we applied multilevel Poisson regression with FSA-specific random intercepts [22]. We adopted a sequential modelling strategy [14]. Model I was the null model, which we used to quantify the variation in the rate of SARS-CoV-2 infection between FSAs before accounting for any individual or area variables. In model II, we included the seven individual-level variables. We expanded model II by including the four area-level variables in model III. We calculated the proportional change in the FSA variance in models II and III to assess how adding individual- and area-level characteristics accounts for some of the FSA variance in the null model [40]. We also evaluated how individual and area-level covariates might account for the GCE measured by the MRR, by adapting the formula for the percentage excess risk explained to measure the attenuation of the MRR after including each set of variables [41, 42]: (MRR_U – MRR_A) / (MRR_U – 1) × 100. MRR_U represents the unadjusted MRR from the null model as the reference, and MRR_A represents the MRR from each subsequent model in the sequential model-building strategy.
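The two attenuation measures can be sketched in Python as follows. The attenuation function uses the null-model excess (unadjusted MRR minus 1) as the denominator, which is the form that reproduces the 3% and 39% attenuations reported in this paper. The function names are illustrative.

```python
def mrr_attenuation(mrr_unadjusted, mrr_adjusted):
    """Percentage of the null-model excess (MRR - 1) removed by adjustment."""
    return (mrr_unadjusted - mrr_adjusted) / (mrr_unadjusted - 1.0) * 100.0

def proportional_change_in_variance(var_null, var_adjusted):
    """Percentage of the null-model area-level variance accounted for."""
    return (var_null - var_adjusted) / var_null * 100.0

# MRRs reported in this paper: 2.10 (null), 2.07 (model II), 1.67 (model III).
print(round(mrr_attenuation(2.10, 2.07)))  # 3
print(round(mrr_attenuation(2.10, 1.67)))  # 39
```

An MRR of 1 means no between-area heterogeneity, so subtracting 1 in the denominator expresses the attenuation as a share of the heterogeneity that was present before adjustment.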

We conducted additional analyses to ensure adequate model fit and robustness of the results. We graphically assessed the linearity assumption between the continuous FSA census-based variables and the outcome using restricted cubic splines. Model fit statistics were produced for each model using the deviance, Akaike’s information criterion (AIC), and the Bayesian information criterion (BIC). A key distributional assumption of the Poisson regression model is equidispersion: the response variable’s variance equals its mean [43]. When this assumption is violated, the variance in the response variable is either smaller than the mean (underdispersion) or larger than the mean (overdispersion). Overdispersion is especially concerning because it leads to underestimated standard errors. All models were assessed for equidispersion by examining whether the dispersion statistic was approximately 1. We also conducted sensitivity analyses to determine the robustness of results to changes in the study period (i.e., COVID-19 waves) and geographic unit of analysis. We used SAS Enterprise Guide v.8.15 (SAS Institute Inc, Cary, NC) for all analyses. The SAS GLIMMIX procedure was used to estimate the multilevel Poisson regression models [44].
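A common form of the dispersion statistic is the Pearson chi-square divided by the residual degrees of freedom, sketched below in Python. This is an assumption about the general approach: the exact statistic produced by the software used in the study may be defined differently, and the counts shown are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_parameters):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    degrees_of_freedom = len(observed) - n_parameters
    return chi_square / degrees_of_freedom

# Hypothetical counts and fitted means for six aggregated rows (not study data).
observed = [12, 30, 7, 45, 18, 22]
fitted = [10.0, 33.0, 9.0, 41.0, 20.0, 21.0]
print(round(pearson_dispersion(observed, fitted, n_parameters=2), 2))
```

Values near 1 are consistent with equidispersion, values well above 1 suggest overdispersion, and values well below 1 suggest underdispersion.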

Results

Descriptive statistics

The study flowchart for the cases is shown in Fig. 1, and the study flowchart for the Ontario population counts used as the offset is shown in Additional file 1: Table S2. The cases included a total of 343 787 individuals (median age, 44 years, [interquartile range {IQR}, 30–57; range, 20–107 years]; 51% female) with a SARS-CoV-2 infection between March 1, 2020, and May 1, 2021. Table 1 shows the distribution of the study cohort’s demographic characteristics, chronic health conditions, and census-based area-level characteristics. Hypertension (22%) and asthma (15%) were the most prevalent chronic conditions. The Ontario population included 11 789 128 individuals (median age, 49 years, [IQR 34–63; range, 20–114]; 51% female). Except for the oldest age category, the distributions of the demographic characteristics, chronic conditions, and area-level characteristics in the Ontario population are similar to those among people who tested positive for SARS-CoV-2. After data aggregation, we additionally excluded rows with population sizes less than 20 and rows with missing FSA census-based variables; the analytic study cohort used in the subsequent regression analyses had 342 779 SARS-CoV-2 cases, and the Ontario population size used as the offset was 11 762 208.

Fig. 1 SARS-CoV-2 infection cases flow diagram. Abbreviations: COVID-19, coronavirus disease 2019; OLIS, Ontario Laboratories Information System

Table 1 Baseline characteristics of individuals with a SARS-CoV-2 infection and the Ontario population

Multilevel regression analyses

Table 2 shows the estimated incidence rate ratios and 95% confidence intervals from the three multilevel Poisson regression models that were sequentially adjusted using demographic characteristics, chronic conditions, and area-level characteristics. In the null multilevel Poisson regression model (model I), the MRR was 2.1. This means that, in the median comparison, the rate of SARS-CoV-2 infection is 110% higher in the higher-rate FSA than in the lower-rate FSA when two FSAs are selected at random. The MRR from the null model represents how much heterogeneity in the outcome is attributed to the difference between clusters before including additional variables.

After adjusting for the individual-level characteristics (i.e., age, sex, and chronic conditions) (model II), the MRR was attenuated to 2.07 (MRR attenuated by 3%). This means that, in the median comparison, the rate of SARS-CoV-2 infection is 107% higher for the individual in the higher-rate FSA when two individuals with the same individual-level characteristics are randomly selected from two different FSAs. The small attenuation of the MRR also coincided with a small 4.97% proportional change in the variance of the FSA random effect. This suggests that the individual-level variables in the model accounted for little of the between-FSA heterogeneity (the FSA contextual effect) in the rates of SARS-CoV-2. While adjusting for the other variables (i.e., individual binary or categorical values set to the reference level, and random effect being set to the same FSA) [45], an age gradient was observed from the youngest to oldest age group, where young adults had a higher incidence rate of SARS-CoV-2 infection per 1000 people compared to the oldest adult group. In addition, after adjustment for the other covariates, individuals with diabetes and congestive heart failure had a 20% and 30% higher incidence rate of SARS-CoV-2 infection per 1000 people, respectively, compared to people who did not have these conditions.

After adjusting for both individual- and FSA census-based characteristics (model III), the MRR was attenuated to 1.67 (MRR attenuated by 39% from the null MRR). This means that, in the median comparison, the rate of SARS-CoV-2 infection is 67% higher for the individual in the higher-rate FSA when two individuals with the same individual-level and FSA census-based characteristics are randomly selected from two different FSAs. The large attenuation of the MRR also coincided with a large 52.46% proportional change in the variance of the FSA random effect. This suggests that individual-level and FSA census-based characteristics in the model accounted for a sizeable portion of the between-FSA heterogeneity (the FSA contextual effect) in the rates of SARS-CoV-2. After adjusting for the individual-level and FSA census-based characteristics, the results suggest that the incidence rate of SARS-CoV-2 infection per 1000 people in an FSA increased by 83% for each 1 standard deviation increase in mean household size. In addition, after adjusting for the other covariates, the results suggest that the incidence rate of SARS-CoV-2 infection per 1000 people in an FSA decreased by 32% for each 1 standard deviation increase in the proportion of the labour force that primarily drives to work. The magnitude of the individual-level rate ratios was not altered by including the area-level characteristics.
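As a rough consistency check, the closed-form expression commonly given for the MRR under normally distributed random intercepts, MRR = exp(sqrt(2 × variance) × 0.6745), can be inverted to recover the variance implied by a reported MRR. The Python sketch below does this for the rounded MRRs quoted above. The function name is illustrative, and the normality of the random intercepts is an assumption.

```python
import math
import statistics

def implied_variance(mrr):
    """Random-intercept variance implied by an MRR under normal intercepts."""
    z75 = statistics.NormalDist().inv_cdf(0.75)
    return (math.log(mrr) / (math.sqrt(2.0) * z75)) ** 2

var_null = implied_variance(2.10)  # model I MRR as reported
var_full = implied_variance(1.67)  # model III MRR as reported
pcv = (var_null - var_full) / var_null * 100.0
print(round(var_null, 2), round(var_full, 2), round(pcv, 1))
```

The implied proportional change in variance is about 52%, in line with the reported 52.46%. The small difference is expected because the calculation starts from MRRs rounded to two decimal places.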

Because the MRR is on the ratio scale, its magnitude can be compared directly with the rate ratios for the explanatory variables in the study. In model III, the MRR was 1.67, and the reciprocal of the MRR (i.e., 1/1.67) was 0.60. In examining the rate ratios (excluding categorical age), 0 of the 6 individual-level characteristics had a rate ratio outside the MRR interval (0.60, 1.67), and 1 of the 4 FSA census-based variables had a rate ratio outside this interval. Household size had a rate ratio of 1.83 (95% CI: 1.71–1.97), which exceeded 1.67. The magnitude of the FSA contextual effect (or clustering in an FSA) was larger than that of 9 of the 10 binary individual-level and continuous FSA census-based variables included in the study. This indicates that between-FSA variation in SARS-CoV-2 infection rates appears greater than the effect of most of the explanatory variables in the study.
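The comparison against the MRR interval can be sketched in Python as follows, using only the two FSA census-based rate ratios quoted in the text. The function name is illustrative.

```python
def outside_mrr_interval(rate_ratio, mrr):
    """True if a rate ratio lies outside the interval (1 / MRR, MRR)."""
    return rate_ratio < 1.0 / mrr or rate_ratio > mrr

MRR_MODEL_III = 1.67
print(round(1.0 / MRR_MODEL_III, 2))  # lower bound of the interval

# The two FSA census-based rate ratios quoted in the text.
reported = {"household size": 1.83, "driving to work": 0.68}
for name, rr in reported.items():
    print(name, outside_mrr_interval(rr, MRR_MODEL_III))
```

Taking the reciprocal puts protective associations (rate ratios below 1) on the same footing as harmful ones, so 0.68 is compared against 0.60 and falls inside the interval, while 1.83 falls outside it.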

The null model had a dispersion statistic of 1.27, which suggests the Poisson model was overdispersed. However, this is likely apparent overdispersion due to missing explanatory variables rather than real overdispersion, given that the subsequent models had a dispersion statistic close to 1 [46]. Therefore, the Poisson regression model fits the data well. The results of the graphical assessment of the linearity assumption are included in Additional file 1: Figure S3. Across each continuous FSA census-based variable and the predicted rate of the outcome in the fully adjusted model, the lines are close to linear, except at the 95% tail for the proportion of people driving to work. Therefore, the linearity assumption is reasonable for these variables in our study. The additional model fit statistics (deviance, AIC, and BIC) represent how well each model fits the data [47]; lower values indicate better model fit. Model III had the best model fit across all three models. There were large decreases in the model fit statistics from model I to model II, but modest decreases in the fit statistics from model II to model III. Although including the FSA census-based variables did not substantially alter the model fit, it did account for a large proportion of the unexplained variance in the outcome between the FSAs based on the proportional change in the variance and the attenuated MRR. Furthermore, the results were robust regarding changes in the study period and geographic unit of analysis, as shown in Additional file 1: Tables S3-S6.

Table 2 Sequential multilevel Poisson regression models for individuals with a SARS-CoV-2 infection in Ontario

Discussion

We conducted a multilevel analysis to illustrate the utility of the MRR as a summary measure to quantify the magnitude of the GCE in SARS-CoV-2 infection and whether it could be explained by individual- and area-level characteristics in our study. In the fully adjusted model, the MRR was attenuated by approximately 40%, from 2.1 in the null model to 1.67. This means that, in the median comparison, the rate of SARS-CoV-2 infection is 67% higher for the individual in the higher-rate FSA when two individuals with the same individual-level and FSA census-based characteristics are randomly selected from two different FSAs. However, a large FSA contextual effect still exists in the rate of SARS-CoV-2 infection even after accounting for the individual-level and FSA census-based variables in the study. The fully adjusted MRR of 1.67 was still larger than the rate ratios for 9 of the 10 binary individual-level and continuous FSA census-based variables included in the study. In a prior study examining disparities in COVID-19 mortality, the authors described their attenuated MRR of 1.7 as a “fairly large contextual effect” [48]. This suggests that other factors not included in our study may explain even more of the between-FSA heterogeneity in the rates of SARS-CoV-2 infection. For example, our analysis did not include environmental measures (e.g., ambient air pollution) that have been shown to affect respiratory viral infection rates and may explain some of the unexplained variability between the FSAs [49]. Environmental risks often disproportionately impact socially disadvantaged groups and may be more amenable to intervention than the FSA census-based variables in our study [50].

Our study revealed a strong association between larger household sizes in a FSA and a higher rate of SARS-CoV-2 infection. This finding aligns with several other studies identifying a similar relationship [37, 51]. The higher infection rate is likely caused by close and frequent contact with people indoors. Larger household sizes are often associated with smaller physical house sizes, poor housing conditions (e.g., ventilation), more people working outside the home as essential workers, and more household members sharing a room [51]. Public health investment and policy recommendations to provide essential workers housing options to isolate outside of their homes, investments in housing, and better protective gear for essential workers are potential targets for intervention.

We identified a negative association between the proportion of people driving to work in an FSA and the rate of SARS-CoV-2 infection. After driving to work, the second most common form of commuting is public transportation, then walking or cycling. Several studies have identified a relationship between public transportation and risk of SARS-CoV-2 infection [52, 53]. The reduced infection risk in people driving to work is likely associated with avoiding close contact with people that would have occurred on public transportation. Poor ventilation and crowding on public transportation can increase the risk of infection compared to driving. Policy recommendations to support working from home to reduce public transport crowding and improved ventilation systems in public transportation may be potential targets for intervention.

The heterogeneity in the between-FSA rates accounted for by the variables in the study can, through attenuation of the MRR, alert policymakers to factors to address at the population level and more explicitly consider how much between-region heterogeneity would still exist after accounting for individual- and area-level characteristics. This can inform important decisions, such as prioritizing resources and suggesting potentially modifiable intervention targets that may have the greatest impact. For example, out of the explanatory variables included in our study, our findings suggest potential interventions to address social determinants of household size may have the most influence on reducing the between-FSA rates of SARS-CoV-2. The MRR can be used as a summary measure to monitor the heterogeneity of an outcome between regions. For example, it can assess the before-and-after impact of a large-scale policy intervention on addressing the heterogeneity of an outcome between regions.

Our study results need to be interpreted considering the completeness of the individual model. Our study lacks individual- and area-level variables of the same sociodemographic factors (e.g., individual-level median income and average median-level income in the FSA). Previous research has shown that individual- and area-level measures do not measure the same construct [54,55,56,57]. The lack of individual-level measures of the same sociodemographic factors as the area-level measures makes it unclear whether the variation explained in the MRR by the area-level variables would disappear after including the corresponding individual-level variables. Therefore, we cannot parse whether the individual or area-level variables explain more FSA-level variation. One of the proposed potential benefits of the MRR is the ability to compare the magnitude of the MRR to the magnitude of the fixed effect rate ratios [22]. Our results suggest that compared to individual- and area-level variables included in our study, unmeasured factors in the FSA may have more relevance to the rate of SARS-CoV-2 infection. However, it is often difficult to compare the magnitude of measures of association given differences in the underlying units of the variable, even when the variable is standardized [41, 58]. Our analyses were focused on the MRR, but future studies can explore the inclusion of the variance partition coefficient to understand the systematic differences between geographies [22].

Some limitations of this study should be noted. The results of our study of positive test results for SARS-CoV-2 were conditioned on being a laboratory-confirmed case from a lab that provided data to OLIS. Our analysis assumes that the positive SARS-CoV-2 infections are randomly distributed across the FSAs in Ontario. However, the relationship between the individual- and area-level variables and the between-FSA heterogeneity is likely different in individuals whose positive test result for SARS-CoV-2 was not represented in OLIS or whose SARS-CoV-2 infection was not laboratory confirmed. In addition, geographical differences in access to tests and testing strategies may influence rates in ways not accounted for by the FSA random intercepts used in our study. Our analysis used FSA-specific random intercepts that assume the infection incidence rate for individuals with a given set of characteristics varies between FSAs, and the association (or slope) between infection incidence rate and explanatory variables is consistent (on average) across all the FSAs [14]. However, the association between the outcomes and the explanatory variables may vary across FSAs. In Ontario, some priority groups (i.e., health care workers) and settings (i.e., high-risk congregate settings) received COVID-19 vaccinations before May 1, 2021. We were not able to account for these individuals in our analysis.

The Canadian census data are only collected every five years, and because 2021 data were not available, we used 2016 census data, which has the potential for misclassification [38]. The 2016 census data might not accurately reflect current geographic areas, especially areas affected by recent gentrification and rapid development. In addition, the 2016 census data cannot capture how people were affected by the COVID-19 pandemic (e.g., job loss) and how people changed their behaviours in response to the pandemic (e.g., drove instead of taking public transport). The use of census data can result in the modifiable areal unit problem because a particular census geography might not reflect the most relevant spatial units [38]. However, in our sensitivity analysis, our results were robust even when we changed the geographic unit from FSA to dissemination area (Additional file 1: Table S6). Our multilevel Poisson regression analysis was non-spatial because the spatial proximity of the FSAs was not directly modelled. A non-spatial multilevel analysis with random effects for geographic clusters can indirectly account for some spatial structure. However, our multilevel model does not allow for spatial smoothing or account for spatial autocorrelation, as more complex hierarchical Bayesian models do [5, 59, 60]. If strong spatial autocorrelation exists, this could bias estimates and underestimate the variance. However, the multilevel Poisson model is better at handling large population-based data and is easier to implement than hierarchical Bayesian models.

The main challenge in implementing multilevel models is the need for multilevel data that contains relevant individual- and area-level variables. The accuracy of the measure of geographic variation and the potential relevance of the individual- and area-level variables is determined by including relevant variables, especially individual-level variables. The lack of relevant individual data is common in multilevel studies [61, 62]. The existing tutorials on summary measures of the magnitude of geographic variation tend to focus more on theory, application, and interpretation [22, 23, 45, 63] rather than bias and study design. In our application of these models, we have highlighted some of the challenges to interpretation, considering study design limitations and strategies for dealing with potential sources of error. Our analysis assumed a steady-state population and a constant incidence rate over the study period [64]. A multilevel survival analysis using the median hazard ratio may be more appropriate for longitudinal studies interested in modelling interactions, competing risks, variable person-time at-risk with changing immunity status, or variable incidence rate over time. Future applications of the MRR would be improved by examples of how to compute credible intervals for the MRR using commonly available software (e.g., SAS, R, and Stata). Future multilevel studies should continue to consider the theoretical underpinnings and strategies to overcome potential threats to validity when applying these methods [12,13,14, 65].

Conclusions

Understanding how social determinants affect population health outcomes and measuring how between-region heterogeneity in health outcomes is associated with individual- and area-level characteristics are important goals in population health research. The use of multilevel models with the inclusion of summary measures of area-level variation (i.e., MRR, MOR, MHR) could help move closer to these goals. This study has demonstrated how the MRR and similar measures could be valuable additions to the population health toolkit for measuring geographic inequities in population health outcomes and understanding potential factors driving the heterogeneity.

Data availability

The dataset used in this study is held securely in coded format at ICES. ICES is a prescribed entity under section 45 of Ontario’s Personal Health Information Protection Act. Section 45 authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to or planning for all or part of the health system. Legal restrictions and data sharing agreements prohibit ICES from making the dataset publicly available. Access may be granted to those who meet the conditions for confidential access, available at https://www.ices.on.ca/DAS

Abbreviations

AIC:

Akaike information criterion

BIC:

Bayesian information criterion

CHF:

Congestive heart failure

COPD:

Chronic obstructive pulmonary disease

COVID-19:

Coronavirus disease 2019

FSA:

Forward sortation area

GCE:

General contextual effect

IQR:

Interquartile range

MHR:

Median hazard ratio

MOR:

Median odds ratio

MRR:

Median rate ratio

OLIS:

Ontario Laboratories Information System database

PCCF:

Postal code conversion file

SARS-CoV-2:

Severe acute respiratory syndrome coronavirus 2

References

  1. Mishra S, Ma H, Moloney G, Yiu KCY, Darvin D, Landsman D et al. Increasing concentration of COVID-19 by socioeconomic determinants and geography in Toronto, Canada: an observational study. Ann Epidemiol [Internet]. 2022;65:84–92. https://doi.org/10.1016/j.annepidem.2021.07.007

  2. Ma Q, Gao J, Zhang W, Wang L, Li M, Shi J et al. Spatio-temporal distribution characteristics of COVID-19 in China: a city-level modeling study. BMC Infect Dis [Internet]. 2021;21(1):1–14. https://doi.org/10.1186/s12879-021-06515-8

  3. Fatima M, O’Keefe KJ, Wei W, Arshad S, Gruebner O. Geospatial analysis of COVID-19: a scoping review. Int J Environ Res Public Health. 2021;18(5):1–14.

  4. Beese F, Waldhauer J, Wollgast L, Pförtner TK, Wahrendorf M, Haller S, et al. Temporal Dynamics of Socioeconomic Inequalities in COVID-19 outcomes over the course of the Pandemic—A scoping review. Int J Public Health. 2022;67(August):1–14.

  5. Zelner J, Trangucci R, Naraharisetti R, Cao A, Malosh R, Broen K, et al. Racial disparities in Coronavirus Disease 2019 (COVID-19) mortality are driven by unequal infection risks. Clin Infect Dis. 2021;72(5):E88–95.

  6. Sundaram ME, Calzavara A, Mishra S, Kustra R, Chan AK, Hamilton MA et al. Individual and social determinants of SARS-CoV-2 testing and positivity in Ontario, Canada: a population-wide study. Can Med Assoc J [Internet]. 2021;193(20):E723–34. https://doi.org/10.1503/cmaj.202608

  7. Cohen-Cline H, Li HF, Gill M, Rodriguez F, Hernandez-Boussard T, Wolberg H, et al. Major disparities in COVID-19 test positivity for patients with non-english preferred language even after accounting for race and social factors in the United States in 2020. BMC Public Health. 2021;21(1):1–9.

  8. Millett GA, Jones AT, Benkeser D, Baral S, Mercer L, Beyrer C et al. Assessing Differential Impacts of COVID-19 on Black Communities. Ann Epidemiol [Internet]. 2020; https://doi.org/10.1016/j.annepidem.2020.05.003

  9. Chen JT, Krieger N. Revealing the unequal burden of COVID-19 by income, race/ethnicity, and household crowding: US county versus zip code analyses. J Public Heal Manag Pract. 2021;27(1):S46–56.

  10. Benita F, Rebollar-Ruelas L, Gaytán-Alfaro ED. What have we learned about socioeconomic inequalities in the spread of COVID-19? A systematic review. Sustain Cities Soc. 2022;86(April).

  11. Riva M, Gauvin L, Barnett TA. Toward the next generation of research into small area effects on health: a synthesis of multilevel investigations published since July 1998. J Epidemiol Community Health. 2007;61(10):853–61.

  12. Blakely TA, Woodward AJ. Ecological effects in multi-level studies. J Epidemiol Community Health [Internet]. 2000;54(5):367–74. https://doi.org/10.1136/jech.54.5.367

  13. Diez-Roux AV. Bringing context back into epidemiology: variables and fallacies in multilevel analysis. Am J Public Health [Internet]. 1998;88(2):216–22. https://doi.org/10.2105/AJPH.88.2.216

  14. Leyland AH, Groenewegen PP. Multilevel Modelling for Public Health and Health Services Research [Internet]. Cham: Springer International Publishing; 2020. 293 p. https://link.springer.com/book/10.1007/978-3-030-34801-4

  15. Merlo J, Wagner P, Leckie G. A simple multilevel approach for analysing geographical inequalities in public health reports: The case of municipality differences in obesity. Heal Place [Internet]. 2019;58(December 2018):102145. https://doi.org/10.1016/j.healthplace.2019.102145

  16. Merlo J, Wagner P, Austin PC, Subramanian SV, Leckie G. General and specific contextual effects in multilevel regression analyses and their paradoxical relationship: a conceptual tutorial. SSM - Popul Heal. 2018;5(March):33–7.

  17. Padellini T, Jersakova R, Diggle PJ, Holmes C, King RE, Lehmann BCL et al. Time varying association between deprivation, ethnicity and SARS-CoV-2 infections in England: A population-based ecological study. Lancet Reg Heal - Eur [Internet]. 2022;15:100322. https://doi.org/10.1016/j.lanepe.2022.100322

  18. Saville CWN, Thomas DR. Social capital and geographical variation in the incidence of COVID-19: an ecological study. J Epidemiol Community Health. 2022;76(6):544–9.

  19. Consolazio D, Murtas R, Tunesi S, Gervasi F, Benassi D, Russo AG. Assessing the Impact of Individual Characteristics and Neighborhood Socioeconomic Status during the COVID-19 pandemic in the provinces of Milan and Lodi. Int J Heal Serv. 2021;51(3):311–24.

  20. Griffith GJ, Davey Smith G, Manley D, Howe LD, Owen G. Interrogating structural inequalities in COVID-19 mortality in England and Wales. J Epidemiol Community Health. 2021;75(12):1165–71.

  21. Griffith GJ, Owen G, Manley D, Howe LD, Davey Smith G. Continuing inequalities in COVID-19 mortality in England and Wales, and the changing importance of regional, over local, deprivation. Heal Place [Internet]. 2022;76(February):102848. https://doi.org/10.1016/j.healthplace.2022.102848

  22. Austin PC, Stryhn H, Leckie G, Merlo J. Measures of clustering and heterogeneity in multilevel Poisson regression analyses of rates/count data. Stat Med. 2018;37(4):572–89.

  23. Austin PC, Wagner P, Merlo J. The median hazard ratio: a useful measure of variance and general contextual effects in multilevel survival analysis. Stat Med. 2017;36(6):928–38.

  24. Canadian Institute for Health Information. Canadian Data Set of COVID-19 Interventions - Data Tables [Internet]. Ottawa, ON. 2022. https://www.cihi.ca/en/covid-19-intervention-scan

  25. Forward Sortation Area - Definition [Internet]. [cited 2024 Mar 13]. https://ised-isde.canada.ca/site/office-superintendent-bankruptcy/en/statistics-and-research/forward-sortation-area-fsa-and-north-american-industry-classification-naics-reports/forward-sortation-area-definition

  26. ICES. ICES Data Repository data sets [Internet]. [cited 2024 Mar 14]. https://datadictionary.ices.on.ca/Applications/DataDictionary/Default.aspx

  27. Gershon AS, Wang C, Guan J, Vasilevska-Ristovska J, Cicutto L, To T. Identifying patients with physician-diagnosed asthma in Health administrative databases. Can Respir J. 2009;16(6):183–8.

  28. Gershon AS, Wang C, Guan J, Vasilevska-Ristovska J, Cicutto L, To T. Identifying individuals with physcian diagnosed COPD in health administrative databases. COPD J Chronic Obstr Pulm Dis. 2009;6(5):388–94.

  29. Schultz SE, Rothwell DM, Chen Z, Tu K. Identifying cases of congestive heart failure from administrative data: a validation study using primary care patient records. Chronic Dis Inj Can. 2013;33(3):160–6.

  30. Hux JE, Ivis F, Flintoft V, Bica A. Diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm. Diabetes Care [Internet]. 2002;25(3):512–6. https://diabetesjournals.org/care/article/25/3/512/21950/Diabetes-in-OntarioDetermination-of-prevalence-and

  31. Tu K, Campbell NR, Chen Z-L, Cauch-Dudek KJ, McAlister FA. Accuracy of administrative databases in identifying patients with hypertension. Open Med [Internet]. 2007;1(1):e18-26.

  32. Statistics Canada. Postal Code OM Conversion File Plus (PCCF+) Version 7E, Reference Guide November 2021 Postal codes OM. 2021; https://library.carleton.ca/sites/default/files/2022-06/PCCF%2BUserguide-2021.pdf

  33. Myran DT, Chen JT, Giesbrecht N, Rees VW. The association between alcohol access and alcohol-attributable emergency department visits in Ontario, Canada. Addiction. 2019;114(7):1183–91.

  34. Ayoub HH, Mumtaz GR, Seedat S, Makhoul M, Chemaitelly H, Abu-Raddad LJ. Estimates of global SARS-CoV-2 infection exposure, infection morbidity, and infection mortality rates in 2020. Glob Epidemiol [Internet]. 2021;3(November):100068. https://doi.org/10.1016/j.gloepi.2021.100068

  35. Antonova L, Somayaji C, Cameron J, Sirski M, Sundaram ME, McDonald JT et al. Comparison of socio-economic determinants of COVID-19 testing and positivity in Canada: A multi-provincial analysis. PLoS One [Internet]. 2023;18(8 August):1–16. https://doi.org/10.1371/journal.pone.0289292

  36. Niedzwiedz CL, O’Donnell CA, Jani BD, Demou E, Ho FK, Celis-Morales C et al. Ethnic and socioeconomic differences in SARS-CoV-2 infection: prospective cohort study using UK Biobank. BMC Med [Internet]. 2020;18(1):160. https://www.medrxiv.org/content/

  37. van Ingen T, Brown KA, Buchan SA, Akingbola S, Daneman N, Warren CM et al. Neighbourhood-level socio-demographic characteristics and risk of COVID-19 incidence and mortality in Ontario, Canada: A population-based study. PLoS One [Internet]. 2022;17(10 October):1–13. https://doi.org/10.1371/journal.pone.0276507

  38. Messer LC, Kaufman JS. Using Census Data to approximate Neighborhood effects. In: Oakes JM, Kaufman JS, editors. Methods in Social Epidemiology. 1st ed. San Francisco: Jossey-Bass; 2006. pp. 209–38.

  39. SAS Institute Inc. The VARCLUS Procedure. SAS/STAT 13.2 user’s guide. Cary, NC; 2014.

  40. Merlo J, Chaix B, Yang M, Lynch J, Råstam L. A brief conceptual tutorial on multilevel analysis in social epidemiology: interpreting neighbourhood differences and the effect of neighbourhood characteristics on individual health. J Epidemiol Community Health. 2005;59(12):1022–8.

  41. Szklo M, Nieto FJ. Epidemiology: beyond the basics. Third. Jones & Bartlett Learning; 2018.

  42. Schempf AH, Kaufman JS. On the percent of excess risk explained. J Epidemiol Community Heal [Internet]. 2011;65(2):190–190. https://doi.org/10.1136/jech.2010.118190

  43. Hilbe JM. Modeling Count Data [Internet]. Cambridge University Press; 2014. https://www.cambridge.org/core/product/identifier/9781139236065/type/book

  44. SAS Institute Inc. Chapter 51: the GLIMMIX Procedure. SAS/STAT® 15.2 User’s Guide. Cary, NC: SAS Institute Inc.; 2020.

  45. Austin PC, Merlo J. Intermediate and advanced topics in multilevel logistic regression analysis. Stat Med. 2017;36(20):3257–77.

  46. Hilbe JM. Chapter 3: testing overdispersion. Model Count Data. 2014. 74–107 p.

  47. Stokes ME, Davis CS, Koch GG. Categorical Data Analysis Using SAS. 2012.

  48. Rostila M, Cederström A, Wallace M, Brandén M, Malmberg B, Andersson G. Disparities in Coronavirus Disease 2019 Mortality by Country of Birth in Stockholm, Sweden: A Total-Population-based Cohort Study. Am J Epidemiol. 2021;190(8):1510–8.

  49. Zoran MA, Savastru RS, Savastru DM, Tautan MN. Assessing the relationship between surface levels of PM2.5 and PM10 particulate matter impact on COVID-19 in Milan, Italy. Sci Total Environ [Internet]. 2020;738:139825. https://doi.org/10.1016/j.scitotenv.2020.139825

  50. Mohai P, Pellow D, Roberts JT. Environmental justice. Annu Rev Environ Resour. 2009;34(July):405–30.

  51. Liu P, McQuarrie L, Song Y, Colijn C. Modelling the impact of household size distribution on the transmission dynamics of COVID-19. J R Soc Interface. 2021;18:177.

  52. Gartland N, Fishwick D, Coleman A, Davies K, Hartwig A, Johnson S et al. Transmission and control of SARS-CoV-2 on ground public transport: A rapid review of the literature up to May 2021. J Transp Heal [Internet]. 2022;26(January):101356. https://linkinghub.elsevier.com/retrieve/pii/S2214140522000287

  53. Park J, Kim G. Risk of covid-19 infection in public transportation: the development of a model. Int J Environ Res Public Health. 2021;18:23.

  54. Geronimus AT, Bound J, Neidert LJ. On the validity of using Census Geocode characteristics to Proxy Individual socioeconomic characteristics. J Am Stat Assoc. 1996;91(434):529–37.

  55. Buajitti E, Chiodo S, Rosella LC. Agreement between area- and individual-level income measures in a population-based cohort: Implications for population health research. SSM - Popul Heal [Internet]. 2020;10:100553. https://doi.org/10.1016/j.ssmph.2020.100553

  56. Diez Roux AV, Kiefe CI, Jacobs DR, Haan M, Jackson SA, Nieto FJ, et al. Area characteristics and individual-level socioeconomic position indicators in three population-based epidemiologic studies. Ann Epidemiol. 2001;11(6):395–405.

  57. Pichora E, Polsky JY, Catley C, Perumal N, Jin J, Allin S. Comparing individual and area-based income measures: impact on analysis of inequality in smoking, obesity, and diabetes rates in canadians 2003–2013. Can J Public Heal. 2018;109(3):410–8.

  58. Greenland S, Maclure M, Schlesselman JJ, Poole C, Morgenstern H. Standardized regression coefficients: a further critique and review of some alternatives. Epidemiology. 1991;2(5):387–92.

  59. Rohleder S, Costa DD, Bozorgmehr PK. Area-level socioeconomic deprivation, non-national residency, and Covid-19 incidence: A longitudinal spatiotemporal analysis in Germany. eClinicalMedicine [Internet]. 2022;49:101485. https://doi.org/10.1016/j.eclinm.2022.101485

  60. Rothman KJ, Greenland S, Associate TLL. Modern Epidemiology, 3rd Edition. Hastings Cent Rep [Internet]. 2014;44 Suppl 2:insidebackcover. http://www.ncbi.nlm.nih.gov/pubmed/24644503

  61. O’Campo P, Xue X, Wang MC, Brien Caughy MO. Neighborhood risk factors for low birthweight in Baltimore: a multilevel analysis. Am J Public Health. 1997;87(7):1113–8.

  62. Groenewegen PP, Leufkens HG, Spreeuwenberg P, Worm W. Neighbourhood characteristics and use of benzodiazepines in the Netherlands. Soc Sci Med. 1999;48(12):1701–11.

  63. Larsen K, Merlo J. Appropriate assessment of neighborhood effects on individual health: integrating random and fixed effects in multilevel logistic regression. Am J Epidemiol. 2005;161(1):81–8.

  64. Lash TL, VanderWeele TJ, Haneuse S, Rothman KJ. Chapter 04 - measures of occurrence. Modern epidemiology. 4th ed. Lippincott Williams & Wilkins; 2021. pp. 54–77.

  65. Morrison CN, Mair CF, Bates L, Duncan DT, Branas CC, Bushover BR et al. Defining Spatial Epidemiology: A Systematic Review and Re-Orientation. Epidemiology [Internet]. 2024; https://doi.org/10.1097/EDE.0000000000001738

Acknowledgements

The authors thank IQVIA Solutions Canada Inc. for use of their Drug Information File.

Funding

This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH) and the Ministry of Long-Term Care (MLTC). This document used data adapted from the Statistics Canada Postal CodeOM Conversion File, which is based on data licensed from Canada Post Corporation, and/or data adapted from the Ontario Ministry of Health Postal Code Conversion File, which contains data copied under license from ©Canada Post Corporation and Statistics Canada. Parts of this material are based on data and information compiled and provided by: MOH, MLTC, Statistics Canada, and the Canadian Institute for Health Information. The analyses, conclusions, opinions and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred. This study was supported by the Ontario Health Data Platform (OHDP), a Province of Ontario initiative to support Ontario’s ongoing response to COVID-19 and its related impacts. The opinions, results and conclusions reported in this paper are those of the authors and are independent from the funding sources. No endorsement by the OHDP, its partners, or the Province of Ontario is intended or should be inferred. LCR is funded by a Canada Research Chair in Population Health Analytics (950-27302). JCK is supported by a Clinician-Scientist Award from the University of Toronto Department of Family and Community Medicine. SM is funded by a Tier 2 Canada Research Chair in Mathematical Modeling and Program Science (950-232643).

Author information

Contributions

Conceptualization: TW, LCR; Methodology: TW, JCK, SM, LCR; Formal analysis: TW; Writing – original draft preparation: TW, KK; Writing – review and editing: JCK, SM, LCR; Supervision: LCR. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tristan Watson.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the University of Toronto Health Sciences Research Ethics Board (Protocol #39356).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12963_2024_348_MOESM1_ESM.pdf

Supplementary Material 1: Additional file 1: Figure S1. Ontario forward sortation area (FSA) map. Table S1. Definitions of census-based area characteristics: constructs, statistical units, and operational definitions using the Ontario, Canada, census area profiles, 2016. Figure S2. Schematic diagram of the aggregated data structure used with Poisson multilevel models. Table S2. Ontario population study flow table. Figure S3. Graphical assessment of the linearity assumption of the FSA census-based continuous variables and the rate of SARS-CoV-2 infection using restricted cubic splines in the fully adjusted multilevel Poisson regression. Figure S4. Raw and Pearson residuals vs. predicted values plots. Table S3. COVID-19 Wave 1 Analysis: Sequential multilevel Poisson count regression models for individuals with a SARS-CoV-2 infection in Ontario, Canada, between March 1, 2020, and July 31, 2020. Table S4. COVID-19 Wave 2 Analysis: Sequential multilevel Poisson count regression models for individuals with a SARS-CoV-2 infection in Ontario, Canada, between August 1, 2020, and March 1, 2021. Table S5. COVID-19 Wave 3 Analysis: Sequential multilevel Poisson count regression models for individuals with a SARS-CoV-2 infection in Ontario, Canada, between March 2, 2021, and May 1, 2021. Table S6. Dissemination Area Sensitivity Analysis: Sequential multilevel Poisson count regression models for individuals with a SARS-CoV-2 infection in Ontario, Canada, between March 1, 2020, and May 1, 2021 (PDF 619 kb).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Watson, T., Kwong, J.C., Kornas, K. et al. Quantifying the magnitude of the general contextual effect in a multilevel study of SARS-CoV-2 infection in Ontario, Canada: application of the median rate ratio in population health research. Popul Health Metrics 22, 27 (2024). https://doi.org/10.1186/s12963-024-00348-8

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12963-024-00348-8

Keywords