Hospital utilization rates for influenza and RSV: a novel approach and critical assessment

Background: Influenza and respiratory syncytial virus (RSV) contribute significantly to the burden of acute lower respiratory infection (ALRI) inpatient care, but heterogeneous coding practices and availability of inpatient data make it difficult to estimate global hospital utilization for either disease based on coded diagnoses alone.
Methods: This study estimates rates of influenza and RSV hospitalization by calculating the proportion of ALRI due to influenza and RSV and applying this proportion to inpatient admissions with ALRI coded as primary diagnosis. Proportions of ALRI attributed to influenza and RSV were extracted from a meta-analysis of 360 total sources describing inpatient hospital admissions which were input to a Bayesian mixed effects model over age with random effects over location. Results of this model were applied to inpatient admission datasets for 44 countries to produce rates of hospital utilization for influenza and RSV respectively, and rates were compared to raw coded admissions for each disease.
Results: For most age groups, these methods estimated a higher national admission rate than the rate of directly coded influenza or RSV admissions in the same inpatient sources. In many inpatient sources, International Classification of Disease (ICD) coding detail was insufficient to estimate RSV burden directly. The influenza inpatient burden estimates in older adults appear to be substantially underestimated using this method on primary diagnoses alone. Application of the mixed effects model reduced heterogeneity between countries in influenza and RSV which was biased by coding practices and between-country variation.
Conclusions: This new method presents the opportunity of estimating hospital utilization rates for influenza and RSV using a wide range of clinical databases. Estimates generally seem promising for influenza and RSV associated hospitalization, but influenza estimates from primary diagnosis seem highly underestimated among older adults. Considerable heterogeneity remains between countries in ALRI coding (i.e., primary vs non-primary cause), and in the age profile of proportion positive for influenza and RSV across studies. While this analysis is interesting because of its wide data utilization and applicability in locations without laboratory-confirmed admission data, understanding the sources of variability and data quality will be essential in future applications of these methods.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12963-021-00252-5.


Background
Despite the large burden of lower respiratory infections globally [1], it is difficult to estimate the proportion of the hospitalizations attributable to influenza and respiratory syncytial virus (RSV) across countries or over time. Heterogeneous coding practices in hospital records across countries limit the comparability of administrative datasets from different locations and pose a challenge to producing global hospitalization estimates using influenza and RSV-coded inpatient admissions alone. Without the addition of laboratory test result data, administrative data may not accurately estimate inpatient disease burden, further complicating efforts to model burden at the population level. Absent accurate population estimates of the burden of specific respiratory diseases, it will be challenging to conduct cross-country comparisons, a hallmark of linking health policies (e.g., masking, vaccination campaigns) to outcomes.
The Burden of Influenza and RSV Disease (BIRD) project has developed an alternative method that may be useful for producing estimates of country-specific influenza and RSV burdens using administrative hospitalization data. This method generates rates of influenza and RSV-related acute lower respiratory illness (ALRI) hospitalizations across 44 countries by modeling the proportion of ALRI hospitalizations specifically attributable to RSV and influenza from literature estimates of laboratory-confirmed influenza and RSV among ALRI hospitalizations. The model can be applied to administrative data on country-specific influenza and RSV utilization. By comparing the results of the BIRD project method to those produced by raw extraction of ICD-coded RSV and influenza admission rates, we can estimate the potential under-attribution of ALRI to these specific causes.

Methods
At a high level, this study estimates influenza and RSV admission rates by modeling the proportion of ALRI admissions that are due to influenza and RSV respectively, and then multiplying these proportions by ALRI admission rates from clinical administrative data. Figure 1 below is a detailed flowchart of the processing steps used in this analysis, and each step is described in further detail in the following sections.

ALRI admissions calculation
We extracted admission counts for ALRI from 29 inpatient all-cause admission datasets covering 44 countries and containing hospitalizations spanning the years 1990 to 2017, stratified by age in years or age groups depending on the source. These datasets included approximately 43 million admissions and represent all ICD-coded inpatient admission data used in the Global Burden of Disease Study, an international collaborative study led by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington and supported by over 4800 researchers in more than 140 countries [1]. Additional detail on inpatient data from IHME is listed in Additional file 1. Because only 11 of the 44 datasets utilized in this study recorded secondary diagnoses, ALRI admissions were defined as those with a primary diagnosis code listed in Table 1 below.
Fig. 1 Flowchart of ALRI admission processing and meta-analysis modeling. Flowchart of data processing and analysis conducted under this study. This diagram describes processing of ALRI admissions from clinical administrative data as well as the modeling and processing performed on RSV and Influenza meta-analysis proportions.
The majority of clinical datasets in this analysis contain a subset of the country's total inpatient utilization. For these non-comprehensive clinical sources, counts of ALRI admissions by age were divided by the total number of admissions in the dataset to produce age-specific proportions of inpatient utilization that have a primary ALRI diagnosis. This proportion is multiplied by IHME's total inpatient utilization envelope to approximate a comprehensive rate of ALRI utilization by age and country. The envelope is produced using a spatio-temporal Gaussian process regression that smooths over geographic distance and year of hospitalization and that models admission rate per capita by age using IHME's healthcare access quality indicator, supply of inpatient hospital beds, and all-cause mortality as predictive covariates. More detail on the envelope estimation process, covariates used in the model, and results can be found in related Global Burden of Disease (GBD) publications [1].
The UK Hospital Episode Statistics dataset [2] and Healthcare Cost and Utilization Project National Inpatient Sample (HCUP NIS) [3] are considered comprehensive datasets and the scaling described above was not applied to these sources. Instead, counts of admissions with a primary ALRI diagnosis in these sources were divided by the total population of that country to produce rates of ALRI admission by BIRD age group and year. Population estimates are produced as part of IHME's GBD study and detailed information on the methods to produce these estimates are available in related publications [1].
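The two routes to an ALRI admission rate described above can be sketched as follows. This is a minimal illustration and not the BIRD or GBD code: the function names and all input numbers are hypothetical, and the utilization envelope is simply taken as a given all-cause admission rate per capita.

```python
def scaled_alri_rate(alri_admissions, total_admissions, envelope_rate):
    """ALRI rate for a non-comprehensive source.

    The share of admissions in the dataset with a primary ALRI diagnosis is
    multiplied by an all-cause admission rate per capita (the envelope).
    """
    if total_admissions <= 0:
        raise ValueError("total_admissions must be positive")
    return (alri_admissions / total_admissions) * envelope_rate


def direct_alri_rate(alri_admissions, population):
    """ALRI rate for a comprehensive source: admissions divided by population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return alri_admissions / population


# Hypothetical inputs for one age group in one country-year.
partial_rate = scaled_alri_rate(
    alri_admissions=1_200, total_admissions=40_000, envelope_rate=0.10
)
full_rate = direct_alri_rate(alri_admissions=15_000, population=5_000_000)
print(f"scaled: {partial_rate * 1000:.2f} per 1000")
print(f"direct: {full_rate * 1000:.2f} per 1000")
```

Both functions return a rate per capita, so results from comprehensive and non-comprehensive sources can be compared on the same scale.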
Most clinical administrative data is provided in age in years or occasionally in various aggregated age bins. The age groupings used for the BIRD analysis were at a higher level of aggregation than the majority of administrative sources used. Therefore, the final step in ALRI admission processing was to aggregate rate-space estimates to the BIRD analysis age groups, by summing both the numerator and denominator so that the rates of ALRI utilization are binned appropriately to match the rest of the analysis.
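Aggregating rates by summing numerators and denominators, rather than averaging the single-year rates, can be sketched as below. The age-group boundaries, the `age_group` helper, and the input rows are assumptions chosen for illustration and are not taken from the BIRD analysis.

```python
from collections import defaultdict


def age_group(age):
    """Map age in years to a coarser analysis age group (hypothetical bins)."""
    if age < 1:
        return "<1"
    if age < 5:
        return "1-4"
    if age < 18:
        return "5-17"
    if age < 50:
        return "18-49"
    if age < 65:
        return "50-64"
    return "65+"


def aggregate_rates(rows):
    """rows: iterable of (age_in_years, admissions, population) tuples.

    Admissions and population are summed within each age group before
    dividing, so larger single-year populations carry more weight.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for age, admissions, population in rows:
        group = age_group(age)
        numerator[group] += admissions
        denominator[group] += population
    return {group: numerator[group] / denominator[group] for group in numerator}


# Hypothetical single-year data for ages 1 to 4.
rows = [(1, 50, 10_000), (2, 30, 20_000), (3, 20, 10_000), (4, 20, 10_000)]
rates = aggregate_rates(rows)
print(rates)
```

With unequal populations the pooled rate differs from the unweighted mean of the single-year rates, which is why the numerator and denominator are summed separately.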
While many of the data sources used in this analysis are also used in creating annual GBD estimates, there were some differences in data processing methods between the two projects that led to different estimates of rates of ALRI. GBD analysis adjusts inpatient data to account for readmissions, potential missingness of secondary inpatient diagnoses, unavailable outpatient data, and healthcare access and quality for every location. It aggregates inpatient data with claims and outpatient data to produce estimates of individuals who received any care for an ALRI diagnosis. Because this study was primarily focused on inpatient diagnoses of influenza or RSV, these additional corrections were not applied.
Table 1 ICD codes used to identify ALRI primary admissions. Note that all more detailed codes below those listed were also included.

Influenza and RSV proportion estimation
Influenza and RSV admission rates were estimated by modeling the proportion of admissions for ALRI that were attributable to each cause respectively, and then estimating the proportion of total ALRI hospitalizations represented by these diseases, stratified by age, year, and country. The meta-analysis for this model included 156 independent studies on influenza-associated hospitalization rates covering 46 countries with data between 1979 and 2015, and 204 studies on RSV admission rates covering 56 countries with data between 1982 and 2017 [4,19,73,107,133,146,. Sample size of the study, age range, and location in study cohort, total admissions for ALRI, and admissions for influenza and RSV respectively were extracted from each study. The proportions of ALRI admissions due to influenza and RSV were calculated for each location, age, and year present in the input study data.
A Bayesian regularized trimmed meta-regression (MR-BRT) model was generated using ALRI admission meta-analysis data to produce estimates of the proportion of ALRI admissions due to each cause while accounting for within-study heterogeneity by age and location as well as error and bias between sources. Within the MR-BRT framework, the trend over age was modeled as a cubic spline with linear tails on the youngest and oldest age groups and an uninformative Gaussian prior. Linear tails on the age ends were used to smooth behavior of the age pattern at the poles in cases of sparse data, which can be highly unstable in MR-BRT modeling.
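The idea of a cubic spline that is constrained to be linear in its tails can be illustrated with the standard restricted (natural) cubic spline basis in truncated-power form. This sketch is not the MR-BRT implementation, and the knot locations are hypothetical; it only shows why such a basis cannot curve beyond the outermost knots, which is what stabilizes the age pattern where data are sparse.

```python
def restricted_cubic_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline.

    Each s_j is cubic between knots but has zero second derivative below the
    first knot and above the last knot, so the fitted curve is linear there.
    """
    if len(knots) < 3 or list(knots) != sorted(set(knots)):
        raise ValueError("need at least three strictly increasing knots")
    t_last, t_prev = knots[-1], knots[-2]
    span = t_last - t_prev

    def cube_plus(value):
        # Truncated cubic: (value)^3 when positive, otherwise zero.
        return max(value, 0.0) ** 3

    basis = [float(x)]
    for t_j in knots[:-2]:
        basis.append(
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / span
            + cube_plus(x - t_last) * (t_prev - t_j) / span
        )
    return basis


# Hypothetical knots placed on age midpoints in years.
knots = [1.0, 5.0, 18.0, 50.0, 65.0]
row = restricted_cubic_basis(30.0, knots)
print(len(row), row)
```

A regression would multiply each basis column by a fitted coefficient; with five knots this gives four age terms, and any linear combination of them remains linear outside the knot range.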
Location was used as a covariate at the IHME Global Burden of Disease's super-region and regional levels, to account for potential geographic variation while informing estimations for locations with sparse data by the trend of those with a larger input evidence base. Region was used as a proxy for country-level heterogeneity in order to produce estimates where meta-analysis data was available and admissions data was not or vice versa. IHME's regional categorization by country is available in related literature. Both region and super-region were modeled as a fixed effect with an uninformative Gaussian prior on each. The hierarchical structure of the super-regional and regional models results in child models that follow the same age trend as those of the parents.
The equation for the influenza and RSV MR-BRT models is shown in Eq. 1 below. Detail on the assumptions made by the mixed effects framework, the use of cubic splines on fixed effects, and estimation of the posterior using maximum likelihood estimation are available in related literature [357]. The MR-BRT framework is an R wrapper for the open source mixed effects LimeTr package, which could be used to replicate the modeling methods described here [358].
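Based on the term definitions given for Eq. 1, the model has the general shape sketched below. The coefficient symbols $\beta$ and the link function $g$ are assumptions introduced here for illustration only; the text defines the individual terms but does not state the transformation applied to the proportion, and the age term may already absorb its spline coefficients.

```latex
g\left(p_{(\mathrm{flu}|\mathrm{RSV}),i,j}\right)
  = \beta_{\mathrm{age}}\,\mathrm{age}_{i,j}
  + \beta_{\mathrm{region}}\,\mathrm{region}_{i,j}
  + \beta_{\mathrm{super\ region}}\,\mathrm{super\ region}_{i,j}
  + Z_i\,u_{i,j}
  + \epsilon_{ij}
```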
Where $p_{(\mathrm{flu}|\mathrm{RSV}),i,j}$ is the proportion of ALRI admissions that are positive for flu or RSV in observation $i$ for study $j$, $\mathrm{age}_{i,j}$ is computed using a spline-based matrix for age midpoint, $\mathrm{region}_{i,j}$ and $\mathrm{super\ region}_{i,j}$ are the fixed effects on GBD region and super-region, $Z_i$ is a linear map, $u_{i,j}$ are the random effects from meta-analysis study $j$ at observation $i$, and $\epsilon_{ij}$ are measurement errors with a specified covariance.
A hierarchical method was chosen a priori for this analysis as it allowed us to produce estimates for locations with little or no meta-analysis data while still accounting for location-specific randomness in meta-analysis estimates. In the final results of this analysis, location-level estimates maintain age heterogeneity based on the differences of age patterns for ALRI admission rates by each location.
Bootstrapping was performed by taking 1000 samples from the posterior of the MR-BRT model, and uncertainty from the samples was propagated through the remainder of the estimation process as 95% credible intervals.
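Propagating 1000 draws through a later multiplication and summarizing them as a 95% interval can be sketched as follows. The Beta distribution used to generate the draws is only a stand-in for samples from the fitted model, and the ALRI rate and the helper names are hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction


def summarize(draws):
    """Return the mean and the 2.5th and 97.5th percentiles of the draws."""
    ordered = sorted(draws)
    mean = sum(ordered) / len(ordered)
    return mean, percentile(ordered, 0.025), percentile(ordered, 0.975)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Stand-in for 1000 draws of the proportion of ALRI due to influenza.
proportion_draws = [rng.betavariate(20, 180) for _ in range(1000)]

# Hypothetical ALRI admission rate per capita for one age group and location.
alri_rate = 0.003

# Each draw is carried through the multiplication, so the interval on the
# final rate reflects the uncertainty in the proportion.
rate_draws = [p * alri_rate for p in proportion_draws]
mean, lower, upper = summarize(rate_draws)
print(f"{mean * 1000:.3f} ({lower * 1000:.3f}-{upper * 1000:.3f}) per 1000")
```

Keeping the individual draws until the final step, instead of multiplying only the interval bounds, makes it straightforward to add further uncertain inputs later in the pipeline.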

Final admission rate estimation
Admission counts and rates for influenza and RSV were calculated by multiplying the proportions from the influenza and RSV mixed effects attribution models by annual ALRI admission count estimates by age group and location. Seasonality was excluded from the scope of this analysis because seasonal information was not consistently available in influenza and RSV meta-analysis literature. Each location with clinical data received the attribution model fit for the corresponding GBD region, unless no input data for the model existed, in which case an average of the models within the GBD super-region was used. Uncertainty was quantified using the upper and lower uncertainty interval from the fit of the mixed effects model. Due to meta-analysis data sparsity in older ages for the RSV attribution mixed effects model, admission rates and counts for RSV were only calculated for children under five.
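The region-level lookup with a super-region fallback, followed by the multiplication, can be sketched as below. Region names, proportions, and admission counts are hypothetical, and averaging point estimates is a simplification of averaging the regional model fits.

```python
def regional_proportion(region, region_fits, region_to_super):
    """Return the attributable proportion to use for a location's region.

    If the region has its own fit, use it. Otherwise average the fits of the
    regions that share the same super-region.
    """
    if region in region_fits:
        return region_fits[region]
    parent = region_to_super[region]
    siblings = [
        proportion
        for name, proportion in region_fits.items()
        if region_to_super[name] == parent
    ]
    if not siblings:
        raise KeyError(f"no fitted model in super-region {parent!r}")
    return sum(siblings) / len(siblings)


def attributable_admissions(alri_admissions, proportion):
    """Cause-specific admissions: ALRI admissions times the modeled proportion."""
    return alri_admissions * proportion


# Hypothetical hierarchy: Region C has no meta-analysis input data.
region_to_super = {
    "Region A": "Super-region X",
    "Region B": "Super-region X",
    "Region C": "Super-region X",
}
region_fits = {"Region A": 0.30, "Region B": 0.20}

proportion_c = regional_proportion("Region C", region_fits, region_to_super)
rsv_under1 = attributable_admissions(2_000, proportion_c)
print(proportion_c, rsv_under1)
```

The same two functions apply to influenza or RSV and to counts or rates, since the proportion is simply a multiplier on the ALRI quantity for the matching age group.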
Influenza and RSV-coded primary admissions were extracted from a subset of clinical administrative datasets as illustrative scenarios in order to compare results of the BIRD analysis to direct ICD extraction with no adjustments. ICD codes used for this comparison can be found in Additional file 2. All locations used to illustrate the comparison contained at least 4-digit ICD detail, which was required to identify primary admissions for RSV.
To assess the limitation of using primary diagnosis alone for ALRI admissions, we extracted non-primary diagnosis detail from the HCUP NIS data which was used to produce US estimates [3]. Diagnosis levels available in HCUP NIS vary by state, but all available diagnosis detail up to the 30th inpatient diagnosis was included for this analysis. We compared primary and non-primary utilization for the year 2012 from this dataset, and applied influenza-attributable proportion estimates to the complete dataset in order to generate a comparison of influenza rates that include non-primary hospitalizations. We focused specifically on influenza for this sub-analysis because of the substantial ALRI utilization as non-primary diagnosis in older ages, as there may be competing complications that would end up coded as primary discharge diagnosis in this population [359][360][361][362].

Results
Figures 2 and 3 represent the number of sources of meta-analysis data for the proportion of ALRI admissions attributable to influenza and RSV, respectively. Meta-analysis sources varied in their age ranges and granularity, sample size, and the time range over which studies were conducted. All meta-analysis sources were used to inform the meta-regression analyses as described above.
Metadata about each of IHME's inpatient data sources is available in Additional file 1. Only the inpatient sources that were ICD-9 or ICD-10 coded were used in this analysis. While all sources listed had sufficient ICD detail to extract ALRI utilization rates, not all locations with inpatient admission data have at least 4-digit ICD coding, which is required to identify RSV cases by ICD diagnosis alone (see Additional file 2 for the list of 4-digit RSV codes). Figure 4 shows the proportion of ALRI admissions attributable to influenza and RSV at the super-regional level. Due to limited meta-analysis data availability in older ages for RSV as seen in the figure, admission rates for RSV were only estimated for the under 1 and 1 to 4 year age groups. Data for selected regions are tabulated in Table 2 below.
In these results, influenza represents a significant proportion of ALRI admissions in individuals aged 15 to 55 years, and a lower proportion in the oldest and youngest age groups. Conversely, RSV represents over 30% of all ALRI admissions for infants under 1 year and over 18% for children aged 1-4 years, but the proportion of ALRI admissions attributable to RSV drops dramatically in age groups beyond the age of 5 years.
Comparisons of admission rates calculated through the BIRD analysis versus those coded directly with influenza and RSV ICD codes for locations with sufficient ICD granularity are shown in Figs. 5 and 6, and tabulated in Tables 3 and 4. For almost all age groups, the methods as described in this paper estimated a higher national admission rate than the rate of directly coded influenza or RSV admissions in the same inpatient sources. Many inpatient data sources used at IHME are coded only to three or four digits, in which case it is less accurate or even not possible to estimate RSV admission rates. Detail on inpatient clinical sources and ICD granularity is listed in Additional file 1, and the ICD codes used to determine influenza and RSV inpatient admissions are listed in Additional file 2. The full dataset of BIRD estimates of influenza and RSV admissions by age, year, and country is available in Additional file 3.
Fig. 2 Map of influenza meta-analysis source data. Influenza meta-analysis data availability by country.
As non-primary diagnoses were not available for the majority of sources of inpatient admission data, only primary diagnosis was used to expand the number of useable sources and retain consistency across locations. We conducted a sensitivity analysis comparing the average primary and non-primary admission rates for ALRI in the USA from 2002 to 2012 to illustrate the potential impact of limiting the analysis to ALRI as primary diagnosis only.
Influenza admission rates in the USA by primary-only diagnosis and by primary and non-primary diagnosis are shown in Fig. 7. The impact of non-primary diagnoses was a 1.4-fold increase in rate estimates for children < 1 year, and nearly a 2.5-fold increase in rates estimated in the 18-49, 50-64, and 65+ age groups.

Discussion
While influenza and RSV-associated healthcare utilization is acknowledged as a global problem, gaps in quantifying the magnitude of this problem exist due to a lack of representative data across locations, which makes assessing admission rates within or across countries challenging. Traditional methods of burden [...] year and 5.31 (4.46-6.59) per 1000 in children aged 1-4 years in England [364]. Estimates from the BIRD analysis as shown in Table 4 are lower in high-income settings for children under 1 year of age than either study, but fall between estimates of older children as described in the literature. Further discussion and comparisons of the results of the BIRD analysis for RSV to other RSV estimation methods are available in related literature [365]. Our estimated admission rates for influenza are generally lower than previously published rates, particularly in the 65+ age group [366,367]. For the USA and Sweden at age 65+, the simple extracted ICD-coded admission rate from administrative datasets surpasses the rate produced by this study. The inclusion of non-primary diagnoses did increase estimates for influenza in the USA by more than 50%. Nonetheless, these rates are still lower than those produced by comparable studies in the oldest age group. Previous studies estimate that anywhere between 39.5 and 96.6% of all admissions across all ages for influenza have a primary diagnosis related to influenza, and the relative proportion of burden as a primary diagnosis in this analysis falls within that range [359][360][361][362]. While using only the primary diagnosis allowed us to maintain consistency with the 33 sources containing only primary diagnostic detail, future iterations of this method should consider inclusion of non-primary diagnoses for more comprehensive utilization estimation, even if at the expense of geographic coverage.
Estimates of the proportion of influenza-positive adults age 65+ were also generally lower than existing literature. Jain et al. estimate that 4% of adults aged 65-79 years and 5% of adults aged 80 or older hospitalized for pneumonia in select US cities test positive for influenza [32]. Monto et al. report that 10.9% of adults aged 50 or older presenting with acute respiratory illness are influenza positive, in a study of families in Ann Arbor, Michigan over 3 years [69]. Our analysis estimates that 1.9% (0.02-8.4) of ALRI admissions in ages 65+ in IHME high-income settings are influenza-positive cases. While the upper bound of this estimate more closely aligns with existing published literature, the proportion positive estimated from the BIRD project is low because of data sparsity in the oldest ages. The age spline method used in the MR-BRT analysis depends on the age midpoint of meta-analysis input data instead of accounting for an age range, which narrows the number of estimates representing older ages. Inclusion of additional meta-analysis data and incorporation of more sophisticated age range splitting could produce more robust proportion estimates in older ages.
The methodology employed by this analysis is comparable to previous burden estimates for influenza produced by IHME in the application of a proportion model to estimates of total lower respiratory infection [368]. However, estimates from the BIRD project were formed using a categorical approach that did not account for the relative risk of ALRI in cases of confirmed influenza or RSV. Instead, the proportion of ALRI hospitalizations was assumed to be a proxy of total utilization. Additionally, the BIRD analysis focuses exclusively on inpatient hospital utilization instead of incidence or mortality, which reduced the assumptions made about how trends in utilization can be extended to other metrics. Finally, the hierarchical method of modeling proportion positive by region and super-region was a novel approach used in burden analysis to allow for estimates in locations with sparser meta-analysis data to have more robust proportion estimates over age. IHME's GBD global influenza admission rate estimates were higher than most of those predicted for countries included in BIRD analysis, at 123 [...]
This study met limitations that are consistent with any analysis developed from clinical administrative data. Availability of inpatient admissions data in some lower- to middle-income countries and meta-analysis data for RSV in older children and adults limited the scope of this analysis, and additional sources of both types of data would improve accuracy of estimates. Availability of inpatient data and proportion meta-analysis at a seasonal or monthly granularity would allow for more relevant analysis during peak influenza and RSV seasons. Additionally, we encountered technical limitations in handling of meta-analysis with point estimates for proportion positive spanning large age ranges, and in the assumption made that influenza and RSV proportions across countries will follow the same pattern over age. Finally, the rates estimated in this analysis represent utilization rates of influenza and RSV present in individuals who have a primary admission diagnosis of acute lower respiratory infection. Accounting for non-inpatient care including urgent or emergency departments and adjustments for non-primary diagnosis when ALRI is not the primary reason for visit would further improve the estimates produced by this analysis. In addition to addressing the limitations described, future iterations of this methodology could be expanded to estimates of incidence or prevalence from utilization by accounting for health care access and care-seeking behavior. Furthermore, deeper investigation of goodness-of-fit of the proportion models through out-of-sample estimation would provide additional validation for the methods proposed here and potentially identify additional areas for refinement of the proportion models.
BIRD estimates of rates of RSV admission as compared to the rate from a raw ICD code extraction. BIRD and raw coded rate are produced across all years of available data for each country.

Conclusions
Because of heterogeneity in coding practices between countries and limited availability of data at sufficient granularity for precise burden estimation, there are few reliable sources of influenza and RSV hospital utilization or incidence that are provided on a global scale. The application of meta-analysis for proportion positive to overall ALRI utilization is a non-traditional means of estimation that shows promise in other applications where direct measurement of ICD diagnoses cannot provide accurate estimates of rates of disease and where surveillance data are not available. However, the method shows substantial uncertainty when considering influenza in older adults, which could be a function of considerable heterogeneity in ALRI coding between countries (i.e., as primary vs secondary cause), and in the age profile of proportion positivity for influenza and RSV across studies. While this method is interesting because it is based on clinical administrative data that is available from many countries globally, additional refinement of admission processing methodology and inclusion of more data over ages would enable greater comparability to existing influenza and RSV utilization literature.
Additional file 1. IHME Inpatient Data Metadata. Description of data: Detailed information including number of years of data, length of ICD codes, and total number of inpatient admissions for each source of clinical administrative data used in this study. All data is in the custody of the Institute for Health Metrics and Evaluation, and is available in the Global Health Data Exchange (ghdx.healthdata.org).
Additional file 2. Influenza and RSV ICD Codes. Description of data: The ICD-9 and ICD-10 codes used to identify influenza and RSV admissions from raw ICD extraction, to compare against the utilization rates produced by the BIRD study.
Uncertainty is capped in order to show estimated age pattern.

Additional file 3. Influenza and RSV Inpatient Admission Rates for All Country-Years of Clinical Administrative Data. Description of data: Tabulated inpatient admission rates with uncertainty for all ages and years available for each country included in the BIRD analysis. Countries where clinical administrative data from IHME was available are all included in this dataset.

Acknowledgements
Thanks to Wil Van Cleve, Greg Roth and Zachary Jones who contributed to the production of this manuscript.
Authors' contributions
EJ contributed to data extraction, led data analysis and interpretation, and drafted the manuscript. DS contributed to data extraction and analysis and provided critical review of the manuscript. SC designed the study, contributed to data interpretation, and provided critical review of the manuscript. CC designed the study, contributed to data interpretation, and provided critical review of the manuscript. YL contributed to data extraction, analysis, and interpretation and provided critical review of the manuscript. CM designed the study, contributed to data interpretation, and provided critical review of the manuscript. HN designed the study, contributed to data interpretation, and provided critical review of the manuscript. JP designed the study, contributed to data analysis and interpretation, and provided critical review of the manuscript. TP contributed to data analysis and interpretation and provided critical review of the manuscript. TS contributed to data extraction and provided critical review of the manuscript. CV designed the study, contributed to data interpretation, and provided critical review of the manuscript. SJ designed the study, contributed to data interpretation, and provided critical review of the manuscript. All authors have read, provided comments, and approved this manuscript.

Funding
The BIRD project was supported by a grant from the Foundation for Influenza Epidemiology (www.ghisn.org). The funding source for this project was not involved in study design, data extraction and analysis, interpretation of results or drafting of the manuscript. Sandra S Chaves and Cedric Mahe contributed to the interpretation of results and writing up of the manuscript and report as members of the Foundation.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
Sandra S Chaves and Cedric Mahe are employees of Sanofi Pasteur, but the content of this paper is not representative of the views of their organization. Spencer L James is an employee of Genentech, a subsidiary of Roche, but was an employee of IHME during his involvement in the grant. Cecile Viboud is an employee of the NIH, but this study does not necessarily represent the views of the NIH or the US government. All other authors declare no competing interests.