Multiple biomarker models for improved risk estimation of specific cardiovascular diseases related to metabolic syndrome: a cross-sectional study.

BACKGROUND
Metabolic syndrome (MetS) is the co-occurrence of several conditions that increase risk of chronic disease and mortality. Multivariate models for calculating risk of MetS-related diseases based on combinations of biomarkers are promising for future risk estimation if based on large population samples. Given biomarkers' nonspecificity and commonality in predicting diseases, we hypothesized that unique combinations of the same clinical diagnostic criteria can be used in different multivariate models to develop more accurate individual and cumulative risk estimates for specific MetS-related diseases.


METHODS
We utilized adult biomarker and cardiovascular disease (CVD) data from the National Health and Nutrition Examination Survey as part of a cross-sectional analysis. Serum C-reactive protein (CRP), glycohemoglobin, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, total cholesterol, fasting glucose, and apolipoprotein-B were modeled. CVDs included congestive heart failure, coronary heart disease, angina, myocardial infarction, and stroke. Decile analysis for disease prevalence in each biomarker group and multivariate logistic regression for estimation of odds ratios were employed to measure the joint association between multiple biomarkers and CVD diagnoses.


RESULTS
Of the biomarkers considered, glycohemoglobin, triglycerides, and CRP were consistently associated with the CVD outcomes of interest in decile analysis and were selected for the final models. Associations were overestimated when using single-marker models in comparison with full models; individual odds ratios decreased an average of 16.4% from the single-biomarker models to the joint association models for CRP, 6.6% for triglycerides, and 1.4% for glycohemoglobin. However, joint associations were stronger than any single-marker estimate. Additionally, reduced models produced unique combinations of biomarkers for specific CVD outcomes.


CONCLUSION
The reduced joint association modeling results suggest that unique combinations of biomarkers with their related measure of association can be used to produce more accurate cumulative risk estimates for each CVD. Additionally, our results indicate that the use of multiple biomarkers in a single multivariate model may provide increased accuracy of individual biomarker association estimates by controlling for statistical artifacts and spurious relationships due to co-biomarker confounding.

Clinical criteria are important risk factors for CVDs associated with MetS, but they are nonspecific. Ridker et al. used data from the Women's Health Study to examine potential differences in risk factors between women with and without nonspecific CVD [6]. They observed significant elevations among women with CVD for several biomarkers, including body mass index (BMI), CRP, total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), and apolipoprotein-B (apoB). Ridker and Anand et al. also reviewed the literature for use of biomarkers to predict CVD and observed no evidence of an optimal biomarker [7]. However, multivariate models calculating risk of disease based on combinations of biomarkers have provided reasonable estimates and offer promising options for future risk estimation if based on large populations [8]. Recent large-scale studies where risk of CVD was estimated from combinations of biomarkers have included the Framingham Heart Study [9], the Systematic Coronary Risk Evaluation system by the European Society of Cardiology [10], and the Prospective Cardiovascular Münster Study [11]. Although biomarker models are not designed to predict causality of disease, such multibiomarker associations are useful for estimating risk.
Given the nonspecificity and overlapping of biomarkers in predicting disease, we hypothesize that unique combinations of biomarkers can be used in different multivariate models to develop more accurate risk estimates for specific MetS-related diseases by reducing statistical artifacts and spurious relationships due to co-biomarker confounding. Hence, the objective of this study is to test for unique models in which multiple MetS biomarkers are employed jointly to assess more accurately their association with specific CVDs for different age groups, obtain a cumulative perspective of their impact on CVD, and provide a basis for future development of predictive models. Such predictive models may be used in the future both for estimates of population risk and to approximate an individual's risk based on her/his combination of biomarkers. Decile analysis and regression are used to test these relationships for the nationally representative National Health and Nutrition Examination Survey (NHANES) sample dataset. These analyses are presented for the purpose of estimating risk rather than to make a specific assessment of causality.

Study design
NHANES comprises interview (demographics, socioeconomic status, dietary habits, and medical history), examination (dental, medical, and physiological evaluation), and laboratory segments [12][13][14][15][16][17]. NHANES uses a complex, multistage, unequal probability of selection cluster design to provide a nationally representative sample of the non-institutionalized US civilian population. We conducted a cross-sectional study using six independent two-year cycles of publicly available NHANES data spanning 1999-2010. The NHANES protocol has been approved by the National Center for Health Statistics Institutional Review Board, and written informed consent was obtained from all participants. NHANES methodology and sample design have been detailed elsewhere [18,19].

Study population
The study population consisted of adults ≥ 20 years who completed the laboratory component of the survey and answered questions about their history of CVD and related health outcomes. Within the target age group, n = 32,458 participants answered questions regarding their cardiovascular health. The subjects' health data included systolic blood pressure (SBP) and variables describing whether they had ever been told by a health care professional that they had congestive heart failure (CHF), coronary heart disease (CHD), stroke, myocardial infarction (MI), or angina. Demographic data (poverty-income ratio, sex, age, race/ethnicity), smoking history, and BMI were also employed in the analysis.

Biomarkers
The biomarkers tested included CRP, glycohemoglobin, plasma fasting glucose, ApoB, TC, LDL-C, HDL-C, and triglycerides. These biomarkers were included because they were obtained from blood samples during laboratory testing of NHANES participants. Fasting glucose, ApoB, LDL-C, and triglyceride serum levels were only measured in a subsample consisting of one-third of all persons 12 years and older in each NHANES cycle. The subsamples were nationally representative, and appropriate sample weights were applied to account for oversampling. Blood samples were drawn, stored, and analyzed according to specific protocols [20][21][22][23][24][25].

Statistical analyses
Summary statistics were tabulated to describe the general population characteristics, biomarkers, and health outcomes. Appropriate stratum values, sampling units, and survey weights were applied so that nationwide inference can be drawn [26]. Likewise, sample weights were applied in all regression models.
Multivariate logistic regression was used to estimate odds ratios (ORs) and joint association ORs to measure the association between multiple biomarkers and CVDs [27]. For the purpose of this study, joint associations measure the odds of prevalent CVD respective to the simultaneous concentrations of each measured biomarker. Whereas an interaction would measure how an individual association between a CVD outcome and a biomarker concentration varies given the change in concentration of another biomarker, a joint association measures the aggregated impact on odds of disease for the selected markers by controlling for potential confounding and co-explanation of disease between the co-related variables. Biomarkers used in the final models were determined by examining significant changes in CVD prevalence across deciles of biomarker serum concentrations. Next, we removed highly correlated biomarkers. Although joint associations are robust to correlated variables [28], we were also interested in the individual associations of single biomarkers within the joint biomarker model, which can be impacted by correlation. We used log-transformation to account for highly skewed biomarkers in the models.
After the final biomarkers were determined, adjusted single biomarker regression models were calculated for each cardiovascular health outcome to serve as a level of comparison for the joint effect models. Base joint effect models for each CVD had one term for each log-transformed biomarker as an independent variable. By including multiple biomarkers, we simultaneously controlled for potential confounding and overlapping associations between biomarkers. We then built unique models for each CVD by removing nonsignificant biomarkers from the base model using iterative backwards elimination [29]. R² values are often used to assess model fit and predictive power in linear regression but cannot be calculated directly for logistic regression models. Statistical software calculated pseudo-R² values for logistic regression, but we decline to present them because they are not a measure of explained variability and could potentially mislead the reader [30,31]. In these joint association models, the log-transformed biomarker terms serve simultaneously as independent variables making up a portion of the overall joint association and control variables when examining the individual impact of a specific biomarker within the model. Because the biomarkers serve as control variables, we compared the reduced model single-biomarker OR estimates to the full model single-marker estimates to assess meaningful changes due to the removal of biomarker terms from the models. In addition to base and reduced models for the total study population, we stratified the base models by age to examine potential differences.
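The iterative backwards elimination step can be sketched as follows. This is a minimal illustration, not the SAS procedure used in the study. The function name `backward_eliminate`, the callable `fit_pvalues`, the term names, and the 0.05 significance threshold are all assumptions introduced here. The stub `fake_fit` stands in for refitting a survey-weighted logistic regression and returns made-up p-values.

```python
def backward_eliminate(terms, fit_pvalues, alpha=0.05):
    """Drop the least significant term, refit, and repeat.

    terms:       candidate biomarker terms in the base model
    fit_pvalues: callable that refits the model on the given terms and
                 returns a dict mapping each term to its p-value
                 (hypothetical interface, assumed for this sketch)
    alpha:       significance threshold (assumed; not stated in the text)
    """
    kept = list(terms)
    while kept:
        pvals = fit_pvalues(kept)
        worst = max(kept, key=lambda t: pvals[t])
        if pvals[worst] <= alpha:
            break  # every remaining term is significant
        kept.remove(worst)  # remove one term per iteration, then refit
    return kept


def fake_fit(terms):
    """Stand-in for a model refit; the p-values are invented."""
    pvals = {"log_crp": 0.20, "log_trig": 0.40, "log_hba1c": 0.001}
    if "log_trig" not in terms:
        # A p-value can change once a correlated term is removed.
        pvals["log_crp"] = 0.03
    return {t: pvals[t] for t in terms}


reduced = backward_eliminate(["log_crp", "log_trig", "log_hba1c"], fake_fit)
print(reduced)
```

Removing only one term per pass matters because, as in the stub, the significance of the remaining terms can shift after each refit.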
Joint association ORs were calculated by taking the exponentiated sum of the product of each log-transformed biomarker regression coefficient and that respective log-transformed biomarker's interquartile range (IQR). Using log-transformed IQR increments provided standardization across biomarkers, making it easier to interpret the joint association OR. Standard errors used for calculating the joint association model confidence intervals (CIs) were determined using the covariance matrices for each individual biomarker estimate, as described in Winquist et al. and in Additional file 1 [27].
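The joint association OR calculation can be illustrated with a short sketch. It assumes the standard variance of a linear combination of coefficient estimates and a normal-approximation 95% CI; the exact procedure is the one given in Winquist et al. and Additional file 1. The function name and all input values below are hypothetical and are not estimates from this study.

```python
import math


def joint_association_or(betas, iqrs, cov, z=1.96):
    """Return (OR, lower, upper) for a joint IQR increase in all biomarkers.

    betas: coefficients of the log-transformed biomarkers
    iqrs:  IQRs of the log-transformed biomarkers
    cov:   covariance matrix of the coefficient estimates (list of lists)
    """
    k = len(betas)
    # Log of the joint OR: sum of coefficient times IQR increment.
    log_or = sum(b * d for b, d in zip(betas, iqrs))
    # Variance of the linear combination: d' * cov * d.
    var = sum(iqrs[i] * cov[i][j] * iqrs[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical inputs for three biomarkers (NOT values from the study).
betas = [0.30, 0.20, 0.90]
iqrs = [1.50, 0.70, 0.12]
cov = [[0.0040, 0.0005, 0.0002],
       [0.0005, 0.0090, 0.0003],
       [0.0002, 0.0003, 0.0400]]

or_joint, lower, upper = joint_association_or(betas, iqrs, cov)
print(f"joint OR = {or_joint:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

The off-diagonal covariance terms enter the variance, which is why the joint CI cannot be obtained from the individual biomarker CIs alone.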
All analyses were conducted using SAS v9.3 (SAS Institute Inc., Cary, NC). SAS sampling and survey analysis procedures were used to implement NHANES stratum values, sampling units, and survey weights to account for unequal selection probability and the intentional oversampling of demographic groups as a part of the NHANES complex, multistage cluster design [26]. Survey weights were recalculated for the 12-year period before they were applied.

Study population
The study population consisted of n = 32,458 subjects meeting the inclusion criteria. Sample size for the final regression models ranged from n = 2,119 to n = 28,348, depending on age-stratification and inclusion of certain biomarkers that were measured in a subsample of the study population. There were more females (n = 16,936) than males (n = 15,522), while whites were the most highly represented racial/ethnic group at just over 70% (Table 1). All study population statistics reflect application of survey weights.

Serum biomarker concentrations
There were significant sex-related differences in survey-weighted means of each biomarker (Table 2). Geometric means were calculated for biomarkers with skewed distributions, resulting in a measure of central tendency that is less influenced by extreme outliers. Men had significantly higher geometric mean glycohemoglobin, fasting glucose, ApoB, and triglycerides levels than women, as well as higher mean LDL-C. Meanwhile, women had higher TC and HDL-C mean levels and CRP geometric mean levels than men. There were significant differences in all biomarkers among age, race/ethnicity, and BMI quartile groups, while disparities across smoking status and income-to-poverty quartile were present for most biomarkers.
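To illustrate why a geometric mean is less sensitive to extreme values than an arithmetic mean, the following sketch computes a weighted geometric mean as the exponentiated weighted mean of the log concentrations. The function names and the data are hypothetical, and the sketch gives point estimates only; design-based standard errors for a complex survey are not shown.

```python
import math


def weighted_geometric_mean(values, weights):
    """Exponentiated weighted mean of the natural logs of positive values."""
    total_w = sum(weights)
    mean_log = sum(w * math.log(v) for v, w in zip(values, weights)) / total_w
    return math.exp(mean_log)


def weighted_arithmetic_mean(values, weights):
    """Ordinary weighted mean, shown for comparison."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical right-skewed concentrations with one extreme value.
conc = [0.5, 1.0, 2.0, 4.0, 64.0]
wts = [1.0, 1.0, 1.0, 1.0, 1.0]

gm = weighted_geometric_mean(conc, wts)
am = weighted_arithmetic_mean(conc, wts)
print(f"geometric mean = {gm:.2f}, arithmetic mean = {am:.2f}")
```

In this made-up sample the single extreme value pulls the arithmetic mean well above most observations, while the geometric mean stays close to the bulk of the data.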

Serum biomarker deciles and CVD prevalence
We calculated survey-weighted deciles of biomarker concentrations to examine unadjusted associations between biomarkers and CVD outcomes. Participants within the top deciles of CRP, glycohemoglobin, fasting glucose, and triglycerides had significantly higher prevalence of each measured outcome than subjects in the bottom decile (Table 4) and thus were considered for our joint association models. Among the four remaining biomarkers, there were significant negative associations of decile-level TC, LDL-C, and HDL-C with CVD. With the exception of a negative association with CHD, associations with ApoB were not significant.
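The decile analysis can be sketched as two steps: find weighted decile cut points for a biomarker, then compute the weighted prevalence of an outcome within each decile. This is a simplified illustration with hypothetical function names and invented data. It does not reproduce the SAS survey procedures, and the design-based variance estimation needed for significance testing is omitted.

```python
from bisect import bisect_left


def weighted_decile_cutpoints(values, weights):
    """Return the 9 cut points splitting the weighted distribution into deciles."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cutpoints, cum, k = [], 0.0, 1
    for v, w in pairs:
        cum += w
        # Record the value at which the cumulative weight share reaches k/10.
        while k <= 9 and cum >= total * k / 10:
            cutpoints.append(v)
            k += 1
    return cutpoints


def prevalence_by_decile(values, weights, cases):
    """Weighted prevalence (0 to 1) of a binary outcome within each decile."""
    cuts = weighted_decile_cutpoints(values, weights)
    case_w = [0.0] * 10
    total_w = [0.0] * 10
    for v, w, c in zip(values, weights, cases):
        d = bisect_left(cuts, v)  # decile index 0 (bottom) to 9 (top)
        total_w[d] += w
        case_w[d] += w * c
    return [cw / tw if tw > 0 else None for cw, tw in zip(case_w, total_w)]


# Invented data: 100 subjects, equal weights, more cases at high values.
values = list(range(1, 101))
weights = [1.0] * 100
cases = [1 if (v % 10 == 0 or v > 95) else 0 for v in values]

prev = prevalence_by_decile(values, weights, cases)
print(f"bottom decile: {prev[0]:.2f}, top decile: {prev[9]:.2f}")
```

Comparing the top and bottom entries of `prev` mirrors the unadjusted top-versus-bottom decile comparison described above.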

Joint association models
CRP, glycohemoglobin, and triglycerides were considered for the final models. Although we found negative associations of TC and LDL-C with CVD, it is widely reported that the opposite relationship exists, and therefore we did not include any of the cholesterol variables in our final models [32]. Additionally, we found that participants with self-reported high cholesterol had a much higher prevalence of MI, further indicating that the negative associations observed are an artifact of the study design and not indicative of the true associations (Additional file 1: Table S1a and b and accompanying note). We eliminated fasting glucose from consideration because it was highly correlated with glycohemoglobin (r = 0.83, Additional file 1: Table S2). Furthermore, the associations between glycohemoglobin and CVD observed in the decile analysis were more robust than those involving fasting glucose (Additional file 1: Table S3). As mentioned previously, despite the robust nature of joint association models, the individual measures of association that comprise the joint estimate can still be impacted by correlation. This resulted in a three-biomarker base model for each outcome, from which backwards elimination yielded outcome-specific reduced models. Single-biomarker models of CRP, triglycerides, and glycohemoglobin were strongly associated with each CVD, though the associations between triglycerides and stroke or MI were not significant (Table 5). The base models' joint association estimates combining all three biomarkers (Table 6) were stronger than any estimate from the single-biomarker models. Joint associations from the base models ranged from 25.1% increased odds for CHD (OR = 1.25; 95% CI: 0.92, 1.71) to 152.5% for CHF (OR = 2.53; 95% CI: 1.86, 3.44).
These estimates are smaller than the exponentiated sum of the three estimates from the single-biomarker models, consistent with the work of Winquist et al., because the single-biomarker models do not control for covariate confounding [27]. Controlling for other co-predictors in the joint association models resulted in lower individual OR estimates than in the single-biomarker models. While all but two individual ORs were significant in the single-biomarker models, seven biomarker ORs were not significant in the joint association models. The largest decrements in individual estimates occurred for CRP, with ORs decreasing an average of 16.4% from the single-biomarker models to the joint association models, compared to 6.6% for triglycerides and 1.4% for glycohemoglobin. While CRP was significantly associated with each of the CVD outcomes in the single-biomarker models, it was only significantly associated with CHF (OR = 1.86; 95% CI: 1.44, 2.43) and stroke (OR = 1.36; 95% CI: 1.07, 1.72) after controlling for triglycerides and glycohemoglobin in the joint association models. Triglycerides were strongly associated with angina in the joint association model, with an IQR increase in log-triglycerides being associated with a 23.6% increase in odds of angina (OR = 1.24; 95% CI: 1.02, 1.50). Glycohemoglobin was significantly associated with every CVD outcome, with the odds of disease increasing between 8.5% and 16.9% for every IQR increase in log-glycohemoglobin, depending on the specific disease.
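The percentages quoted alongside the ORs follow from simple arithmetic, sketched below. The conversion from an OR to a percent increase in odds is standard. The attenuation formula, which takes the relative decrease of the OR itself, is an assumption about how the reported decrements were computed, and the single-model and joint-model ORs passed to it are hypothetical.

```python
def percent_increase_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def percent_decrease(or_single, or_joint):
    """Relative decrease of an OR from the single-biomarker model to the
    joint model (assumed definition; the paper does not spell it out)."""
    return (or_single - or_joint) / or_single * 100.0


# Rounded joint-model ORs for CRP reported in the text.
for outcome, odds_ratio in [("CHF", 1.86), ("stroke", 1.36)]:
    pct = percent_increase_in_odds(odds_ratio)
    print(f"{outcome}: OR = {odds_ratio:.2f}, {pct:.1f}% higher odds")

# Hypothetical ORs (NOT from the study) to show the attenuation calculation.
attenuation = percent_decrease(or_single=2.00, or_joint=1.70)
print(f"attenuation = {attenuation:.1f}%")
```

Percentages derived from rounded ORs can differ slightly from those reported in the text, which were presumably computed from unrounded estimates.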
After removing nonsignificant biomarkers from each model, we arrived at a unique reduced model for each outcome. CHF and stroke joint association models included CRP and glycohemoglobin; the angina joint association model included triglycerides and glycohemoglobin; and MI and CHD models included only glycohemoglobin (Table 7). The ORs for the fully adjusted reduced joint association models were all significant, indicating that a joint increase in serum concentrations of the selected log-transformed biomarkers was associated with an increase in the odds of the corresponding cardiovascular outcome. Specifically, joint IQR increases in log-CRP and log-glycohemoglobin were associated with increased odds of CHF (OR = 2.03; 95% CI: 1.70, 2.42) and stroke (OR = 1.58; 95% CI: 1.34, 1.87), and a joint increase in log-triglycerides and log-glycohemoglobin was associated with increased odds of angina (OR = 1.36; 95% CI: 1.13, 1.65). Log-transformed glycohemoglobin was the only biomarker that was significant in every outcome-specific reduced model. Neither triglycerides nor CRP was significant for MI or CHD, such that the reduced models were single-biomarker glycohemoglobin models. For those single-biomarker models, the OR for an IQR increase in log-glycohemoglobin was 1.16 (95% CI: 1.11, 1.20) for MI and 1.19 (95% CI: 1.14, 1.23) for CHD. In addition to the base and reduced models, we used age-stratified models to examine age-related differences in biomarker-CVD associations. The lowest age groups (20-34 and 35-44) had such a limited number of cardiovascular events that most estimates were imprecise and unreliable (Additional file 1: Tables S3a and S4e). The associations in the older age groups (45-60 and 60+) did not deviate noticeably from the nonstratified models.

Discussion
Our study explored associations between serum biomarkers and CVD. Using a joint association approach to logistic regression modeling, we saw a variety of associations of individual and joint biomarkers with CVD. Consistent with previous studies, we found that single-biomarker models of CRP, triglycerides, and glycohemoglobin were significantly associated with CVD [33][34][35][36][37]. However, we found that the magnitude and significance of these associations decreased after controlling for covariate confounding of the selected biomarkers in the joint association models. This demonstrates the likely presence of confounding among the biomarkers; the potential exists for overestimation of the association between individual biomarkers and CVD when failing to adjust for other co-varying biomarkers.
Biomarker associations with CVD were overestimated when using single-biomarker models in comparison with the full models. While single-marker CRP models showed strong associations with angina, MI, and CHD, those results were no longer significant when controlling for triglycerides and glycohemoglobin. While a number of studies have shown a strong association between CRP and CVD, there is evidence that the relationship may lessen in relation to diabetes status [33]. A 2002 long-term follow-up case-control study by Sakkinen et al. found that the predictive effect of CRP for MI was diminished in men with diabetes [35]. Sakkinen et al. hypothesized that this attenuation was likely due to an overlap in information between CRP concentrations and diabetes diagnosis, which is supported by multiple studies that found significant correlations among CRP, diabetes, and other features of MetS [34,38,39]. These concerns have also been raised regarding evidence supporting an association between triglycerides and CHD. In a review of the literature, Sarwar et al. concluded that associations between triglycerides and CHD remain uncertain due to potential codependence of other risk factors, such as other lipids [40]. Our findings substantiated this uncertainty, as a significant association between triglycerides and CHD in the single-biomarker model was no longer significant when controlling for CRP and glycohemoglobin. These examples demonstrate the advantage of a multivariate approach. While these biomarkers on their own can be important predictors of CVD, by controlling for confounding between the biomarkers, it may be possible to achieve a more accurate evaluation of how biomarkers affect CVD risk on an individual basis.
Glycohemoglobin was the only biomarker in our analyses that was significantly associated with MI or CHD, and thus it was the only biomarker to be significantly associated with every CVD outcome in the reduced joint association models. This is consistent with existing evidence of an association between diabetes mellitus and CVD. A review of epidemiologic studies shows both cross-sectional associations and prospective temporal relationships between diabetes and CVD incidence and mortality [41]. Our study also found that of all variations of CVD examined, glycohemoglobin had the strongest association with CHD, which has previously been established as the most common CVD outcome in adults with diabetes [41,42].
Although the individual biomarkers are associated with the CVD tested here, the reduced joint association modeling results suggest that unique combinations of biomarkers with their related measures of association for each model can be used to produce a unique risk estimate for each CVD. For example, CRP and glycohemoglobin were jointly associated with CHF and stroke, whereas triglycerides and glycohemoglobin were jointly associated with angina. Hence, where biomarkers have served as general indicators of CVD risk, joint models can be utilized to indicate risk for specific CVDs. Moreover, the reduced joint association models indicated large increases in CVD odds for joint increases in biomarker concentrations that were larger than the OR estimate from any single marker within that model but still lower than if the overall association were estimated from single-biomarker models. As in the individual biomarker estimates, controlling for co-predictor confounding prevents an overestimation of the joint biomarker association [27].
Our conclusions are limited by several factors of our study design. As with all cross-sectional studies, we are unable to examine temporality between the biomarkers and outcomes. Thus, incidence of disease cannot be assessed here. Using a proxy measure creates the potential for subjects with recently increased or decreased biomarker concentrations to have measured levels that do not match their historical exposure. This could be particularly relevant to patients who have reported CVD but are currently taking medications or undergoing other health interventions, including improved diet and exercise, that may have lowered their biomarker levels. This potential differential exposure misclassification may have led to observed odds ratios that underestimate the true magnitude of the associations [43]. Observed ORs may also be underestimated due to survival bias, whereby high biomarker levels may be predictive of CVD mortality [44][45][46][47]. We speculate that the negative associations seen between CVD prevalence and TC and LDL-C levels are at least partially the result of survival bias and/or the use of cholesterol medication post-CVD diagnosis. Another potential limitation of the lack of temporality is the possibility that CRP levels were elevated by post-event inflammation in participants who reported CVD [48], resulting in CRP associations biased away from the null.
The age-stratified analysis was hindered by a low number of CVD cases in the younger age groups, resulting in imprecise and unreliable ORs, making it difficult to infer potential relationships among age, biomarker levels, and CVD. Sex-related differences were not explored for this project, because division by sex would have further reduced the numbers of cases. Additionally, CVD status was self-reported and thus was not verified. In general, self-reporting of disease status and exposures, including medication, diet, and exercise, adds uncertainty to the analysis. Although we do not expect the accuracy of self-reporting to vary across biomarker levels, it is possible that self-reporting accuracy varied on some factor simultaneously affecting biomarker levels, which may have introduced unknown bias into our measures of association. However, the analysis presented here does not presume to judge causality between self-reported disease status and the combinations of biomarkers tested here. This analysis reports associations for the purpose of estimating risk.
Despite these limitations, our study had a number of strengths. To our knowledge, this is the first large-scale analysis using a joint association approach to assessing the relationship between multiple biomarker concentrations and CVD. NHANES provides a nationally representative sample of the US, such that our results can be generalized to the US adult civilian population. The large sample size of NHANES lends confidence to the assessment that helps to offset the uncertainties listed above. Given the consistency of NHANES sample design and data collection methodology, our results can provide a basis for comparison when analyzing relationships between CVD and biomarkers among future cohorts.

Conclusions
Our work has built upon evidence from a multitude of previous studies that have demonstrated associations between triglycerides, CRP, and glycohemoglobin and variations of CVD. Specifically, this study highlights the need to consider a joint effects approach to determining both individual biomarker associations and the impact of simultaneous increases in multiple biomarker concentrations. This approach may lead to more accurate individual biomarker risk estimation through co-predictor confounding control, a cumulative perspective of the impact of related biomarkers on CVD, and the potential to observe unique combinations of biomarkers that may be predictive of variations of CVD. Future longitudinal studies on the joint effect of multiple biomarkers on CVD are needed to assess temporal relationships and determine whether these models can be developed to predict future onset of CVD. Additionally, expanding joint association models to include interaction terms could provide detail as to how the joint associations vary given specific combinations of biomarkers (e.g., high CRP vs. low glycohemoglobin).

Additional file
Additional file 1: Multiple biomarker models for improved risk estimation of specific cardiovascular diseases related to metabolic syndrome: a cross-sectional study.

Competing interests
We have read and understood BMC policy on declaration of interests and declare that we have no competing interests. One of the authors is a US federal employee. Therefore, the corresponding author grants on behalf of himself and his co-author a non-exclusive worldwide license to the Publishers and its licensees in perpetuity, in all forms, formats and media (whether known now or created in the future), to i) publish, reproduce, distribute, display and store the Contribution, ii) translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts and/or, abstracts of the Contribution, iii) create any other derivative work(s) based on the Contribution, iv) to exploit all subsidiary rights in the Contribution, v) the inclusion of electronic links from the Contribution to third party material where-ever it may be located; and, vi) license any third party to do any or all of the above. Secondary data from the National Health and Nutrition Examination Survey were used, and details for study participant consent can be found at: http://www.cdc.gov/nchs/data/series/sr_01/sr01_056.pdf.
Authors' contributions
EC designed and ran the statistical analysis and took the lead on drafting the manuscript. JRB conceived of the study, participated in the design of the study, and helped draft the manuscript. Both authors read and approved the final manuscript.
Author details