A longitudinal analysis of the risk factors for diabetes and coronary heart disease in the Framingham Offspring Study

Background The recent trends in sedentary life-styles and weight gain are likely to contribute to chronic conditions such as hypertension, diabetes, and cardiovascular diseases. The temporal sequence and pathways underlying these conditions can be modeled using the knowledge from the biomedical and social sciences. Methods The Framingham Offspring Study in the U.S. collected information on 5124 subjects at baseline, and 8, 12, 16, and 20 years after the baseline. Dynamic random effects models were estimated for the subjects' weight, LDL and HDL cholesterol, and blood pressure using 4 time observations. Logistic and probit models were estimated for the probability of diabetes and coronary heart disease (CHD) events. Results The subjects' age, physical activity, alcohol consumption, and cigarettes smoked were important predictors of the risk factors. Moreover, weight and height were found to differentially affect the probabilities of diabetes and CHD events; body weight was positively associated with the risk of diabetes while taller individuals had lower risk of CHD events. Conclusion The results showed the importance of joint modeling of body weight, LDL and HDL cholesterol, and blood pressure that are risk factors for diabetes and CHD events. Lower body weight and LDL concentrations and higher HDL levels achieved via physical exercise are likely to reduce diabetes and CHD events.


Background
The prevalence of chronic conditions such as hypertension, non-insulin dependent diabetes mellitus, and coronary heart disease (CHD) in developed countries demand substantial medical resources [1]. Moreover, these conditions are becoming increasing common among the welloff groups in middle and low-income countries [2]. While the quality of life and health of individuals is adversely affected by chronic conditions, there is a concomitant loss in work productivity [3,4]. This is especially important for middle and low-income countries where skilled labor is relatively scarce and the treatment of chronic conditions may entail a reduction in health care resources available for diseases afflicting the poor [5]. A preventive approach to chronic diseases is therefore appealing.
Population surveys covering a large number of individuals such as NHANES in the U.S. [6] can provide useful insights into risk factors for chronic diseases. However, the gradual evolution of the multiple risk factors and the onset of chronic diseases cannot be addressed using crosssectional data. By contrast, longitudinal studies such as the Framingham Offspring study (FOS) [7] spanning over decades, can provide insights into the evolution of the risk factors and their partial and/or joint effects on chronic conditions such as diabetes and CHD.
There is a growing interest among policy makers in identifying preventive strategies for tackling chronic conditions. For example, the Women's Health Initiative is an ongoing longitudinal study in the U.S. covering 48,000 women for investigating the risk factors of breast cancer [8]. Detailed analyses of existing longitudinal data sets can provide useful insights into the effects of factors such as smoking, physical activity, and alcohol consumption on the incidence of diabetes and CHD. Because these conditions develop gradually over time, it is important to analyze their effects on intermediate risk factors such as blood pressure, and LDL and HDL cholesterol. Application of longitudinal econometric models incorporating the inter-dependence between the multiple risk factors can provide further insights.
The purpose of this paper is to analyze 4 time observations available on over 5,000 subjects in the FOS and develop empirical models for the subjects' weight, HDL and LDL cholesterol, and systolic and diastolic blood pressure that are potential risk factors for diabetes and CHD. Models were also developed for the probability of diabetes and CHD events in the 20-year period. A comprehensive analysis of the multiple risk factors in the FOS using alternative statistical techniques can facilitate an assessment of the likely effects of behavioral modifications for reducing the incidence of diabetes and CHD.

Study sample
From 1971, a cohort of 5124 men and women who were the children or spouses of the subjects in the original Framingham Heart study were recruited for the FOS [7,9]. The subjects were examined at the baseline (Exam 1) and in Exams 2, 3 and 4 that took place, respectively, 8, 12 and 16 years after Exam 1. The diabetes, CHD, and survival status was again assessed 20 years after Exam 1.

Variables measured longitudinally in Exams 1-4 in the Framingham Offspring public-use files
The subjects' age, sex, alcohol intake, and the number of cigarettes smoked per day were investigated in Exams 1-4. Systolic and diastolic blood pressure was measured in the left arm after the subjects had been sitting still for at least 5 minutes. Height was measured in inches in Exams 1 and 3 and weight was measured in pounds in all 4 exams using a standard beam balance. The surveys investigated physical activity patterns in Exams 2 and 4. An index of physical activity was constructed on the basis of the reported hours per day of sedentary, slight, moderate and heavy activities.
An index of alcohol consumption was constructed using the reported intakes of beer, wine, and other alcoholic beverages.
In each exam, blood was drawn after a 12-hour fast for determination of plasma glucose; non-insulin dependent diabetes mellitus was defined as glucose greater than 140 mg per deciliter of blood or if the subjects were taking prescription medication. HDL cholesterol was measured after precipitation of the plasma with heparin-manganese. LDL cholesterol was determined according the techniques described in Lipid Research Clinic Program [10]. A CHD event was defined as the occurrence of angina pectoris, myocardial infarction, coronary insufficiency, or coronary death.

The analytical framework
The risk factors for diabetes and CHD events such as weight, HDL and LDL, and blood pressure respond gradually over time to dietary intakes, life-style, smoking, etc. Of these, changes in body weight resulting from energy imbalance are apparent to the subjects themselves. While the nutrient composition of the diet was not measured in the FOS, one would expect body weight to be a predictor of LDL because fat intakes have increased in the observation period [2]. Moreover, body size is a predictor of energy needs. Thus, one would expect height and weight to be predictors of nutrient intakes, LDL, and other risk factors. Because height is a good approximation for skeletal size, height is an important predictor of weight [11][12][13]; height also reflects nutrition in the early years that is influenced by socioeconomic factors.
In models for the risk factors, it may be inappropriate to combine height and weight as the BMI [14]. From the standpoint of CHD events, it may be more risky for shorter individuals to gain weight than for taller individuals because coronary artery diameter is likely to be higher in taller subjects [15]. By contrast, the risk of diabetes may be less dependent on height; persisting energy imbalances may lead to similar outcomes in terms of the development of insulin resistance. It is therefore desirable to include height and weight in the empirical models and test the null hypothesis that these variables can be combined as the BMI [13].
Because specification of empirical models is limited by data availability, it was necessary to make simplifying assumptions in analyzing the risk factors. The empirical model for the data from the FOS is outlined in the Figure  1. The subjects' age, sex, and height were fixed characteristics. However, due to a paucity of socioeconomic variables such as education and income, the box containing the background characteristics also included the number of cigarettes smoked, alcohol intake, and physical activity.
Because weight changes can influence risk factors such as blood pressure and LDL cholesterol, weight of the n subjects was first explained by a dynamic formulation allowing the current weight to depend on previous measurement (i = 1,2,..., n; t = 2,3,4): ln (WT) it = a 0 + a 1 (Sex) i + a 2 ln (Age) i + a 3 [ln (Age)] 2 i + a 4 ln (Height) i (1) + a 5 (Alcohol index) it + a 6 (Cigarettes) it + a 7 (Physical activity) it + a 8 ln (WT) it-1 + u 1it Here, ln represents natural logarithm. Subjects' weight, age, and height were transformed into natural logarithms, partly to reduce heteroscedasticity [16]. The coefficients of the explanatory variables in logarithms were thus the "elasticities" (percentage change in the dependent variable resulting from a 1% change in the independent variables). Because the model in equation (1) was dynamic, the long run impact of an explanatory variable on weight was the estimated coefficient in equation (1) divided by (1-a 8 ). Indicator variables for the observations from Exams 3 and 4 were included in the model to allow different time means in all 4 examinations. The u 1it in equation (1) were random error terms that could be decomposed in a random effects fashion as: where, δ i were subject specific random effects that were assumed to be normally distributed with zero mean and a constant variance, and v 1it were normally distributed error terms with zero mean and constant variance [17].
The models for HDL and LDL cholesterol and systolic and diastolic blood pressure were also dynamic and, in addition to the explanatory variables in equation (1), contained weight as an explanatory variable. For example, the dynamic random effects model for HDL could be written as (i = 1,2,..., n; t = 2,3,4): The error term u 2it in equation (3) also had a random effects structure as in equation (2). Moreover, the random effects affecting u 2it could be correlated with those influencing weight. Such problems can be addressed using the econometric techniques for "endogeneity" briefly outlined in the next section.
The binary logistic or probit models for whether the subjects' developed diabetes or had a CHD event in the 20year period, explained by the characteristics in Exam 1, were, respectively: (Diabetes) i = c 0 + c 1 (Sex) i + c 2 (Age) i + c 3 (Height) i + c 4 (Alcohol index) i1 (4) + c 5 (Cigarettes) i1 + c 6 (Physical activity) i1 + c 7 (HDL) i1 + c 8 (LDL) i1 Here, the variables Diabetes or CHD were latent variables assuming values 0 or 1 depending on if the subject reported to have had diabetes or a CHD event, respectively. Explanatory variables in equations (4) and (5) were the measurements taken at Exam 1. The analyses were also done separately for men and women and the null hypothesis of constancy of the model parameters in the two groups was tested using likelihood ratio tests. The estimation methods assumed the errors u 3i and u 4i to be drawings from a logistic distribution or from a normal distribution for the probit models. For probit analysis using current levels of explanatory variable in the 4 time periods, random effects models were estimated under the assumptions that u 3i and u 4i were normally distributed with a random effects structure as in equation (2). Finally, Cox proportional hazard models were estimated for the age at first CHD event.

Statistical methods
Because only 4 time observations were available, estimation of the dynamic models given in equation (1) was based on the assumptions that the number of subjects (n) was large but the number of time periods was fixed. Thus, initial observations on the dependent variables (e.g. WT i1 in equation (1)) were treated as "endogenous" variables (correlated with the errors, [18]). The errors on equations (1) were assumed independent across subjects but correlated over time with a positive definite variance-covariance matrix. The random effects decomposition in equation (2) was a special case of this formulation.
The joint determination of the 4 observations in the dynamic models for weight (and HDL, LDL, and systolic and diastolic blood pressure) implied that econometric techniques for estimating simultaneous equations models were useful. Details of the maximum likelihood estima-tion method are presented in [18]. The profile log-likelihood functions were optimized using a numerical scheme E04 JBF from [19]; asymptotic standard errors of the parameters were obtained by approximating second derivatives of the function at the maximum. Possible correlation between the random effects δ i and the mean over time of the subject's body weight was tested using a likelihood ratio statistic that was distributed, for large n, as a Chisquare variable with 4 degrees of freedom. The restrictions for combining height and weight as the BMI in equation (3), for example, were: These were tested by a likelihood ratio test that was distributed for large n as a Chi-square variable with 1 degree of freedom.
For the descriptive statistics, the package SPSS [20] was used. Binary logistic models for diabetes and CHD events were estimated using SPSS; Cox proportional hazard models were also estimated using SPSS. Probit models were estimated using the packages LIMDEP [21] and STA-TA [22].

Descriptive statistics
The sample means and standard deviations of selected variables from the 4 exams in the FOS are presented in Table 1. The subjects' average age was 36 years and 52% were women. There was an increase in the body weight over time. This was also true for systolic blood pressure and for LDL cholesterol. Table 1 also reports the mean number of cigarettes smoked per day calculated for smokers; there was a decline over time in the cigarettes smoked. For the subjects that were diagnosed with diabetes, the mean age at the time of the diagnosis was 52 years; the first CHD event was diagnosed at the average age of 54 years. Table 2 presents the maximum likelihood estimates of dynamic random effects models for weight, and HDL and LDL cholesterol; the results for diastolic and systolic blood pressure are presented in Table 3. In all cases, the dependent variables were transformed into natural logarithms; the independent variables age, height, and weight were also transformed into logarithms.

Body weight
The results for body weight in the second column of Table  2 showed that men were significantly heavier than women. Both age and age-squared were significant predictors of weight thereby showing an increase in weight with age, though at a declining rate. From the estimated parameters, the turning point of weight with respect to age occurred at approximately 38 years. However, this estimate was subject to considerable estimation error and may have also been influenced by attrition in the sample.
The coefficient of physical activity score was negative but was not statistically significant. This could be because physical activity was measured only in Exams 2 and 4. The alcohol index was positively associated with weight, whereas cigarettes smoked were negatively associated; both these coefficients were statistically significant at the 5% level. Height was a significant predictor of weight, though the coefficient estimate 0.796 was significantly lower than the value 2; the data did not indicate a preference for modeling the BMI. Coefficient of the lagged dependent variable was estimated to be approximately 0.5 and was significant. Thus, the long run effects of an independent variable on weight were twice the magnitude of the corresponding short run coefficients in Table 2 (i.e. the coefficients divided by 1-the coefficient of the lagged dependent variable). Coefficients of the indicator variables for Exams 3 and 4 were positive and statistically significant.

HDL cholesterol
The results for HDL cholesterol are in the third column of Table 2. Men had significant lower concentrations of HDL than women. There was an increase in HDL with age though at a declining rate. The physical activity score and alcohol index were significant predictors of HDL with positive coefficients; the number of cigarettes smoked was negatively associated with HDL. The coefficient of height was positive and that of weight was negative in the model for HDL. However, the likelihood ratio test for combining height and weight as the BMI rejected the restrictions in  (6). This may have been partly due to the relatively large samples used in the estimation. The likelihood ratio test for exogeneity of the mean over time of body weight in the model for HDL also rejected the null hypothesis. Thus, the factors affecting body weight appeared to be correlated with the unobserved random effects affecting HDL. The results in Table 2 took account of these correlations in the estimation.

LDL cholesterol
The results for LDL cholesterol reported in the last column of Table 2 were based on the observations in Exams 1, 2 and 3; only the indicator variable for Exam 3 was included in this model. Sex differences in LDL were not statistically significant. The relationship between age and LDL was again a quadratic one though the turning point occurred at age 6.8 years indicating that, for the age range in the sample, there was a steady increase in LDL with time. Both the physical activity score and the index of alcohol intake were insignificant predictors of LDL. However, the number of cigarettes smoked was positively associated with LDL. Height was negatively associated whereas weight was positively associated with LDL. Although the likelihood ratio test again rejected the combination of height and weight as the BMI, one can broadly interpret the results as implying that subjects with higher BMI had lower HDL and higher LDL concentrations. Coefficient of the lagged dependent variable was statistically significant and was smaller than in the model for HDL presumably due to greater changes in LDL in response to the dietary factors [23]. Table 3 presents the maximum likelihood estimates of the dynamic random effects models for the systolic and diastolic blood pressure. There were no significant sex differences in the results from the two models. The quadratic terms in age were significant in both models though with opposite signs. The coefficients of age variables implied that systolic blood pressure declined until the age of approximately 18 years and thereafter increased. By contrast, diastolic blood pressure increased until the age 51 years and then showed a decline. These non-linear estimates were indicative of the complex time profiles of blood pressure [24].

Results from binary logistic and probit regressions for diabetes in the 20-year period
The results from estimating binary logistic and probit models for whether the subjects had diabetes during the 20-year period are in Table 4. To circumvent the problems due to censoring, the risk factors included were the measurements taken at Exam 1. Table 4 also reports the confidence intervals for the odds ratio using the parameter estimates from the logistic regression, and "marginal effects" estimated from the probit model. Chi-square statistics for combining height and weight as BMI are also reported. The results for the model for diabetes using logistic and probit models were consistent across the respective models. The coefficients of sex, physical activity score, alcohol index, systolic blood pressure, and LDL concentrations were not statistically different from zero. However, cigarette smoked, diastolic blood pressure, and weight were significantly positively associated with the probability of diabetes, whereas HDL and height were negatively associated. The likelihood ratio statistic accepted the null hypothesis that height and weight could be combined as the BMI. Thus, for example, a unit increase in the BMI at Exam 1 increased the chances of getting diabetes by between 8%-15% in the 20-year period.

Results from binary logistic and probit regressions for a CHD event in the 20-year period
The results from the binary logistic and probit regressions for CHD events during the 20-year period are presented in Table 5. Women had between 43%-75% lower chances of CHD events. Coefficients of physical activity, alcohol index, and systolic blood pressure were not statistically different from zero in Table 5. However, LDL and HDL concentrations, cigarettes smoked, and diastolic blood pressure were significant predictors of CHD events.
Height and weight were not significant predictors of CHD events, though the P-value of the coefficient of height was 0.053. When height and weight were combined as BMI, the likelihood ratio test rejected the restrictions implied by the BMI transformation in the logistic regression, and the coefficient of BMI was not significantly different from zero. When weight was dropped from the model for CHD, height was significantly negatively associated with the probability of CHD in both the logistic and the probit models. By contrast, when height was dropped from the model, weight was not a significant predictor of CHD. These results indicated that diameter of coronary arteries was likely to be influenced by height and hence taller subjects had lower chances of CHD events. By contrast, the significance of BMI in the model for diabetes showed that, irrespective of height, being over-weight increased the chances of diabetes.
The coefficient of the variable for diabetes was positive and was a statistically significant predictor of CHD events; subjects with diabetes in Exam 1 had between 70%-534% higher chances of a CHD event. Lastly, the random effects probit models were estimated for diabetes and CHD events using current values of the explanatory variables in the 4 exams. Including the random effects, however, often led to boundary solutions using the algorithms in the software packages [21,22]. These results could be due to the serial correlation in the errors affecting longitudinal probit models. Moreover, Cox proportional hazard models were estimated for the age at which the subjects first had the CHD event using the explanatory variables measured at Exam 1. The predictive power of such models was poor in comparison with the results for the binary logistic and probit models indicating the uncertainties in predicting subjects' ages at the time of the first CHD event.

Discussion
This paper analyzed the effects of risk factors such as smoking, weight, HDL, LDL, and blood pressure for the development of chronic condition diabetes and CHD events using data from the FOS. Because of the gradual evolution of the risk factors, dynamic random effects models were used to explain the risk factors by age, physical activity, alcohol consumption, and cigarettes smoked. An advantage of the present approach was that one can discuss pathways through which the multiple risk factors affected the diabetes and CHD outcomes [25]; alternative approaches are available in the statistical literature [26].
While the inter-relationships between behavioral and biological factors are complex, the present analysis enables the estimation of the combined effects of the risk factors on CHD events under certain assumptions. The model represented by equations (4) and (5) was "triangular" and in the calculations reported below, we ignored the endogeneity of weight and potentially small bias in the estimate of the coefficient of diabetes in the model for CHD. The total effect of an explanatory variable on CHD events was thus the coefficient reported in Table 5, plus the coefficient of diabetes (1.195 in the logistic model) multiplied by the respective coefficient of the explanatory variable in the model for diabetes (Table 4).
First, after controlling for sex, age, physical activity, smoking, blood pressure, and LDL and HDL cholesterol, alco-hol intake was not significantly associated with the probability of diabetes and CHD events in Tables 4 and 5, respectively. This is in contrast with the beneficial effects of alcohol intake on CHD among diabetic nurses in the U.S. [27], and male British doctors [28]. An important aspect in the analyzing the effects of alcohol intake is the type of drinks consumed and if they were consumed with meals [29]. Because alcohol intake data in the FOS cannot make such distinctions, it was perhaps reasonable to expect that the analysis would not provide unambiguous evidence on this issue. Further, in the dynamic random effects models, alcohol intake was positively associated with body weight, HDL, and diastolic and systolic blood pressure. Of these 4 variables, only HDL predicted lower chances of CHD events. Thus, the analysis of the multiple risk factors in the FOS indicated an overall tendency of the beneficial and harmful effects of alcohol intake to cancel out.
Second, LDL was seen to be an important risk factor for CHD events. The average LDL concentration at the first examination was approximately 125 mg/ dL. A decrease of 35 mg in LDL to 90 mg, for example, would constitute a 28% decline. Using the estimated parameters from the logistic regression (Table 5), the effect of this decrease would be to lower the chances of CHD events by between 14%-39%. Because LDL was not significant in the model for diabetes, there were no additional effects of lowering LDL. By contrast, an increase of 15 mg/dL in HDL concen- tration would constitute an approximate 30% increase and predict a decline in CHD events by 15%-45%. Because HDL was negatively associated with diabetes, this decrease in HDL would further lower chances of CHD events by 3%-5%. These results also show the importance of disaggregating serum cholesterol into the HDL and LDL categories for the analysis of risk factors for cardiovascular diseases [30].
Third, the effects of smoking 14 cigarettes per day at the first examination would imply a likely increase in CHD events between 11%-32%; this effect would increase to between 12%-33% by taking into account the effects of smoking on diabetes. Using the estimates from Table 5, halving the average number of cigarettes smoked would reduce CHD events between 6%-16%. Lastly, the effects of body weight on CHD events in this population appeared to operate through the decline in diabetes. Using the results in Table 4, a 12% decrease in average BMI in Exam 1 to 22 was likely to reduce the number of subjects with diabetes from approximately 350 to 110. Because diabetic subjects had approximately a 3-fold greater chance of CHD events, the 12% reduction in BMI was likely to lead to a 10% decline in CHD events. Overall, the results from FOS indicated that the importance of reducing weight, LDL cholesterol and blood pressure [31] and increasing HDL for reducing the prevalence of diabetes and CHD events in the U.S. The econometric modeling of the risk factors indicated that it is better to rely on joint rather than the partial effects of risk factors in part because the time sequence of chronic diseases such as diabetes and CHD is known. The importance of modeling multiple risk factors for various diseases has also been emphasized in recent studies [32].