OBAYA (obesity and adverse health outcomes in young adults): feasibility of a population-based multiethnic cohort study using electronic medical records



Although obesity is a risk factor for many chronic diseases, we have only limited knowledge of the magnitude of these associations in young adults. A multiethnic cohort of young adults was established to close current knowledge gaps; cohort demographics, cohort retention, and the potential influence of migration bias were investigated.


For this population-based cross-sectional study, demographics and measured weight and height were extracted from the electronic medical records of 1,929,470 patients aged 20 to 39 years who were enrolled in two integrated health plans in California from 2007 to 2009.


The cohort included about 84.4% of Kaiser Permanente California members in this age group who had a medical encounter during the study period and represented about 18.2% of the underlying population in the same age group in California. The age distribution of the cohort was relatively comparable to that of the underlying California population in the 2010 Census, but the proportions of women and ethnic/racial minorities were slightly higher. The three-year retention rate was 68.4%.


These data suggest the feasibility of our study for medium-term follow-up based on sufficient membership retention rates. While 6% of young adults nationwide are extremely obese, too little is known to adequately quantify the health burden attributable to obesity, especially extreme obesity, in this age group. This cohort of young adults provides a unique opportunity to investigate associations of obesity-related factors and risk of cancer in a large multiethnic population.


For the first time in two centuries, life expectancy may decline due to the rapidly increasing prevalence of obesity[1]. In 2007 to 2008, 4.2% of young men and 7.6% of young women 20 to 39 years of age were severely obese, defined as having a body mass index (BMI) at or above 40 kg/m²[2]. Although obesity is a risk factor for many chronic diseases, including diabetes and diseases of the kidney and liver, we have only limited knowledge of the magnitude of these associations in young adults.

Managed care systems offer a unique setting for studying rare outcomes in young adults because of their large populations and the potential for long passive follow-up periods. However, lack of generalizability due to healthy worker bias (i.e., insured versus uninsured individuals) and the potential loss of subjects in epidemiologic studies using members of managed care systems are of concern because these factors can be a major source of bias. Study subjects may lose their health insurance coverage or migrate out of the coverage area, but they may also re-enroll depending on their employment status or other financial decisions and life events. If subjects who leave the health plan differ systematically from those who remain in terms of exposure and its association with health outcomes, the estimated associations between exposure and outcome may be systematically biased. The potential for bias exists in all epidemiologic studies due to low and selective responses to recruitment attempts, survey fatigue, migration of subjects, and other factors. However, this bias can be addressed through careful study design and interpretation of the data. As part of the study design, the potential existence of such bias has to be acknowledged, appropriate measures to assess it have to be taken, and its likely direction and magnitude have to be estimated. Therefore, these potential biases are extremely important to understand.

The long-term goal of this large prospective cohort of young adults is to investigate the relationship between weight class, metabolic syndrome, diabetes, and obesity-related cancers and their risk factors. The analyses presented here show the detailed cohort demographics, as well as cohort retention and the potential influence of migration bias.


Study design, setting, and subjects

The present project was initiated to study the consequences of obesity in young adults (OBAYA), including cancer. It leverages the resources of the Cancer Research Network (CRN)[3], an ongoing collaborative project with the National Cancer Institute (NCI) that comprises research programs and enrollee populations at 14 geographically dispersed health care delivery systems across the US. The current cohort includes members of Kaiser Permanente Northern and Southern California (KPNC and KPSC), which are the two largest sites participating in the CRN, but will be expanded to other CRN sites. Kaiser Permanente (KP) California health plans are integrated health care systems that jointly cover about 6.5 million members. Members receive their care in medical offices and hospitals owned by KP throughout the state. Members enroll through their employer or the employer of a family member, individual prepaid plans, or state or federal programs such as Medi-Cal and Medicare. The OBAYA cohort comprises young adults, 20 to 39 years of age, who are enrolled in one of the KP health plans in California. The primary inclusion criterion for the study is at least one medical visit with a measurement of weight and height between 2007 and 2009. However, the cohort is continuously updated with new members who either 1) recently joined the health plan, 2) reached the lower age eligibility limit (≥20 years of age), or 3) had at least one valid weight and height measurement when this information was previously missing. The study protocol was reviewed and approved by the Institutional Review Boards (IRB) of KPSC and KPNC.


The cohort is followed passively through linkage with data extracted from the KP electronic health records. Information on the occurrence of cancers comes from linkage of cohort members to the KPNC and KPSC tumor registries, which are compliant with the data requirements of the NCI Surveillance, Epidemiology, and End Results (SEER) Program and the North American Association of Central Cancer Registries (NAACCR). Additional data come from probabilistic linkage to the National Death Index (NDI) and to the state cancer registry. As of December 31, 2010, the maximum follow-up was four years and the minimum was one year.
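The follow-up range stated above follows directly from the enrollment window and the cutoff date. A minimal sketch of that arithmetic, assuming follow-up is counted from a member's cohort entry date to December 31, 2010; the function name and the 365.25-day year are illustrative choices, not taken from the study protocol:

```python
from datetime import date

# Cutoff date for follow-up as stated in the text.
FOLLOW_UP_END = date(2010, 12, 31)


def follow_up_years(entry: date, end: date = FOLLOW_UP_END) -> float:
    """Approximate follow-up in years between cohort entry and the cutoff.

    Hypothetical helper: uses a 365.25-day year as a simplification.
    """
    if entry > end:
        raise ValueError("entry date lies after the end of follow-up")
    return (end - entry).days / 365.25


# Earliest and latest possible entry dates in the 2007 to 2009 window.
longest = follow_up_years(date(2007, 1, 1))
shortest = follow_up_years(date(2009, 12, 31))
print(f"longest: {longest:.2f} years, shortest: {shortest:.2f} years")
```

A member entering on the first day of 2007 has close to four years of follow-up, and one entering on the last day of 2009 has close to one year.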

Outcome and demographic measures

The study used information that is routinely assessed during most ambulatory and hospitalization visits. This information is captured in administrative datasets containing membership and benefit information, including all medical encounters at Kaiser Permanente facilities, out-of-system claims, laboratory and radiology test results, and dispensed prescription pharmaceuticals. The address information is routinely geocoded to the census block level, providing the ability to link to census-based group-level socioeconomic information. Laboratory data are also available from electronic medical records. Using the cohort members’ unique medical record numbers, incident diseases can be identified from electronic records, internal disease registries such as the cancer registry, and also from the state cancer registry and state death files.

Body weight and height are routinely measured during almost every medical encounter and were extracted from the electronic health records. BMI was calculated as weight (kilograms) divided by the square of the height (meters). Based on a validation study including 15,000 patients with 45,980 medical encounters, the estimated error rate in body weight and height data was <0.4%[4].
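The BMI definition above, together with the severe obesity threshold of 40 kg/m² used in this article, can be sketched as follows. The function and constant names are illustrative and do not come from the study's own data processing:

```python
# Severe (extreme) obesity threshold in kg/m^2, as defined in the text.
SEVERE_OBESITY_CUTOFF = 40.0


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_severely_obese(weight_kg: float, height_m: float) -> bool:
    """True if the BMI is at or above 40 kg/m^2."""
    return bmi(weight_kg, height_m) >= SEVERE_OBESITY_CUTOFF


# Example: a person weighing 130 kg at a height of 1.75 m.
print(round(bmi(130, 1.75), 1), is_severely_obese(130, 1.75))
```

Rejecting non-positive values is a simple guard against obvious data entry errors; the validation approach actually used in the study is described in the cited reference.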

Census information

Population counts for California residents 20 to 39 years of age were retrieved from Census 2010 data (n = 10,657,405; accessed from American FactFinder 2 on July 26, 2011).


After exclusion of members who did not have any medical encounters between 2007 and 2009, 2,285,278 young adults were potentially eligible for participation in the cohort study (Figure 1). Of these KP members, 1,929,470 had at least one valid weight and height measurement from electronic clinical records between January 1, 2007 and December 31, 2009 and were included in the study. The cohort thus included about 84.4% of KP California members in this age group who had a medical encounter during the study period. The cohort showed an almost equal distribution of members across the four age groups (Table 1), a higher number of women than men (57.5% vs. 42.5%), and a large proportion of ethnic/racial minorities (26.2% of cohort members were non-Hispanic White).
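The coverage figure follows from the two counts given in this paragraph. A short sketch of the arithmetic (variable names are illustrative):

```python
# Counts reported in the text.
eligible_members = 2_285_278   # young adults with a medical encounter, 2007 to 2009
included_members = 1_929_470   # those with a valid weight and height measurement

# Share of eligible members who entered the cohort, in percent.
coverage_pct = 100 * included_members / eligible_members
print(f"{coverage_pct:.1f}%")  # prints 84.4%
```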

Figure 1 Flow chart of the Kaiser Permanente California young adult cohort.

Table 1 Characteristics of the KP California young adult cohort

The young adult cohort represented about 18.2% of the underlying population in the same age group in California (Table 2). The age distribution of the cohort was relatively comparable to that of the underlying population in California as evaluated by the 2010 Census. However, the young adult cohort had a slightly higher proportion of women and of Hispanics, Blacks, and Asians but a lower proportion of individuals of other races than the underlying Census population.

Table 2 Population demographics compared to California Census 2010

The current three-year retention rate for young adults who entered the cohort in 2007 is 68.4%, ranging from 58.1% in the youngest adults between 20.0 and 24.9 years of age to 75.6% in young adults between 25.0 and 29.9 years of age (Table 3). To investigate changes in demographic characteristics due to disenrollment, we compared participants who were enrolled into the study in 2007 by their retention status (Table 4). Participants who were lost to follow-up were more likely to be younger and male but slightly less likely to be from a racial/ethnic minority. Neighborhood education and household incomes were comparable between those retained and those lost to follow-up.
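Retention rates by age group of the kind reported here can be computed from one record per cohort member. The sketch below is illustrative only: it assumes a member counts as retained if still enrolled three years after cohort entry, and the records shown are made-up toy data, not study data.

```python
from collections import defaultdict


def retention_by_group(members):
    """Percent retained per group from (group, retained) pairs.

    Hypothetical helper; 'retained' is assumed to mean the member was still
    enrolled in the health plan three years after cohort entry.
    """
    totals = defaultdict(int)
    kept = defaultdict(int)
    for group, retained in members:
        totals[group] += 1
        if retained:
            kept[group] += 1
    return {group: 100 * kept[group] / totals[group] for group in totals}


# Toy records: (age group at entry, retained after three years).
example_members = [
    ("20.0-24.9", True),
    ("20.0-24.9", False),
    ("25.0-29.9", True),
    ("25.0-29.9", True),
    ("25.0-29.9", True),
    ("25.0-29.9", False),
]

for group, pct in retention_by_group(example_members).items():
    print(f"{group}: {pct:.1f}% retained")
```

The same grouping could be applied to sex or race/ethnicity to compare members retained with those lost to follow-up.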

Table 3 Retention* of young adults in the health plan
Table 4 Demographic characteristics of cohort members enrolled in 2007 by three-year retention status


This cohort of young adults provides an unparalleled opportunity to investigate associations of obesity and adverse health outcomes including cancer. Its large population of young adults is relatively representative of the underlying population in California. The availability of clinically assessed height and weight data for nearly 2,000,000 California members of the KP health system provides an outstanding basis for investigating the effects of weight class on health outcomes. In addition, there is substantial racial and ethnic diversity in this cohort, with a large proportion of Hispanics (about one-third of the cohort) and Asians (about 10%). The ability to link to various clinical and administrative databases including high-quality tumor registries, prescription medications, laboratory results, and diagnoses and procedures related to outpatient encounters or hospitalizations facilitates investigation of numerous outcomes of interest.

Health consequences of obesity may vary markedly by sex, race/ethnicity, and socioeconomic status. Whether variations in obesity are a principal reason for disparities in disease occurrence in young adults is unclear. Variations in disease occurrence in young adults have also been attributed to the lack of health insurance in this particular age group, with 40% lacking health insurance[5, 6]. Another problem may be underinsurance, which includes electing coverage in plans with relatively low premiums but substantial copayments and high deductibles. A lack of health insurance or underinsurance may result in underutilization of health care services, and consequently, delayed or underdiagnosis of disease. This may bias results from epidemiological studies that do not take insurance status into account and lead to an underestimation of obesity-related risks. Current knowledge gaps of the health risks associated with obesity in young adults include: 1) lack of robust risk estimates for young adults, because most studies investigate adults of all ages without further stratification; 2) limited information on disparities and risk in subpopulations; and 3) lack of information on health outcomes with lower prevalence.

For this ongoing cohort study of young adults, understanding the potential bias introduced by migration is crucial. Membership in KP, and therefore in the cohort, is dynamic, with individuals continuously joining and leaving the health plan. Retention rates in young adults are generally lower than in other age groups, likely due to the high rate of change in employment and family status in this age group. The current three-year retention rate for young adults who entered the cohort in 2007 is 68.4%, ranging from 58.1% in the youngest adults between 20.0 and 24.9 years of age to 75.6% in young adults between 25.0 and 29.9 years of age. These data suggest the feasibility of our study for medium-term health outcomes with an adequate length of follow-up based on sufficient membership retention rates. Future retention rates cannot be predicted accurately because of ongoing changes in health policy such as the Affordable Care Act; the interpretability of our results is therefore limited to medium-term retention. However, we speculate that the expansion of health care to a larger population as planned in the Affordable Care Act would increase retention rates in this age group, which is known for high rates of uninsured individuals[5, 6]. The estimated membership retention is expected to increase slightly when KPSC and KPNC are combined, as members relocating from one health plan to the other will continue to be available for follow-up.

Individuals who were retained in the health plan and those lost to follow-up were remarkably similar with regard to neighborhood income and education. However, participants who were lost to follow-up were more likely to be younger, male, and non-Hispanic White. They were also less likely to be of unknown race/ethnicity. Differences in race and ethnicity between these two groups can be partially attributed to the introduction in 2009 of a mandatory assessment of race/ethnicity during outpatient visits for every patient. Therefore, individuals who left the health plan before 2009 were more likely to be of unknown race/ethnicity. Overall, our data do not suggest major bias by attrition regarding sociodemographic factors. However, factors not considered in this analysis could still introduce bias, such as systematic differences between individuals retained and those lost to follow-up in health risk factors related to the outcomes of interest; these need future consideration.

Although cancer in young adults is one of the leading causes of disease-related deaths in this age group, little attention has been given to risk factors for cancers in young adults 20 to 39 years of age. Obesity[7], diabetes[8-10], and metabolic syndrome[11-13] have all been linked to an increased risk for several cancers, based mostly on cancer incidence in adulthood[9, 13-15]. Although young adults in the US have a high prevalence of obesity[16], diabetes[17, 18], and metabolic syndrome[19], few epidemiologic studies have focused on these chronic conditions and subsequent cancer risk in this age group. This is particularly important given increasing evidence that a substantial proportion of cancers in young adults likely have a different underlying biology, etiology, and pathogenesis than in older individuals[20, 21]. Thus, while studies have reported associations between obesity, diabetes, metabolic syndrome, and many common cancers in older adults, such associations in young adults have yet to be established.

At cohort entry, all members by definition have had health insurance coverage, and this may be perceived as limiting the generalizability of the findings. Indeed, having health insurance is associated with potentially key covariates in disease risk, such as employment status or income. Despite this, there is a substantial variation in socioeconomic status, based on census-based estimates. In addition, uniform health coverage is a substantial advantage as it minimizes the risk of underdiagnosis or delayed diagnosis, which may in turn result in biased risk estimates. This is a potential limitation for cohort studies in which young adults are enrolled without regard to health insurance status, as 40% of young adults in the US do not carry health insurance.

Despite these outstanding strengths, there are some limitations. As a cohort based on electronic health records, the availability of data on covariates of interest will vary, as not all cohort members will take routine advantage of services such as screening for conditions of interest. Thus, there is the possibility that availability of screening data will be linked to disease status. This is mitigated to some extent by the large numbers of cohort members and the ability to define subsets of the cohort by the availability of data or frequency of encounters with the health care system. In addition, the implementation of clinical practice guidelines, such as systematic screening for cardiovascular disease risk factors for all members 20 years of age and older on the first clinical visit and every five years thereafter, will assure broad availability of relevant data without bias as to disease status. On the other hand, particularly for the occurrence of short-term events, the potential for confounding by indication or the prodromal effects of disease resulting in more frequent health encounters will need to be taken into account in the interpretation of findings.

As noted previously, there is disenrollment of individuals from the KP health plan, with overall about 68% of cohort members maintaining their KP health insurance after three years. Retention rates are lowest for the youngest adults and somewhat higher for those in their fourth decade of life. The loss to follow-up is mitigated somewhat for endpoints such as death or cancer due to the possibility of linkage to state and SEER tumor registries or state and national vital statistics registries. The attrition rate does indicate that the ongoing update of exposure or comorbid information may be limited. However, we will be able to explore differences in those who are retained in the cohort as members of KP and those who have disenrolled to determine if there may be systematic biases in important factors associated with exposures of interest, such as body size, or outcomes of interest, such as diabetes, cardiovascular disease, and cancer rates.

Nationwide, 6% of young adults are extremely obese; yet too little is known to adequately quantify the health burden that can be attributed to obesity, especially extreme obesity, or to identify which population groups are most susceptible to early health consequences. This cohort of young adults and their electronic medical record data provide an unparalleled opportunity to investigate associations of obesity-related factors and risk of cancer and other diseases in a large multiethnic population. These data sources are unusually rich and support the development of nuanced longitudinal care quality indices for preventive and disease management services, such as the Prevention Indices and Disease Management Indices[22, 23]. Unlike claims-based quality standards, these indices draw on the full range of clinical and administrative data to define both the population and the delivery of the service over time.

In addition to the aforementioned limitations, the cohort currently has a relatively short follow-up that enables us to draw conclusions on short- and medium-term outcomes. To investigate long-term health risks such as cancer risk, analyses have to be designed carefully to account for systematic differences beyond demographic factors, such as differences in body weight and obesity-related conditions. Linkage to other databases can help decrease the attrition in this cohort, including internal linkage between Kaiser Permanente regions (restricted to those who stay with Kaiser Permanente), linkage with state death files (restricted to death cases), and linkage with state cancer registries (restricted to those who remain in the state and limited to cancer diagnosis).

Planned future research on the young adult cohort will develop these quality measures in order to identify person-level characteristics associated with both variations in services related to cardiovascular disease, diabetes, and cancer in young obese adults and variations in the care they receive and its consequences for incident morbidity, mortality, and health care utilization. This planned research will also examine differences in the effectiveness of that care in demographically and medically defined subpopulations.



BMI: Body mass index

CRN: Cancer Research Network

NCI: National Cancer Institute

KPNC and KPSC: Kaiser Permanente Northern and Southern California

KP: Kaiser Permanente

IRB: Institutional Review Board

SEER: NCI Surveillance, Epidemiology, and End Results

NAACCR: North American Association of Central Cancer Registries

NDI: National Death Index


  1. Olshansky SJ, Passaro DJ, Hershow RC, Layden J, Carnes BA, Brody J, Hayflick L, Butler RN, Allison DB, Ludwig DS: A potential decline in life expectancy in the United States in the 21st century. N Engl J Med 2005, 352: 1138-1145. 10.1056/NEJMsr043743

  2. Flegal KM, Carroll MD, Ogden CL, Curtin LR: Prevalence and trends in obesity among US adults, 1999-2008. JAMA 2010, 303: 235-241. 10.1001/jama.2009.2014

  3. Wagner EH, Greene SM, Hart G, Field TS, Fletcher S, Geiger AM, Herrinton LJ, Hornbrook MC, Johnson CC, Mouchawar J, et al.: Building a research consortium of large health systems: the Cancer Research Network. J Natl Cancer Inst Monogr 2005, 35: 3-11.

  4. Smith N, Iyer RL, Langer-Gould AM, Getahun D, Strickland D, Jacobsen SJ, Chen W, Derose SF, Koebnick C: Health plan administrative records versus birth certificate records: quality of race and ethnicity information in children. BMC Health Serv Res 2010, 10: 316. 10.1186/1472-6963-10-316

  5. Bleyer A, Budd T, Montello M: Adolescents and young adults with cancer: the scope of the problem and criticality of clinical trials. Cancer 2006, 107: 1645-1655. 10.1002/cncr.22102

  6. Mulye TP, Park MJ, Nelson CD, Adams SH, Irwin CE Jr, Brindis CD: Trends in adolescent and young adult health in the United States. J Adolesc Health 2009, 45: 8-24. 10.1016/j.jadohealth.2009.03.013

  7. Calle EE, Rodriguez C, Walker-Thurmond K, Thun MJ: Overweight, obesity, and mortality from cancer in a prospectively studied cohort of U.S. adults. N Engl J Med 2003, 348: 1625-1638. 10.1056/NEJMoa021423

  8. Inoue M, Iwasaki M, Otani T, Sasazuki S, Noda M, Tsugane S: Diabetes mellitus and the risk of cancer: results from a large-scale population-based cohort study in Japan. Arch Intern Med 2006, 166: 1871-1877. 10.1001/archinte.166.17.1871

  9. Larsson SC, Mantzoros CS, Wolk A: Diabetes mellitus and risk of breast cancer: a meta-analysis. Int J Cancer 2007, 121: 856-862. 10.1002/ijc.22717

  10. Vigneri P, Frasca F, Sciacca L, Pandini G, Vigneri R: Diabetes and cancer. Endocr Relat Cancer 2009, 16: 1103-1123. 10.1677/ERC-09-0087

  11. Hsu IR, Kim SP, Kabir M, Bergman RN: Metabolic syndrome, hyperinsulinemia, and cancer. Am J Clin Nutr 2007, 86: s867-s871.

  12. Russo A, Autelitano M, Bisanti L: Metabolic syndrome and cancer risk. Eur J Cancer 2008, 44: 293-297. 10.1016/j.ejca.2007.11.005

  13. Xue F, Michels KB: Diabetes, metabolic syndrome, and breast cancer: a review of the current evidence. Am J Clin Nutr 2007, 86: s823-s835.

  14. Barone BB, Yeh HC, Snyder CF, Peairs KS, Stein KB, Derr RL, Wolff AC, Brancati FL: Long-term all-cause mortality in cancer patients with preexisting diabetes mellitus: a systematic review and meta-analysis. JAMA 2008, 300: 2754-2764. 10.1001/jama.2008.824

  15. Renehan AG, Tyson M, Egger M, Heller RF, Zwahlen M: Body-mass index and incidence of cancer: a systematic review and meta-analysis of prospective observational studies. Lancet 2008, 371: 569-578. 10.1016/S0140-6736(08)60269-X

  16. Ogden CL, Carroll MD, Curtin LR, McDowell MA, Tabak CJ, Flegal KM: Prevalence of overweight and obesity in the United States, 1999-2004. JAMA 2006, 295: 1549-1555. 10.1001/jama.295.13.1549

  17. Mainous AG 3rd, Baker R, Koopman RJ, Saxena S, Diaz VA, Everett CJ, Majeed A: Impact of the population at risk of diabetes on projections of diabetes burden in the United States: an epidemic on the way. Diabetologia 2007, 50: 934-940. 10.1007/s00125-006-0528-5

  18. Koopman RJ, Mainous AG 3rd, Diaz VA, Geesey ME: Changes in age at diagnosis of type 2 diabetes mellitus in the United States, 1988 to 2000. Ann Fam Med 2005, 3: 60-63. 10.1370/afm.214

  19. Ervin RB: Prevalence of metabolic syndrome among adults 20 years of age and over, by sex, age, race and ethnicity, and body mass index: United States, 2003-2006. Natl Health Stat Report 2009, 13: 1-7.

  20. Bleyer A, Barr R: Cancer in young adults 20 to 39 years of age: overview. Semin Oncol 2009, 36: 194-206. 10.1053/j.seminoncol.2009.03.003

  21. Bleyer A, Barr R, Hayes-Lattin B, Thomas D, Ellis C, Anderson B: The distinctive biology of cancer in adolescents and young adults. Nat Rev Cancer 2008, 8: 288-298. 10.1038/nrc2349

  22. Vogt TM, Aickin M, Ahmed F, Schmidt M: The Prevention Index: using technology to improve quality assessment. Health Serv Res 2004, 39: 511-530. 10.1111/j.1475-6773.2004.00242.x

  23. Vogt TM, Feldstein AC, Aickin M, Hu WR, Uchida AR: Electronic medical records and prevention quality: the prevention index. Am J Prev Med 2007, 33: 291-296. 10.1016/j.amepre.2007.05.011



The present study is supported by the National Cancer Institute through the HMO Cancer Research Network (U19 CA079689, Dr. Ed Wagner, Group Health Research Institute, PI) as a Pilot Project to Dr. Koebnick and by the Kaiser Permanente Southern California Direct Community Benefit Fund.

Author information

Corresponding author

Correspondence to Corinna Koebnick.

Additional information

Competing interests

Lawrence H Kushi reports a relevant relationship as Adjunct Professor at the UC Davis Medical School, which is not his primary employment.

Authors' contributions

Design and conduct of the study: CK, LHK; Collection, management, analysis and interpretation of data: CK, LHK, NS, MPM, KH, HAC, AEW; Preparation of the manuscript: CK, MPM, NS; Critical revision of the manuscript for important intellectual content: LHK, KH, HAC, AEW. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Koebnick, C., Smith, N., Huang, K. et al. OBAYA (obesity and adverse health outcomes in young adults): feasibility of a population-based multiethnic cohort study using electronic medical records. Popul Health Metrics 10, 15 (2012).


  • Young adults
  • Obesity
  • Diabetes
  • Metabolic syndrome
  • Cancer
  • Epidemiology
  • Cohort study