Predicting colorectal cancer risk in a community-based cohort

From the Hsu Group, Public Health Sciences Division at Fred Hutch, and Kaiser Permanente Washington Health Research Institute

It is difficult to find someone who hasn’t been touched by cancer, whether personally or through family, friends, or acquaintances, which underscores the global impact of this disease. Mortality rates are high for some cancers like colorectal cancer (CRC), igniting efforts to implement screening practices to ‘catch’ the disease at an earlier and often more treatable stage. For CRC, increasing incidence among younger people has led to a change in screening guidelines, with an additional 22 million younger adults (45 to 49 years) now recommended for screening. To better stratify those most at risk and to aid proposed screening strategies, other approaches are being investigated. Polygenic risk scores (PRS), a tool that utilizes genetic variants and markers of disease to estimate risk, are one such approach for identifying high-risk groups. Enhanced PRS combine genetic variants of disease with additional clinical data, providing a more robust model for estimating disease risk. Dr. Yu-Ru Su (Assistant Biostatistics Investigator at Kaiser Permanente Washington Health Research Institute), Dr. Ulrike Peters (a Professor in Fred Hutch’s Public Health Sciences Division), Dr. Li Hsu (a Professor in Fred Hutch’s Public Health Sciences Division) and colleagues recently undertook a study to validate an enhanced PRS for CRC in an external cohort. Discussing their study, recently published in Cancer Epidemiology, Biomarkers and Prevention, the authors explained that “due to the lack of external cohorts with sufficient colorectal cancer cases and available genetic information, real-world evidence on the validity of PRS-enhanced CRC risk models is needed before including PRS into CRC risk stratification at a population scale.
We developed a PRS-enhanced risk prediction model for colorectal cancer that includes demographics and clinical information, and an updated PRS comprising 140 known colorectal cancer genetic loci, and externally validated this model in a large sociodemographically diverse, community-based cohort, which is rare in the era of large GWAS consortia. This study provides comprehensive assessment on both discriminatory ability and absolute risk calibration, which is clinically important but has not been emphasized in the literature. Our findings provide real-world evaluation on the utility of PRS in risk prediction for colorectal cancer, which will inform the development of risk-stratified screening strategies in clinical practice and may aid health policymaking.”
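To make the idea of a polygenic risk score concrete, here is a minimal sketch; it is not the authors' implementation. It assumes the commonly used weighted-sum form, in which each variant's risk-allele count (0, 1, or 2) is multiplied by a per-variant weight, such as a log odds ratio, and the products are added up. The variant names, weights, and genotypes are invented for illustration; the published model draws on 140 loci and also includes demographic and clinical information.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Return the weighted sum of risk-allele dosages.

    dosages: risk-allele count per variant for one person (0, 1, or 2).
    weights: per-variant effect size, e.g. a log odds ratio (assumed form).
    """
    missing = [variant for variant in weights if variant not in dosages]
    if missing:
        raise KeyError(f"missing genotypes for: {missing}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


# Hypothetical weights and genotypes, for illustration only.
example_weights = {"variant_a": 0.10, "variant_b": 0.05, "variant_c": -0.02}
example_person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

example_score = polygenic_risk_score(example_person, example_weights)
print(f"PRS = {example_score:.2f}")
```

In an enhanced model of the kind described here, a score like this would be one predictor alongside age, family history, and other clinical variables, rather than a risk estimate on its own.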

The authors first developed their enhanced PRS in a subset of the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), comprising over 20,000 individuals. While many studies validate their model internally, the authors explored the limitations of this approach, and instead opted for validation in an external cohort. “Evaluating a risk prediction model in the same cohort used for model development or a highly similar cohort results in overly optimistic assessments on model’s predictive performance due to overfitting, such as overestimating the ability to distinguish between individuals who have colorectal cancer and who do not. Using an external cohort which may represent a different pattern in demographics, clinical, and genetic characteristics can inform about the real predictive performance of the developed risk model; hence help us understand whether the proposed risk model can robustly provide a good prediction to colorectal cancer risk,” they explained. They utilized an external community cohort for validation (>85,000 individuals), “the GERA cohort [that] reflects regional demographics characteristics and the pattern of social determinant of health embedded in Kaiser Permanente Northern California. Compared to studies included in GECCO/CORECT in which the model was developed, GERA has a slightly higher proportion of female participants, lower proportion of participants with a first-degree colorectal cancer family history, and a much higher rate of receiving a prior endoscopy. The results from our external validation analysis provide evidence on how our PRS-enhanced model can robustly provide fair prediction in a large external cohort reflecting a clinically practical setting,” noted the authors. 

An enhanced polygenic risk score can accurately predict colorectal cancer risk in people of European ancestry.
Comparison of discriminatory accuracy between two colorectal cancer risk models. The authors presented the area under the ROC curve for the PRS-enhanced risk model (dark blue) and for a model with age and first-degree family history of colorectal cancer (light blue) in two age groups: the younger age group (40-49 years) and individuals aged 45-74 years who are eligible for colorectal cancer screening. Image provided by Dr. Su.

Their enhanced PRS model performed very well in the validation cohort, accurately estimating the number of CRC cases in a 5-year prediction window. Ten-year predictions performed slightly less favorably, with some over- and underestimation in certain study subgroups. The authors offer explanations for this; for example, the overestimation of cases in the 50–59-year age group is likely due in part to a spike in CRC incidence in this age range. It is also important to note that both analyses were performed primarily on samples of European ancestry, as the cohort included few CRC cases of non-European ancestry. Next, the authors undertook a discriminatory analysis, in which they compared their enhanced PRS model to two other models that did not contain PRS data. By including PRS in their enhanced model, the authors observed improved accuracy in distinguishing CRC cases from controls; notably, this improved accuracy was also observed in a younger age group (40-49 years) who did not have a history of endoscopy. Further, their model had a higher sensitivity than both comparison models that did not include PRS. “Our finding motivates further investigation to demonstrate the clinical value of PRS in preventing colorectal cancer. In particular, we showed a significant improvement in the discriminatory accuracy by the inclusion of the PRS among younger participants (whom of age 40-50 years). Despite the highly significant and substantial improvement in AUC [area under the curve], the small sample size and limited number of CRC cases for this age subgroup in our study yield greater uncertainty, which makes it difficult to generalize to the broader population. Our group is pursuing a study to emphasize on this age group to provide more direct empirical evidence to support the use of PRS in personalized initiation time of colorectal cancer screening,” the authors said.
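The two kinds of assessment described above, discrimination and calibration, can be illustrated with a short sketch. It is a simplified illustration under stated assumptions and not the study's analysis: it treats outcomes as simple yes/no values and ignores follow-up time and censoring, which a real 5- or 10-year evaluation would need to handle. The function names and the toy data are invented. The AUC is computed as the fraction of case-control pairs in which the case received the higher predicted risk (ties count as half), and calibration is summarized as the ratio of expected to observed cases.

```python
from typing import Sequence


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Fraction of case-control pairs ranked correctly (ties count 0.5)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))


def expected_observed_ratio(risks: Sequence[float],
                            outcomes: Sequence[int]) -> float:
    """Sum of predicted risks divided by the number of observed cases."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed cases")
    return sum(risks) / observed


# Invented predicted risks and outcomes (1 = developed CRC, 0 = did not).
toy_risks = [0.01, 0.02, 0.03, 0.04, 0.40, 0.50]
toy_outcomes = [0, 0, 1, 0, 0, 1]

print(f"AUC = {auc(toy_risks, toy_outcomes):.2f}")
print(f"E/O = {expected_observed_ratio(toy_risks, toy_outcomes):.2f}")
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, and 1.0 to perfect separation. An expected-to-observed ratio above 1 indicates that the model overestimates the number of cases, and a ratio below 1 indicates underestimation, which is the sense in which the 10-year predictions over- or underestimated cases in some subgroups.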

The authors’ next steps are focused on “leading the development of the GECCO consortium, which is the largest colorectal cancer genetic epidemiological study in the world, including over 70,000 CRC cases and 100,000 controls.” The authors go on to describe the importance of ensuring that their model and future work are representative of the general population and not just those of European ancestry. “The limited sample sizes and numbers of CRC cases in non-European racial and ethnic groups in the validation cohort hindered a comprehensive assessment of the proposed PRS-enhanced model. A future direction is developing and investigating the predictive performance in a more racial and ethnic diverse cohort with larger number of participants of non-European ancestry,” they concluded.


This work was primarily supported by the National Institutes of Health.

Fred Hutch/University of Washington/Seattle Children's Cancer Consortium members Yingye Zheng, Ulrike Peters, and Li Hsu contributed to this work.

Su YR, Sakoda LC, Jeon J, Thomas M, Lin Y, Schneider JL, Udaltsova N, Lee JK, Lansdorp-Vogelaar I, Peterse EFP, Zauber AG, Zheng J, Zheng Y, Hauser E, Baron JA, Barry EL, Bishop DT, Brenner H, Buchanan DD, Burnett-Hartman A, Campbell PT, Casey G, Castellví-Bel S, Chan AT, Chang-Claude J, Figueiredo JC, Gallinger SJ, Giles GG, Gruber SB, Gsur A, Gunter MJ, Hampe J, Hampel H, Harrison TA, Hoffmeister M, Hua X, Huyghe JR, Jenkins MA, Keku TO, Marchand LL, Li L, Lindblom A, Moreno V, Newcomb PA, Pharoah PDP, Platz EA, Potter JD, Qu C, Rennert G, Schoen RE, Slattery ML, Song M, van Duijnhoven FJB, Van Guelpen B, Vodicka P, Wolk A, Woods MO, Wu AH, Hayes RB, Peters U, Corley DA, Hsu L. Validation of a Genetic-Enhanced Risk Prediction Model for Colorectal Cancer in a Large Community-Based Cohort. Cancer Epidemiol Biomarkers Prev. 2023 Mar 6;32(3):353-362. doi: 10.1158/1055-9965.EPI-22-0817. PMID: 36622766; PMCID: PMC9992158.