Intra-tumor heterogeneity inference: old methods get SMASHed

From the Sun Group, Division of Public Health Sciences

Cellular DNA aberrations, including somatic point mutations (SPM) and somatic copy number alterations (SCNA, changes in gene copy number), can lead to uncontrolled cellular proliferation and tumor growth. Intra-tumor heterogeneity (ITH) describes how the tumor cells in a single bulk tumor sample may form several subclones, with cells sharing similar DNA aberrations within a subclone but differing across subclones. Understanding the relationships between these tumor characteristics (SPM, SCNA, ITH) and clinical outcomes may lead to better-informed treatment decisions and improved patient survival.

Tumor mutation burden (TMB, burden of somatic point mutations) can be readily estimated by counting the number of point mutations in a tumor sample. A bioinformatics-based method, allele-specific copy number analysis of tumors, can be used to reliably estimate SCNA burden. In contrast, estimating ITH is considerably more complex. While several methods have been developed to infer ITH, many have inherent limitations that restrict their use in large association-based clinical studies. Dr. Wei Sun, in the Division of Public Health Sciences, and collaborators at the University of North Carolina at Chapel Hill set out to develop a new method that would address these limitations. In a paper recently published in the journal Genome Medicine, the authors describe their new method and its application to determine the association between ITH and survival time in fourteen different types of cancer.

The authors first focused their efforts on developing the new ITH inference method. To ensure the method could be used reliably in large-scale association studies, they aimed to develop it such that it would not require more than one tumor sample per patient, it would account for SCNA burden, and it could include a measure of uncertainty. The authors named the new statistical method subclone multiplicity allocation and somatic heterogeneity (‘SMASH’). Dr. Sun elaborated, “This work is one of our efforts to develop computational methods to analyze -omic data as a compositional data, which represent the aggregation of signals from many individual cells. In particular, our method, named SMASH, is designed for association analysis using somatic mutation data while appropriately modeling intra-tumor heterogeneity (ITH), i.e., the heterogeneity of somatic mutations across tumor cells.”

Graphical representation of association between ITH and overall survival or progression-free survival
Association between ITH and overall survival (upper panel) or progression-free survival (lower panel) time by comparing the final model to the reduced model obtained by excluding all ITH-related variables. The horizontal line indicates the p value cutoff 0.05. H(W), H(P), H(S) denote the indicator for three or more subclones from PhyloWGS, PyClone, and SMASH, respectively. E(W) and E(S) denote entropy from PhyloWGS and SMASH, respectively. Cancer types across the x-axis are bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), glioblastoma multiforme (GBM), head/neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), lower-grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), prostate adenocarcinoma (PRAD), skin cutaneous melanoma (SKCM), and stomach adenocarcinoma (STAD). Image from the publication, available through the Creative Commons license.
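
The two ITH summaries named in the caption, an indicator for three or more subclones and an entropy, can be illustrated with a short sketch. It assumes the entropy is the Shannon entropy (natural log) of the estimated subclone proportions; the function names and the example proportions are hypothetical and are not taken from the publication or from the SMASH software.

```python
import math

def subclone_entropy(proportions):
    """Shannon entropy (natural log) of subclone proportions.

    Assumption: entropy is computed on proportions normalized to sum to 1.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    probs = [p / total for p in proportions if p > 0]
    return -sum(p * math.log(p) for p in probs)

def has_three_or_more_subclones(proportions):
    """Indicator (True/False) that at least three subclones have nonzero weight."""
    return sum(1 for p in proportions if p > 0) >= 3

# Hypothetical tumor with three subclones at 50%, 30%, and 20% of tumor cells.
example = [0.5, 0.3, 0.2]
print(has_three_or_more_subclones(example))
print(round(subclone_entropy(example), 3))
```

Under this definition a tumor with a single clone has entropy zero, and the entropy grows as cells are spread more evenly across more subclones.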

The authors then compared SMASH against other ITH inference methods on several features of tumor subclones. SMASH performed better overall at inferring the number of subclones and required the least computational time to conduct the ITH inferences. By tumor type, SMASH identified more subclones than one of the other methods and performed similarly to another. The next step was to determine the association between ITH and overall and progression-free survival time (see Figure). ITH was significantly associated with overall survival for six of the tested cancers (those that extend beyond the dashed line in the figure), while TMB was significantly associated with seven types and SCNA burden with only two. The results for progression-free survival were generally similar to those for overall survival (see Figure).
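
The figure caption describes each association as a comparison between a final model and a reduced model that excludes all ITH-related variables. One common way to make such a comparison is a likelihood ratio test, sketched below. This is an assumption for illustration only: the article does not state which test the authors used, and the log-likelihood values and the helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Base cases of the regularized upper incomplete gamma function Q(a, y):
    # Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return (statistic, p_value) for nested models differing by df parameters."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: the full model adds two ITH-related variables.
stat, p_value = likelihood_ratio_test(-100.0, -104.0, df=2)
print(stat, p_value < 0.05)
```

In this framing, a cancer type would count as significant when the p value from the comparison falls below the 0.05 cutoff marked in the figure.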

Dr. Sun emphasized the potential clinical impact of their association-based findings: “One of the most interesting findings of our work is that the interaction of tumor mutation burden (the total number of somatic mutations) and ITH are associated with survival time in Colon adenocarcinoma and Lung squamous cell carcinoma. This is very relevant with checkpoint immunotherapy because mutation burden is often an indicator for patient response and our results suggest that a more complete picture should include both mutation burden and ITH.” The authors are developing research plans for their next steps. “We will continue working on this direction to develop a better predictor of patient response to checkpoint inhibitor by combining mutation burden and ITH information,” said Dr. Sun.


This work was supported by the National Institutes of Health.

Fred Hutch/UW Cancer Consortium member Dr. Wei Sun contributed to this research.

Little P, Lin D-Y, Sun W. 2019. Associating somatic mutations to clinical outcomes: a pan-cancer study of survival time. Genome Medicine. doi:10.1186/s13073-019-0643-9.