That host of transcripts characteristic of a cell type or cell state is often called a “signature” by scientists and reflects the fact that biological processes arise from collaboration among many different proteins. Identifying which transcripts are present or which have become more or less abundant can help scientists better understand a biological state. A transcriptomic view can also help when researchers know that a specific biological process is changing, but don’t know the key genetic players.
“For example, say we think that there's something going on with the metabolism of this tumor, but we don't know specifically which gene is involved,” Bradley said.
In this case, transcriptomics would help guide scientists toward key genes by highlighting any transcripts of metabolism-related genes that are more or less common in tumors compared to normal tissue.
While transcriptomics can give information that’s related to epigenomics (i.e., which genes are on or off), it also offers a more nuanced understanding. Sometimes a disease state is created by switching genes on or off, but often genes are “tuned” instead: they get transcribed more or less, rather than switched on or off. Transcriptomics sheds light on this kind of genetic tuning.
RNA is also where our cells can add a little complexity. Before the human genome was mapped, researchers had expected humans would have more genes than we do, given our complexity. But transcripts often undergo processing, also called splicing, to produce different forms of a protein that can work differently. Sometimes, mis-splicing or a lack of RNA splicing contributes to the development or progression of a disease. Bradley studies this phenomenon in cancer and related diseases. In a recent project, Bradley and his collaborators used transcriptomics to link dysfunctional RNA splicing in a gene called BRD9 to cancer.
Transcriptomics first started building momentum when microchip arrays, which allowed scientists to look at multiple transcripts at once, were developed in the late 1990s. Today, the technique that researchers primarily use to study the transcriptome, called RNA sequencing or RNA-seq, relies on the same next-gen sequencing technology that scientists use to sequence DNA. The most recent advance in transcriptomics, as in genomics, is the ability to sequence RNAs from individual cells.
There are pros and cons to sequencing RNAs either in aggregate or cell by cell, Bradley said. Bulk RNA experiments don’t shed light on variation among cells within a tissue, but they do provide high-quality data on virtually every kind of molecule of RNA present in a sample.
Single-cell transcriptomics, one kind of single-cell genomics, gives a clearer picture of the variation within a tissue sample, and allows researchers to make powerful statements about the cells within a sample, Bradley noted.
But this strategy provides less information about the transcripts themselves. RNA-seq technologies don’t sequence the entirety of each RNA transcript; instead, they read just enough to detect a transcript’s presence. Single-cell RNA-seq relies on even shorter reads than bulk RNA-seq, so researchers get less information for fewer of the genes turned on in each cell. This means that certain nuances — such as the often-rare splicing errors Bradley studies — are lost.
Researchers are working to overcome this by improving methods that enable them to sequence complete transcripts.
Another challenge for researchers incorporating transcriptomics into their work is the sheer amount of data that this omic, like other omics, produces. A tissue may have tens or even hundreds of thousands of different RNA transcripts, and each of them may have hundreds to thousands of copies. This is multiplied by the number of samples; in single-cell experiments, this can reach the millions.
Computational biologists are trying to go from the initial “unspeakably large amount of data to something that humans can understand and make into an actionable fact, something that we can write in one sentence,” Bradley said.
Say an RNA-seq experiment spits out 20 gigabytes of data. Sophisticated algorithms turn that data into a more manageable matrix that lists how many copies of each gene were transcribed in each sample. But this matrix may still have 25,000 rows (one for each gene turned on) and as many columns as there are samples in the experiment.
“That matrix is of a size that you can actually open up in a spreadsheet editor, but it's still too big for humans to really do something very meaningful with,” Bradley said. “The big bottleneck right now is the next step, going from that spreadsheet to a prediction that five or ten genes are really important for mediating some phenotype [trait] of interest.”
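To make that step a little more concrete, here is a minimal Python sketch of one simple way to shrink a count matrix down to a short list of genes: rank them by how much their average count differs between tumor and normal samples. It is an illustration, not the method Bradley's lab uses. The gene names, sample names, counts, and the helper functions `log2_fold_change` and `top_changed_genes` are all invented for this example, and real analyses also normalize for sequencing depth and apply statistical tests.

```python
import math

# Hypothetical toy count matrix: one row per gene, one column per sample.
# All names and numbers are invented for illustration.
counts = {
    "GENE_A": {"tumor_1": 900, "tumor_2": 1100, "normal_1": 100, "normal_2": 120},
    "GENE_B": {"tumor_1": 50, "tumor_2": 60, "normal_1": 400, "normal_2": 440},
    "GENE_C": {"tumor_1": 300, "tumor_2": 310, "normal_1": 305, "normal_2": 295},
    "GENE_D": {"tumor_1": 10, "tumor_2": 12, "normal_1": 11, "normal_2": 9},
}
tumor_samples = ["tumor_1", "tumor_2"]
normal_samples = ["normal_1", "normal_2"]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def log2_fold_change(gene_counts, group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean counts in group_a versus group_b.

    The pseudocount (an assumption of this sketch) avoids dividing by
    zero for genes with no reads in one group.
    """
    mean_a = mean([gene_counts[s] for s in group_a])
    mean_b = mean([gene_counts[s] for s in group_b])
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def top_changed_genes(counts, group_a, group_b, n=2):
    """Return the n genes with the largest absolute log2 fold change."""
    scores = {
        gene: log2_fold_change(gene_counts, group_a, group_b)
        for gene, gene_counts in counts.items()
    }
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)[:n]


# GENE_A is far more abundant in tumors and GENE_B far less abundant,
# so these two rank ahead of the roughly unchanged GENE_C and GENE_D.
print(top_changed_genes(counts, tumor_samples, normal_samples, n=2))
```

Ranking by absolute fold change catches genes that are “tuned” up as well as those tuned down, which is why the sketch does not look only for increases.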
One principle that Bradley applies when trying to spot meaningful patterns in dizzying datasets is to check whether the same pattern pops up consistently in different types of data and from different sources.
“If we find the same phenomenon, we're going to believe it a lot more,” he said. “So we devote a lot of our effort to finding ways to integrate across different data sources.”
He applied this principle to studying mis-splicing of BRD9 transcripts, comparing hundreds of patient samples from many different types of tumors and searching for changes that arose consistently enough to suggest they were worth investigating further.
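The idea of trusting a pattern more when it recurs across sources can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the analysis used in the BRD9 study. The dataset names, the event names, and the helper `recurrent_candidates` are hypothetical.

```python
from collections import Counter

# Hypothetical candidate findings from three independent datasets.
# Dataset and event names are invented for illustration.
candidates_by_dataset = {
    "cohort_1": {"EVENT_X", "EVENT_Y", "EVENT_Z"},
    "cohort_2": {"EVENT_X", "EVENT_Z"},
    "cohort_3": {"EVENT_X", "EVENT_W"},
}


def recurrent_candidates(candidates_by_dataset, min_datasets):
    """Return candidates seen in at least min_datasets datasets, sorted by name."""
    seen = Counter()
    for candidates in candidates_by_dataset.values():
        # Wrap in set() so each dataset contributes at most one vote per candidate.
        seen.update(set(candidates))
    return sorted(name for name, n in seen.items() if n >= min_datasets)


# Only EVENT_X appears in all three cohorts.
print(recurrent_candidates(candidates_by_dataset, min_datasets=3))
```

Lowering `min_datasets` to 2 would also keep EVENT_Z. How strict to be is a judgment call that trades missed findings against false leads.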
Proteomics and Proteogenomics: Doing the work
While transcriptomics sheds light on which genes are turned on and how strongly, much of the cellular action occurs once the information in those transcripts is used to create proteins. Once made, proteins can also undergo modifications that alter their activity. The amount of a protein can change independently from that of its transcripts.
“Just sequencing the DNA or measuring the RNA of a cancer biopsy, for example, doesn't accurately reflect what's happening at the protein level. This is important because the proteins are carrying out the work of the cells and causing the cells’ behavior,” said Dr. Amanda Paulovich, a Hutch cancer geneticist and oncologist who develops proteomics technologies and holds the Aven Foundation Endowed Chair.
Because of this, many treatments, including cancer drugs, target proteins. But this has created a disconnect between the goal of personalized oncology — to tailor treatments to individual tumors — and how well it works in practice.
Though the proteome includes all the proteins in a tissue sample, personalized oncology studies have largely ignored it in favor of focusing on the genome. Many personalized oncology approaches aim to tailor patients’ therapies based on changes in their tumor’s DNA, largely because it’s cheaper and easier to precisely sequence DNA than it is to detect and quantify specific proteins. But that technology gap is beginning to narrow, said Paulovich.