As a new year dawns in Seattle, we are witnessing an amazing convergence of biology, data science and technology.
I am convinced that concepts such as machine learning, deep neural networks, natural language processing and cloud computing will become increasingly familiar parts of the cancer conversation in 2019.
At Fred Hutchinson Cancer Research Center, I see how a new generation of computationally intensive technologies is cataloging the working molecular components of cells involved in cancer and other life-threatening illnesses. From DNA and RNA sequencing to digital imaging, we are gathering vast amounts of data about health and disease. Information on the genes, proteins and processes involved in cancer and the immune system can be searched for previously hidden patterns: clues that can lead to cures.
In 2001, it cost $95 million to sequence a single human genome. Today, it can be done for about $1,200. It took roughly 12 years to sequence the first human genome; now, it only takes a day or two. These kinds of economies are occurring in every facet of human biology, expanding the use of high-throughput studies and unleashing a deluge of data. The sheer volume of this potentially lifesaving information is mind-boggling.
In the next decade, these data are going to transform cancer prevention, diagnosis and treatment. The American Cancer Society reported this week that the death rate from cancer in the United States has steadily declined over the past 25 years, thanks to a sharp drop in smoking and advances in early detection and treatment. This is a wonderful milestone and a compelling call to redouble our efforts, because we have reached an inflection point where immunotherapies and other curative approaches can make an even greater difference in the years to come.
When we talk of harnessing the immune system to fight cancer, much of what we need to know is discoverable in digital code from these next-generation lab tools. Our success depends on how well we learn to slice, store and interpret these data, which are rolling in by the terabyte.
To put that in perspective, a terabyte is roughly equivalent to a trillion keystrokes on a computer. It is enough data to stream 100 hours of high-definition movies to your television screen.
And today, a single cancer patient can generate a terabyte of data.
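Those equivalences are easy to sanity-check. Here is a minimal back-of-envelope sketch in Python, assuming one byte per keystroke and roughly 10 gigabytes per hour of high-definition video (both assumptions are mine for illustration, not figures from our labs):

```python
# Back-of-envelope check of the terabyte comparisons above.
# Assumptions (illustrative, not Fred Hutch figures):
#   - one keystroke is stored in about 1 byte
#   - high-definition video streams at about 10 GB per hour
TERABYTE = 1e12  # bytes, decimal terabyte

keystrokes = TERABYTE / 1         # ~1 trillion keystrokes
hd_hours = TERABYTE / (10 * 1e9)  # ~100 hours of HD video

print(f"{keystrokes:.0e} keystrokes")        # 1e+12
print(f"{hd_hours:.0f} hours of HD video")   # 100
```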
What a challenge. What an opportunity.
To meet that challenge, we at Fred Hutch are fortunate to live among the world’s leading experts in collecting, moving and analyzing large data sets. For practical reasons, this sort of computational power is migrating to the cloud, where data can be stored securely and economically in massive systems run by Seattle neighbors such as Microsoft and Amazon. There, the data can be pooled with comparable data sets from other research centers and analyzed with the kind of computational horsepower no single institution could afford on its own.
This confluence is happening right here, right now. Microsoft and Amazon are, of course, well-established in the neighborhood. And just a few hundred yards from our campus, Google is nearing completion of a Seattle headquarters that will focus on cloud computing.
How will we leverage this remarkable convergence of geography and data science in the Pacific Northwest? How do we realize the potential for Seattle to be “the place” where cancer biology and data science intersect? That is where computational biologists such as Dr. Raphael Gottardo, scientific director of our new Translational Data Science Integrated Research Center, come into play.
Raphael recently highlighted how things have changed in biomedical research. As a University of Washington graduate student in 2005, he worked on gene-expression studies, charting which genes are turned on or off within samples containing millions of cells. Today, high-speed gene-expression studies are performed on samples in which each individual cell is separated and analyzed, each cell yielding more data than could be gleaned from the entire pooled sample not so long ago. “I still do gene expression,” he said, “but we’ve multiplied the size of the data by 1 million.”
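To see where that factor of a million comes from, consider a minimal sketch (the dimensions below are illustrative, not Raphael's actual data): a bulk experiment pools millions of cells into a single expression vector, while a single-cell experiment keeps one vector for every cell.

```python
# Illustrative dimensions only; not actual Fred Hutch data.
n_genes = 20_000      # roughly the number of human protein-coding genes
n_cells = 1_000_000   # cells in a sample

bulk_values = n_genes                   # pooled: one value per gene
single_cell_values = n_cells * n_genes  # one value per gene, per cell

print(f"{single_cell_values // bulk_values:,}x more data points")
# 1,000,000x -- the multiplier Raphael describes
```

In practice these single-cell matrices are mostly zeros and are stored in sparse formats, which is part of what makes the analysis as much a data-science problem as a biology one.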
As this information is gathered, analyzed and compared, our clinical researchers will gain a clearer picture — as if they had a fantastic new microscope — of how our immune system responds to diseases like cancer. This can yield insights on better ways to keep malignancies at bay. Cancer treatment can then be tailored to match the immune profile of each patient. That is what we mean by personalized medicine.
Data science will also help us to speed the process of evaluating new treatments and open clinical trials to more patients. Sadly, 75 percent of clinical trials nationwide fail to meet their enrollment targets, and more than a quarter fail to enroll any participants at all. What if we could clear the bottlenecks?
At Fred Hutch, we are evaluating a natural language processing service from Amazon Web Services that can quickly parse medical records, unlocking information hidden in unstructured clinical notes. A skilled manual screener can pull the necessary information from a record in about 1.6 hours; Amazon’s new tool can potentially review more than 9,600 records in an hour. When we can pool large numbers of patients, we can quickly find out who could benefit from a trial, a therapy or an intervention. More patients who once had no options could be matched with trials of cutting-edge therapies.
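The paragraph above does not name the service; Amazon Comprehend Medical, which AWS launched in late 2018 for exactly this kind of clinical-text extraction, is the natural fit. As a minimal sketch under that assumption (the clinical note and the AWS region are made up), a screening pipeline might call it like this:

```python
import boto3

# Sketch only: assumes the AWS service is Amazon Comprehend Medical
# (not named in the announcement) and uses a made-up clinical note.
client = boto3.client("comprehendmedical", region_name="us-west-2")

note = ("62-year-old female with stage II ER-positive breast cancer, "
        "on tamoxifen 20 mg daily; no prior chemotherapy.")

response = client.detect_entities(Text=note)

# Each entity carries a category (MEDICATION, MEDICAL_CONDITION, ...),
# the matched text and a confidence score.
for entity in response["Entities"]:
    print(f'{entity["Category"]:<28} {entity["Text"]} '
          f'(score {entity["Score"]:.2f})')
```

Structured output like this is what lets eligibility criteria be checked at machine speed: at 1.6 hours per hand-screened record, reviewing 9,600 records an hour is a speedup of more than 15,000-fold.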
Collaborations are essential to manage this much data effectively. We are working with Pacific Northwest National Laboratory, which is applying machine learning to 10,000 MRI images to identify markers of breast cancer. Our hope is to improve early detection, particularly for women with dense breast tissue.
Through a planned collaboration with Microsoft, we intend to evaluate the use of artificial intelligence, together with a wearable biosensor from a third party, to remotely monitor the symptoms and side effects of patients undergoing chemotherapy. The goal is to reduce preventable visits to the emergency room, which we believe will reduce suffering, improve outcomes and lower costs.
At the end of 2019, we will be looking not only at the end of a year but also at the start of a new decade. With the right information about cancer biology and the right analytical tools in the hands of computational biologists, I begin this year with renewed optimism and a deeper sense of the urgency of our mission.