Enabling Rapid Exploration of Multiple COVID-19 Datasets
As of June 2021, the ongoing COVID-19 pandemic has resulted in over 170 million confirmed cases and 3.7 million deaths globally.
Understanding the molecular mechanisms of COVID-19 pathogenesis, as well as the immune-cell subsets and molecular factors associated with protective or pathological immunity against SARS-CoV-2, the virus that causes COVID-19, could greatly aid the development of vaccines and therapeutics.
Single-cell technologies such as flow cytometry, mass cytometry, single-cell transcriptomics and single-cell multi-omic profiling offer unprecedented potential to dissect immune-response heterogeneity among individual cells. These technologies are being used to analyze COVID-19 at an astounding pace.
The Challenge
While there are many valuable COVID-19 datasets in the public domain, they must be acquired and standardized before researchers can use them to answer basic questions about the disease. The high-dimensional nature of these datasets also makes it difficult to translate the raw data into a visually comprehensible display that facilitates scientific discovery.
The Approach
To enable the rapid exploration of multiple COVID-19 datasets, the Hutch Data Core partnered with Hutch computational biologists Drs. Yuan Tian and Raphael Gottardo to create the Fred Hutchinson COVID-19 Cell Atlas, an online resource for data visualization, exploration and discovery.
The individual COVID-19 Cell Atlases, one for each dataset, merge into a unified Atlas. They are built on the PubWeb platform, a collection of interoperable, open-source technologies that enable the analysis and dissemination of research data.
Key components of this platform include:
- A data portal that enables the execution of reproducible analysis pipelines on cloud infrastructure.
- A reference data application programming interface (API) that provides access to 2 TB of data sourced from over 25 public repositories.
- A high-performance web visualization framework based on WebGL.
The Outcome
The COVID-19 Cell Atlas makes it possible for researchers to compare the frequencies of immune cell types across datasets, identify differentially expressed genes and proteins, or interrogate a particular gene or protein of interest. By integrating these data, this resource also provides investigators with previously unavailable information, such as predicted protein abundance.
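As a rough illustration of the first of these comparisons, the sketch below computes the fraction of cells of each type within each dataset so that the fractions can be placed side by side. It is not the Atlas's implementation: the function name `cell_type_frequencies`, the dataset labels and the cell-type annotations are all hypothetical, and real datasets would first need their cell-type labels harmonized.

```python
from collections import Counter, defaultdict


def cell_type_frequencies(cells):
    """Return {dataset: {cell_type: fraction of that dataset's cells}}.

    `cells` is an iterable of (dataset, cell_type) pairs, one per cell.
    """
    # Count cells of each type separately for every dataset.
    counts = defaultdict(Counter)
    for dataset, cell_type in cells:
        counts[dataset][cell_type] += 1

    # Convert raw counts to within-dataset fractions so that datasets
    # of different sizes can be compared directly.
    return {
        dataset: {
            cell_type: n / sum(type_counts.values())
            for cell_type, n in type_counts.items()
        }
        for dataset, type_counts in counts.items()
    }


# Hypothetical toy annotations, for illustration only (not real data).
example_cells = [
    ("dataset_A", "CD4 T"), ("dataset_A", "CD4 T"),
    ("dataset_A", "B"), ("dataset_A", "Monocyte"),
    ("dataset_B", "CD4 T"), ("dataset_B", "Monocyte"),
    ("dataset_B", "Monocyte"), ("dataset_B", "Monocyte"),
]

frequencies = cell_type_frequencies(example_cells)
for dataset, freqs in sorted(frequencies.items()):
    for cell_type, fraction in sorted(freqs.items()):
        print(f"{dataset}\t{cell_type}\t{fraction:.2f}")
```

Fractions are used rather than raw counts because datasets differ in how many cells they profile. A cell type missing from a dataset simply does not appear in that dataset's dictionary.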
The COVID-19 Cell Atlases were released in May 2021. This resource is the latest in a growing collection of visualization sites that currently includes over 5 TB of data on topics ranging from development and cancer to cutting-edge machine learning techniques. These resources have been featured in back-to-back publications in Science, Cell and Nature and have enabled over 20,000 researchers to examine complex, high-dimensional datasets.
“Working with the Hutch Data Core has been a really rewarding experience because of how they are able to translate these datasets into highly responsive web graphics that can facilitate biological exploration. We are excited to continue our work together as we continue to push the boundaries of the scale and scope of data, which we can leverage in our work to prevent and treat human disease.”
— Dr. Raphael Gottardo, Scientific Director, Translational Data Science Integrated Research Center, and holder of the J. Orin Edson Foundation Endowed Chair, Fred Hutchinson Cancer Center