Today, we’re going to talk about genomes. Strictly speaking, genomes have two crucial functions: they must exist (to store genetic information and enable that information to be used by the cell), and they must be replicated (to transmit that information to subsequent generations of life). This seems easy enough—but when you consider that printing out the human genome using 12-point Times New Roman would give you a stack of paper higher than the Statue of Liberty, it becomes clear that making even one copy of this information (which, by the way, is happening somewhere in your body as you read this, and will take a brisk 8 hours from start to finish) is an absolutely remarkable feat. And if you think that understanding the mechanisms behind genome replication has only basic science value, think again; the majority of mutations responsible for disease, aging, and other human maladies occur during genome replication, so figuring out how genome replication works (and fails, in some cases) promises a better understanding of disease.
The quest for a better understanding of genome replication brought me to a cozy office on the second floor of Thomas Building on a warm weekday afternoon. Across the table from me sat Dr. Antonio Bedalov, whose lab in the Translational Science and Therapeutics Division at Fred Hutch focuses on chromosome biology, and Dr. Eric Foss, a long-time lab member and first author of a study from the lab recently published in eLife. “In this study, we focused on the problem of genome replication in yeast—specifically, we were looking for sites where replication begins, known as replication origins,” explained Dr. Foss.
As you may imagine from the analogy above, replicating an entire genome in the time span of a single cell cycle poses significant challenges, not least of which is ‘where to start?’ Bacteria, with their relatively small genomes, have it easy: they have a single, invariant replication origin (called oriC) at which genome replication begins every cell cycle. OriC is defined by a specific DNA sequence which is recognized by dedicated replication machinery. In stark contrast, the human genome contains tens of thousands of replication origins, but they appear to be used stochastically and in a context-dependent manner. The yeast Saccharomyces cerevisiae, which has long been a valuable model system for eukaryotic genetics research, has traditionally occupied an uncomfortable middle ground between these two extremes. As Foss noted, “Before this study, yeasts were generally thought to have around 500 replication origins, which is a conspicuously small number for a eukaryotic genome.”
When I asked Foss and Bedalov whether the known replication origins in yeast shared any sequence-level features, the pair exchanged a knowing look that foreshadowed a complicated answer. “Yes and no,” began Dr. Bedalov. “There is a sequence motif known as the ARS consensus sequence, or ACS, but this sequence is quite loosely defined, and you can find many examples of known yeast replication origins without an ACS, so it’s not an absolute requirement.”
“This fact, as well as a few clues from the literature, led us to speculate that there may be more replication origins in the yeast genome than previously thought,” added Foss, “and to find these, we sought a more functional definition of replication origins.” In their quest to catalog new origins, the team took advantage of one crucial piece of knowledge: not all replication origins have an ACS, but all replication origins do need to bind a protein called the Mcm replicative helicase (Mcm), which physically separates the two DNA strands and allows each to be accessed by DNA-replicating machinery. With this knowledge, the team had a strategy: to find new replication origins, follow Mcm to the scene of the crime. To locate Mcm binding sites in the genome, they used a technology called chromatin endogenous cleavage (ChEC), developed in part by the lab of Dr. Steve Henikoff at the Hutch, in which Mcm is fused to a DNA-cutting enzyme called micrococcal nuclease (MNase). Wherever Mcm binds in the genome, MNase will cut the DNA to either side, generating a library of short genomic fragments corresponding to the ‘footprints’ of Mcm.
Although Foss and colleagues had previously applied this method to determine Mcm binding to the 500 known replication origins in yeast, they now turned this powerful tool loose on the yeast genome at large. What they found was shocking: not 500 Mcm binding sites, as had previously been assumed, but over 5000! At this point, however, I was a bit skeptical. “Okay, so these sites bind Mcm, but how do you know that they’re actually functional replication origins, and not noise?” I asked. Foss was quick to respond. “This is a very valid consideration, which is why we asked whether these Mcm binding sites shared other features characteristic of replication origins.” Indeed, not only do these sites bind Mcm, but they also look an awful lot like bona fide replication origins: they are located in intergenic regions free of DNA-packaging structures called nucleosomes, and they are often located near ACS sequences. Of the ~5000 new binding sites the team identified, 1600 appeared in an independent dataset quantifying genome-wide single-stranded DNA, which is considered in the field to be a high-confidence predictor of origins.
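To make the cross-check against an independent dataset concrete, here is a minimal sketch of how one might count the Mcm binding sites that fall close to a single-stranded DNA peak. This is not the authors' pipeline: the function name `count_supported_sites`, the 100-base distance cutoff, and the toy coordinates are all assumptions for illustration, and the sketch treats every position as lying on a single chromosome.

```python
from bisect import bisect_left


def count_supported_sites(mcm_sites, ssdna_peaks, max_dist=100):
    """Count Mcm sites lying within max_dist bases of any ssDNA peak.

    Hypothetical helper: positions are integer coordinates on one
    chromosome, and max_dist is an arbitrary illustrative cutoff.
    """
    peaks = sorted(ssdna_peaks)
    supported = 0
    for pos in mcm_sites:
        # Index of the first peak at or to the right of this site.
        i = bisect_left(peaks, pos)
        # Only the nearest peak on each side can be the closest one.
        neighbors = peaks[max(0, i - 1): i + 1]
        if any(abs(pos - p) <= max_dist for p in neighbors):
            supported += 1
    return supported


# Toy coordinates, invented for illustration only.
toy_mcm_sites = [1000, 5000, 9000]
toy_ssdna_peaks = [1050, 7000, 9200]
n_supported = count_supported_sites(toy_mcm_sites, toy_ssdna_peaks)
```

With real data, the choice of distance cutoff matters a great deal: a looser cutoff counts more sites as supported, so any such number is only meaningful alongside the threshold used.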
“One of the most interesting findings to me was the GC skew data,” noted Foss. Here, he’s referring to an evolutionary signature that characterizes replication origins. As it turns out, sites in the genome which have historically been used as replication origins exhibit a strand-specific ‘skew’ in their proportion of guanine (G) to cytosine (C) nucleotide bases. This is related to the fact that cytosine bases on one of the replicating DNA strands are more exposed and liable to chemical deamination, which converts them to uracil and ultimately to thymine (T), depleting C relative to G on that strand. Surprisingly, all ~5000 of the newly discovered Mcm binding sites show this GC skew signature, strongly suggesting that they are indeed functional replication origins.
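For readers who want to see what a GC skew measurement looks like in practice, here is a minimal sketch using the commonly used definition, (G − C) / (G + C), computed on one strand. The function names, the flank size, and the toy sequence are assumptions for illustration, and this is not the analysis performed in the study. The idea is simply to compare the skew on either side of a candidate origin and look for a change in sign.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one strand; 0.0 if there is no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def skew_around_site(genome: str, site: int, flank: int = 500):
    """Return the GC skew of the flank to the left and to the right of a site.

    Hypothetical helper: `site` is a 0-based coordinate and `flank` is an
    arbitrary window size chosen for illustration.
    """
    left = genome[max(0, site - flank):site]
    right = genome[site:site + flank]
    return gc_skew(left), gc_skew(right)


# Toy sequence, invented for illustration: C-rich to the left of
# position 4 and G-rich to the right of it.
toy_genome = "CCCG" + "GGGC"
left_skew, right_skew = skew_around_site(toy_genome, site=4, flank=4)
```

In this toy case the skew flips from negative on the left to positive on the right. In real genomes the signal is far weaker at any individual site, so skew is typically averaged over many sites before a pattern becomes visible.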
All in all, this study identifies ~1600 new high-confidence replication origins and thousands more probable origins in yeast, increasing the number of known replication origins in yeast by an order of magnitude. How could all of these origins have slipped past the careful scrutiny of geneticists for so long? “We think it all comes down to the methods used to identify these sites,” explained Bedalov. “Traditionally, replication origins were defined by their ability to replicate autonomously if they were taken out of their genomic context and put into an artificial plasmid. While this method works, it can identify only the ‘strongest’ origins.” As is so often the case in science, you get what you look for. Foss added, “We also couldn’t have done this work without the incredible sensitivity and precision of ChEC to find Mcm binding sites.” Now firmly invested in the quest to find new replication origins, I lobbed what I thought was a hardball question at the duo: “You also show that a majority of these new replication origins appear to be used quite infrequently, sometimes in only a few percent of cells. Would this suggest that they’re not that important?”
As always, Foss and Bedalov were one step ahead of me. “Maybe so,” Foss began, “but we think the presence of that GC skew evolutionary footprint in these new origins suggests that while they may not be extensively used today, they might’ve been more active in the evolutionary past. Overall, we think the value of these results isn’t in identifying any single origin, but in the concept that when you reframe the definition of a replication origin, you discover that yeast genome replication isn’t as different from ours as previously thought!”
The spotlighted research was funded by the National Institutes of Health.
Fred Hutch/University of Washington/Seattle Children’s Cancer Consortium member Dr. Antonio Bedalov contributed to this study.
Foss, E. J., Lichauco, C., Gatbonton-Schwager, T., Gonske, S. J., Lofts, B., Lao, U., & Bedalov, A. (2024). Identification of 1600 replication origins in S. cerevisiae. eLife, 12, RP88087. https://doi.org/10.7554/eLife.88087.4