Where a gene is expressed in a tissue can be essential for its development or function. For instance, during embryonic development, spatial patterning of specific genes in the developing limb bud ensures you end up with exactly five fingers, whereas in a tumor, spatial variation in transcription might shape how cancer cells interact with the microenvironment and may influence prognosis. To identify how genes are spatially expressed in a tissue and to study the importance of this regulation, researchers have turned to spatial transcriptomics, an emerging technology. Computational analysis methods have followed suit to help make sense of these complex two-dimensional datasets. Dr. Daniel Jones, a staff scientist in Dr. Evan Newell’s lab in the Vaccine and Infectious Disease Division, started to think about “different ways spatial coordinates can be used to inform clustering and gating of cells in spatial transcriptomics. This led us to formalizing the idea of ‘spatial information’. We then realized that just having a method to estimate spatial information is quite useful on its own.” In a recently published Cell Reports Methods article, Dr. Jones and researchers from the Newell group developed a method called Maxspin (short for “maximization of spatial information”), which is “a general-purpose method for spatial data that measures how spatially organized or non-random a signal like gene expression is,” explained Dr. Jones. Using data generated on multiple spatial transcriptomics platforms, the authors demonstrate the effectiveness of their method compared to current methodologies and use Maxspin to reveal novel spatial gene expression patterns in a renal cell carcinoma tumor profiled with the CosMx Spatial Molecular Imager.
Deciding whether a gene is expressed in discrete domains or more ubiquitously largely depends on the scope of the field you are examining, which makes a strict yes-or-no label of “spatially varying” hard to apply. “In our view, this becomes increasingly nonsensical as spatial transcriptomics grows to larger scales, because few if any genes are really expressed totally at random across a large enough region,” Dr. Jones stated. How scope dictates whether a gene is considered spatially varied is similar to how we visually interpret patterns. For instance, perhaps you’re in a store and see a shirt you like from across the room. You can tell there’s a pattern on the shirt, but the details blend together, so you mostly register the fabric’s hues. It’s not until you’re up close that you see the pattern is made of distinct, differently colored shapes, demonstrating how scale dictates your interpretation of the pattern. “At the right scale, [genes are] spatially organized to some extent,” said Dr. Jones. Creating a computational method to interpret spatial gene organization involves some complex math, but luckily Dr. Jones breaks it down for us: “We took a perspective from information theory (a mathematical theory of quantifying information), to say that a gene's expression is spatially organized when knowing the expression levels of a pair of cells helps predict whether the two cells are nearby or far apart. The easier this near-vs-far prediction problem becomes, the more spatially organized the signal must be. On the other hand, if there is no spatial organization, we can't do any better than guessing. This makes some intuitive sense we hope, but it turns out there are also very good theoretical reasons to think about it this way, as it corresponds to an important quantity in information theory called the Jensen-Shannon divergence. The same idea can then be extended to look for spatial co-organization of multiple genes.” While this method can be used as a statistical test, which the authors show is also more accurate than the existing methodologies, Dr. Jones said that “we are much more interested in being able to quantify how spatially varying or organized a gene is. In other words, we want effect size estimates, not just p-values. In that way, Maxspin resembles measurements of spatial correlation, but more general and more sensitive, and in some ways more interpretable.”
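The near-vs-far prediction idea described above can be illustrated with a small toy sketch. This is not the Maxspin implementation; it is a simplified illustration under several assumptions: the function names (`near_far_pairs`, `toy_spatial_information`) are hypothetical, "near" pairs are taken to be nearest neighbors, "far" pairs are randomly chosen cells (which are usually, but not always, far away), and the classifier is a one-feature logistic regression on the absolute expression difference. The score uses the standard relationship that, for balanced classes, log 2 minus a classifier's cross-entropy loss (in nats) gives a lower bound on the Jensen-Shannon divergence between the two pair distributions. Because the loss is measured on the same pairs the classifier was fit to, the number is only a rough estimate of that bound.

```python
import numpy as np


def near_far_pairs(coords, rng):
    """Pair each cell with its nearest neighbor (near) and a random other cell (far)."""
    n = len(coords)
    # Brute-force squared distances; acceptable only for a small toy example.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    near = d2.argmin(axis=1)
    # Random partner, offset by at least 1 so a cell is never paired with itself.
    far = (np.arange(n) + rng.integers(1, n, size=n)) % n
    return near, far


def toy_spatial_information(coords, expr, rng, steps=500, lr=0.5):
    """Rough near-vs-far score in nats: log(2) minus the classifier's loss."""
    near, far = near_far_pairs(coords, rng)
    z = (expr - expr.mean()) / (expr.std() + 1e-12)  # standardize expression
    # One feature per pair: absolute expression difference between the two cells.
    feats = np.concatenate([np.abs(z - z[near]), np.abs(z - z[far])])
    labels = np.concatenate([np.ones(len(z)), np.zeros(len(z))])  # 1 = near pair

    # Logistic regression with a single weight and bias, fit by gradient descent.
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * feats + b)))
        w -= lr * np.mean((p - labels) * feats)
        b -= lr * np.mean(p - labels)

    p = np.clip(1.0 / (1.0 + np.exp(-(w * feats + b))), 1e-12, 1 - 1e-12)
    bce = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    # Guessing gives a loss of log(2), so the score is 0 when nothing is learned.
    return max(0.0, np.log(2) - bce)


# Toy usage: a gene with a left-to-right gradient versus the same values shuffled.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(400, 2))
gradient_gene = 5 * coords[:, 0] + rng.normal(0, 0.1, size=400)
shuffled_gene = rng.permutation(gradient_gene)
print("gradient gene:", toy_spatial_information(coords, gradient_gene, rng))
print("shuffled gene:", toy_spatial_information(coords, shuffled_gene, rng))
```

In this sketch the gradient gene should score well above the shuffled gene, whose score should sit near zero because the classifier cannot do better than guessing. A continuous score of this kind is what allows an effect size, rather than only a p-value, to be reported for each gene.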
To test the performance of Maxspin against existing programs, the authors used both simulated datasets and publicly available datasets generated on three leading spatial transcriptomics platforms (10x Visium, NanoString CosMx Spatial Molecular Imager, and Vizgen MERFISH), and evaluated how well each method detected spatially varying genes. Maxspin outperformed existing methods: it better captured spatially coherent expression patterns and, importantly, could quantify the degree to which a gene is spatially varying. Dr. Jones noted that “there were a couple of surprising things we found once we started estimating spatial information across a variety of datasets. First, a surprising number of genes show gradients of expression, which we might not think of as differentially expressed in a conventional scRNA-Seq experiment, but with spatial context shows a clear pattern.” The researchers then used Maxspin to investigate patterns of spatially varying gene expression in a CosMx dataset of a human renal cell carcinoma sample. Here, they gave Maxspin a real challenge: this large dataset consisted of nearly 180,000 cells and 1,000 genes, a scale that many existing methods would struggle to handle. Maxspin rose to the challenge. Dr. Jones stated that in “the renal cell carcinoma (RCC) samples we examined, we saw this manifest as heterogeneity across tumor regions and gradual changes in expression towards the interior of some tumor regions.” He added that “Maxspin can be surprisingly useful as a QC tool. Subtle technical artifacts, like elevated expression in a particular field of view tend to have high spatial information, so we pick up on it pretty easily.” Furthermore, Dr. Jones explained that “this type of analysis really expands the concept of binary differential expression between discrete categories of cells that we're familiar with from conventional transcriptomics, and can reveal less obvious gradients and niches of expression.” They observed this in the RCC data examined as well as “with the data we've looked at since then, especially when it comes to the tumor microenvironment. We hope this can give a nuanced view of gene expression variability,” Dr. Jones noted.
Broadly, this study presents “spatial information as an elegant way of thinking about variation in spatial omics. But more importantly […] is that we have a very practical tool based on that theory that we've extensively tested and shown to be more sensitive than existing tests of spatial organization,” said Dr. Jones. In the future, the Newell group plans to “go beyond just estimating spatial information to also treat spatial information as an objective to optimize when doing unsupervised clustering or gating.” For those interested in spatial transcriptomics, Dr. Jones recommended first considering the tradeoffs and limitations of each platform: “Getting some familiarity with these limitations is important before investing a lot of resources, or all your hopes and dreams, in an experiment. For example, subcellular resolution in situ assays (e.g., CosMx, Xenium, MERFISH) are very detailed, but don't always have large enough panels of genes, sufficient depth of coverage, or accurate enough cell segmentation to tease out every cell type you might like to describe. Spot-based assays (e.g., Visium) will give you that depth and coverage, but at the cost of a fundamentally blurrier picture of the sample.”
This work was supported by the National Institutes of Health, the Immunotherapy and Translational Data Science Integrated Research Centers at Fred Hutchinson Cancer Center, the Andy Hill CARE fund, and institutional funding from the Faculté de Biologie et de Médecine of the University of Lausanne and the Lausanne University Hospital.
Fred Hutch/UW/Seattle Children’s Cancer Consortium member Dr. Evan Newell contributed to this work.
Jones DC, Danaher P, Kim Y, Beechem JM, Gottardo R, Newell EW. An information theoretic approach to detecting spatially varying genes. Cell Rep Methods. 2023 Jun 16;3(6):100507. doi: 10.1016/j.crmeth.2023.100507. PMID: 37426750; PMCID: PMC10326450.