One of the first questions you ask after clustering a Visium dataset is: which genes actually vary across space? Not just between clusters — but continuously, across the tissue. That's what spatially variable gene (SVG) analysis answers.
In this post we run Moran's I via Squidpy on the 10x Genomics mouse brain Visium demo dataset and look at what comes out.
What is Moran's I?
Moran's I is a spatial autocorrelation statistic. For a given gene, it asks: are spots with high expression near other spots with high expression? A score near 1 means strong spatial clustering. Near 0 means random. Near -1 means a checkerboard pattern (rare in transcriptomics).
The formula is:
I = (N / W) * (Σᵢ Σⱼ wᵢⱼ(xᵢ - x̄)(xⱼ - x̄)) / Σᵢ(xᵢ - x̄)²
Where N is the number of spots, wᵢⱼ is the spatial weight between spots i and j (1 if neighbors, 0 otherwise), and x is normalized expression.
Squidpy wraps this with permutation-based p-values so you get significance alongside the score.
The setup
Dataset: visium_hne_adata() from Squidpy's built-in demos — 2,688 spots, 18,078 genes, mouse brain coronal section.
We filtered to the top 500 highly variable genes (HVGs) before running Moran's I. Running on all 18k genes is possible but slow; HVG filtering keeps the analysis focused on genes that vary meaningfully across the dataset.
import squidpy as sq
import scanpy as sc
adata = sc.read_h5ad("visium_hne_adata.h5ad")
# Filter to top 500 HVGs
sc.pp.highly_variable_genes(adata, n_top_genes=500, flavor="cell_ranger")
adata = adata[:, adata.var.highly_variable].copy()
# Normalize
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
# Build spatial graph
sq.gr.spatial_neighbors(adata)
# Run Moran's I with permutation testing
if __name__ == '__main__':
sq.gr.spatial_autocorr(adata, mode="moran", n_perms=200, n_jobs=1)
svg_results = adata.uns["moranI"]
top20 = svg_results.sort_values("I", ascending=False).head(20)
One note on if __name__ == '__main__': — this guard is required on Windows because Squidpy's permutation step uses multiprocessing. Without it you get a RuntimeError about the bootstrapping phase. On Linux/Mac it's not needed.
Results: top 20 SVGs
All 500 tested genes returned p-value = 0.0 after FDR correction (Benjamini-Hochberg). That's expected when the spatial signal is this strong — the permutation test rarely produces a null distribution that overlaps with the observed Moran's I. The ranking by I score is what matters here.
| Rank | Gene | Moran's I | Known function |
|---|---|---|---|
| 1 | Itpka | 0.674 | IP3 kinase; enriched in hippocampus and cortical layers |
| 2 | Fezf1 | 0.634 | Transcription factor; deep cortical layer identity (L5/L6) |
| 3 | Baiap3 | 0.622 | Synaptic scaffolding; expressed in specific subcortical regions |
| 4 | Shox2 | 0.611 | Thalamic and hypothalamic marker |
| 5 | Slc30a3 | 0.600 | Zinc transporter; cortex-specific, enriched in excitatory neurons |
| 6 | Gal | 0.590 | Galanin neuropeptide; subcortical and hypothalamic |
| 7 | Arhgap36 | 0.590 | RhoGAP family; cerebellum and thalamus enriched |
| 8 | Foxg1 | 0.587 | Forebrain development TF; restricted to cortex and hippocampus |
| 9 | Tbr1 | 0.584 | Layer 6 cortical excitatory neuron marker |
| 10 | Icam5 | 0.580 | Telencephalin; forebrain-restricted adhesion molecule |
| 11 | Kcnh3 | 0.564 | Voltage-gated K+ channel; cortical expression |
| 12 | Nov | 0.561 | CCN3; expressed in specific cortical and subcortical layers |
| 13 | Rprml | 0.555 | Reprimo-like; limited expression data, spatially restricted here |
| 14 | Crym | 0.544 | mu-crystallin; striatum and layer 6 cortex marker |
| 15 | Epop | 0.543 | Elongin BC and Polycomb repressive complex protein |
| 16 | Kcnj4 | 0.543 | Inward-rectifier K+ channel; subcortical enriched |
| 17 | AW551984 | 0.540 | Uncharacterized locus; strong spatial signal here |
| 18 | Otp | 0.527 | Orthopedia homeobox; hypothalamic nuclei marker |
| 19 | Igfbp6 | 0.525 | IGF binding protein; meningeal and vascular enriched |
| 20 | Lamp5 | 0.523 | Interneuron subtype marker; layer 1-2 cortex |
What the results tell us
The top SVGs split into recognizable categories:
Cortical layer markers — Fezf1, Tbr1, Foxg1, Slc30a3, Crym, Lamp5. These genes define cortical laminar identity. Moran's I picks them up because cortical layers are spatially coherent bands — spots in layer 6 cluster together, and Tbr1 expression clusters with them.
Subcortical region markers — Shox2, Gal, Otp, Arhgap36. These are thalamic or hypothalamic markers. A coronal section at a mid-brain level captures both cortex and subcortical structures, and genes that are on/off between these compartments will score high on spatial autocorrelation.
Synaptic/ion channel genes — Itpka, Kcnh3, Kcnj4, Icam5. These are expressed in specific neuron subtypes that cluster spatially. Itpka topping the list is consistent with its known hippocampal enrichment — the hippocampus occupies a spatially coherent arc in these sections.
The uncharacterized locus AW551984 at rank 17 is interesting. A high Moran's I on a poorly annotated gene is a reasonable starting point for further investigation — it's spatially structured, which at minimum means it's not noise.
A note on HVG pre-filtering
We filtered to 500 HVGs before running Moran's I. This is a practical choice, not a methodological requirement. Running on all genes would likely surface similar top hits but would take proportionally longer. The tradeoff: genes that are spatially variable but not highly variable (e.g., lowly expressed region-specific markers) could be missed. For a complete SVG screen, running on all expressed genes with a minimum count filter is more thorough.
What's next
The natural follow-up is to visualize where these top SVGs are expressed on the tissue — Squidpy's sq.pl.spatial_scatter does this in a few lines. That'll be the next post.
The full code for this analysis is at github.com/lociven/spatiabio-tutorials.
Get the complete pack
Squidpy Complete Analysis Pack — 10 Notebooks
All 10 notebooks from this series in one download — SVGs, neighborhood enrichment, co-occurrence, ligand-receptor, and the complete pipeline. Verified and ready to run on your own data.
Get it for $19 →
No comments:
Post a Comment