Wednesday, July 1, 2026

I Gave Claude Science One Prompt. It Ran a Full Spatial Analysis.

By Lociven · SpatiaBio · July 2, 2026

I gave Claude Science AI Workbench — Anthropic's new scientific analysis platform — a single prompt and a dataset. Thirty minutes later, it handed me five publication-quality figures, a fully executable Jupyter notebook, and a reproducibility report.


What is Claude Science AI Workbench?

Released in June 2026, Claude Science is Anthropic's platform for running agentic scientific analysis inside a sandboxed Linux environment. You describe what you want in plain language. The agent installs packages, writes code, executes it, fixes errors, and returns figures and notebooks — all without you touching a terminal.

For spatial transcriptomics researchers, the pitch is simple: skip the environment setup, skip the debugging, get directly to the biology.

The prompt I used

Analyze the Squidpy IMC breast cancer dataset (sq.datasets.imc()):
1. Spatial neighbors graph — Delaunay triangulation
2. Neighborhood enrichment (sq.gr.nhood_enrichment, n_perms=1000)
3. Co-occurrence as a function of distance
4. Interaction matrix and centrality scores
5. Ripley's L for clustering vs. randomness

Use publication-grade conventions. Return a Jupyter notebook with all outputs embedded.

That's it. No code. No conda commands. The agent created a dedicated conda environment (squidpy 1.8.2, scanpy 1.11.5), downloaded the Jackson et al. breast cancer IMC dataset (4,668 cells × 34 protein markers, 11 cell types), ran the full pipeline, and produced the figures below.


Figure 1 — Cell types in situ

The IMC dataset captures 11 cell types across a breast cancer tissue section. Apoptotic tumor cells (cyan) dominate numerically and are distributed throughout the tissue.

Breast cancer IMC cell types in situ

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · Jackson et al., Nature 578, 2020

Figure 2 — Neighborhood enrichment: which cell types co-locate?

The permutation-based z-scores reveal the tissue's immune architecture at a glance. Three patterns stand out:

Immune clustering (z = +29 to +36): T cells, macrophages, and stromal cells form a tightly co-located immune-stromal compartment, the canonical TIL niche.
Tumor-immune avoidance (z = −21 to −28): Apoptotic tumor cells strongly avoid macrophages, a signature of immune exclusion at single-cell spatial resolution.
Tumor self-clustering (z = +33 to +48): Each tumor subtype clusters with itself, consistent with clonal expansion in distinct spatial niches.
Neighborhood enrichment heatmap

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · Jackson et al., Nature 578, 2020

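Interpretation note: these are permutation z-scores. Each one is the observed neighbor count minus the permutation mean, divided by the permutation standard deviation (here with n_perms=1000). Under a normal approximation, |z| > 1.96 corresponds to a two-sided p < 0.05, so values of +29 to +48 indicate extremely non-random co-localization. At a fixed effect size, these z-scores tend to grow with the number of cells. Compare them within a dataset rather than across datasets of very different size.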
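To make the z-score concrete, here is a minimal from-scratch sketch of a permutation neighborhood-enrichment test in plain NumPy. It illustrates the idea and is not Squidpy's implementation. It assumes the neighbor graph is given as an edge list and cell types as integer codes. The function name and toy data are hypothetical.

```python
import numpy as np


def nhood_enrichment_zscores(edges, labels, n_types, n_perms=1000, seed=0):
    """Permutation z-scores for neighbor counts between cell types.

    edges:  (m, 2) int array of undirected neighbor pairs (i, j)
    labels: (n,) int array of cell-type codes in [0, n_types)
    """
    rng = np.random.default_rng(seed)

    def count_pairs(lab):
        # Symmetric type-by-type count of graph edges
        counts = np.zeros((n_types, n_types))
        a, b = lab[edges[:, 0]], lab[edges[:, 1]]
        np.add.at(counts, (a, b), 1)
        np.add.at(counts, (b, a), 1)
        return counts

    observed = count_pairs(labels)
    # Null: shuffle labels over fixed positions, keeping the graph unchanged
    null = np.stack([count_pairs(rng.permutation(labels)) for _ in range(n_perms)])
    mean, std = null.mean(axis=0), null.std(axis=0)
    return (observed - mean) / np.where(std > 0, std, 1.0)


# Toy data: 20 cells in a row, first half type 0, second half type 1
labels = np.array([0] * 10 + [1] * 10)
edges = np.array([(i, i + 1) for i in range(19)])
z = nhood_enrichment_zscores(edges, labels, n_types=2)
print(np.round(z, 1))  # positive diagonal (self-clustering), negative off-diagonal (avoidance)
```

Shuffling labels while keeping positions and edges fixed makes the null "same tissue layout, random cell-type assignment". The z-score then measures how far the observed count sits from that null, in units of its standard deviation.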

Figure 3 — Co-occurrence: the spatial scale of immune clustering

Co-occurrence measures how much more likely a cell type is to be found at a given distance from another cell type than its overall frequency would predict. The steep decay curves indicate that immune clustering is a contact-range phenomenon rather than a tissue-wide gradient. The macrophage-macrophage co-occurrence ratio is about 7 at the shortest distance, then falls sharply.

Co-occurrence distance curves

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · Jackson et al., Nature 578, 2020
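Here is a simplified sketch of the co-occurrence ratio for a single distance bin. It assumes the ratio is defined as P(target type at distance r from a condition-type cell) divided by P(target type at distance r from any cell). Squidpy's sq.gr.co_occurrence bins distances and computes its score more efficiently, so treat this hypothetical helper as an illustration of the quantity, not a reimplementation. It builds a full pairwise distance matrix, which is only practical for toy data.

```python
import numpy as np


def co_occurrence_ratio(coords, labels, target, condition, r_lo, r_hi):
    """P(target | condition cell nearby) / P(target), for pairs at distance in (r_lo, r_hi]."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    in_bin = (d > r_lo) & (d <= r_hi)
    np.fill_diagonal(in_bin, False)  # never pair a cell with itself
    i, j = np.nonzero(in_bin)  # ordered pairs (i, j) whose distance falls in the bin
    if len(i) == 0:
        return np.nan
    p_target = np.mean(labels[j] == target)  # target frequency among all in-bin partners
    from_condition = labels[i] == condition
    if not from_condition.any() or p_target == 0:
        return np.nan
    p_target_given_cond = np.mean(labels[j][from_condition] == target)
    return p_target_given_cond / p_target


# Toy data: two 5 x 5 grids of cells (spacing 1), type 0 at the origin, type 1 far away
grid = np.array([(x, y) for x in range(5) for y in range(5)], dtype=float)
coords = np.vstack([grid, grid + [100.0, 0.0]])
labels = np.array([0] * 25 + [1] * 25)
print(co_occurrence_ratio(coords, labels, target=0, condition=0, r_lo=0.0, r_hi=2.0))  # 2.0
```

A ratio of 2 means type-0 cells are twice as likely to sit within 2 units of a type-0 cell as within 2 units of any cell. The roughly 7-fold macrophage value above has the same kind of reading.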

Figure 4 — Graph centrality: who connects the tissue?

Apoptotic tumor cells have degree centrality 0.83 and closeness centrality 0.84 — nearly 5× higher than any other cell type. They are physically positioned at the crossroads of the tissue, spatially interleaved with every other population.

Graph centrality scores

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · Jackson et al., Nature 578, 2020
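Squidpy's sq.gr.centrality_scores reports centralities per cell type, treating each type as a group of nodes in the spatial graph. Its documentation describes degree centrality as the fraction of non-group members connected to group members. The sketch below computes that quantity on a toy adjacency list; the helper name and graph are hypothetical. It also shows why abundance matters. A large, spatially dispersed group touches more of the graph, so part of a high score can simply reflect how many cells a type has.

```python
def group_degree_centrality(adjacency, group):
    """Fraction of non-group nodes that are adjacent to at least one group node."""
    group = set(group)
    outside = [v for v in adjacency if v not in group]
    if not outside:
        return 0.0
    # Non-group nodes reachable in one step from any group member
    touched = {u for v in group for u in adjacency[v] if u not in group}
    return len(touched) / len(outside)


# Toy graph: node 0 is a hub linked to 1-4; node 5 hangs off node 4
adjacency = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
print(group_degree_centrality(adjacency, {0}))  # 0.8 (4 of 5 outside nodes touched)
print(group_degree_centrality(adjacency, {5}))  # 0.2
```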

Figure 5 — Ripley's L: clustering vs. complete spatial randomness

Apoptotic tumor cells rise far above the 95% CSR envelope at every distance tested, which indicates strong clustering across all the scales examined. Combined with the centrality result, this pairing of self-clustering and network centrality is consistent with a dominant tumor clone that has physically reorganized the tissue architecture.

Ripley's L clustering analysis

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · Jackson et al., Nature 578, 2020
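Here is a naive sketch of Ripley's L with a simulated CSR envelope. It makes four assumptions:

- K(r) = A / (n(n−1)) × (number of ordered point pairs within r)
- L(r) = sqrt(K(r)/π)
- the observation window is a rectangle
- no edge correction is applied

Squidpy's sq.gr.ripley has its own estimator and simulation settings, so its numbers will differ. The helper names and toy data here are hypothetical. The envelope is pointwise, so a curve that touches it at a single radius is weaker evidence than an excursion across all distances.

```python
import numpy as np


def ripley_l(coords, radii, area):
    """Naive Ripley's L without edge correction: L(r) = sqrt(K(r) / pi)."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d = d[~np.eye(n, dtype=bool)]  # all ordered pairs i != j
    k = np.array([area * np.count_nonzero(d <= r) / (n * (n - 1)) for r in radii])
    return np.sqrt(k / np.pi)


def csr_envelope(n, radii, width, height, n_sims=99, seed=0):
    """Pointwise 2.5% and 97.5% quantiles of L(r) under complete spatial randomness."""
    rng = np.random.default_rng(seed)
    sims = np.stack([
        ripley_l(rng.uniform([0.0, 0.0], [width, height], size=(n, 2)), radii, width * height)
        for _ in range(n_sims)
    ])
    return np.percentile(sims, 2.5, axis=0), np.percentile(sims, 97.5, axis=0)


# Toy data: 100 points in a tight clump inside a 100 x 100 window
rng = np.random.default_rng(1)
clustered = rng.normal(50.0, 2.0, size=(100, 2))
radii = np.array([5.0, 10.0, 20.0])
l_obs = ripley_l(clustered, radii, area=100.0 * 100.0)
lo, hi = csr_envelope(len(clustered), radii, 100.0, 100.0)
print(l_obs > hi)  # above the upper envelope at every radius: clustering
```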


What Claude Science actually did (under the hood)

The agent's execution log was visible in real time. It:

1. Created a conda environment from scratch (squidpy==1.8.2, scanpy==1.11.5)
2. Downloaded and validated the IMC dataset (4,668 cells × 34 markers)
3. Ran all five Squidpy analyses with appropriate parameters
4. Noticed its own figure title was inaccurate and self-corrected it before saving
5. Authored a 23-cell reproducible notebook with embedded outputs
6. Bundled the dataset, requirements.txt, and README automatically

The self-correction on the Ripley's L title is worth noting. It cross-checked the figure against the data and caught a misleading generalization. That's not what most analysis scripts do.


Bonus: the same pipeline on Visium data

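The IMC analysis above uses point-cloud coordinates. I reran the same prompt on a 10x Genomics Visium section (grid-based spots, 55 µm in diameter). Without being instructed to, the agent switched to coord_type="grid" and n_neighs=6 for the hexagonal lattice, inferring this from the data format. These values also match Squidpy's own defaults: with coord_type=None, sq.gr.spatial_neighbors builds a grid graph with n_neighs=6 when the spatial key is present in adata.uns, as it is for Visium data.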
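Why six neighbors? On a hexagonal lattice, every interior spot has exactly six nearest neighbors at one center-to-center spacing. The toy sketch below assumes Visium's nominal 100 µm center-to-center spacing and an offset-row layout. The helper names are hypothetical, and Squidpy builds its grid graph internally rather than this way.

```python
import numpy as np


def hex_lattice(n_rows, n_cols, spacing=100.0):
    """Spot centers on a hexagonal lattice; odd rows are shifted by half a spacing."""
    pts = []
    for r in range(n_rows):
        for c in range(n_cols):
            x = c * spacing + (spacing / 2 if r % 2 else 0.0)
            y = r * spacing * np.sqrt(3) / 2  # row height of a hexagonal packing
            pts.append((x, y))
    return np.array(pts)


def grid_neighbors(coords, spacing=100.0, tol=1e-6):
    """Neighbor lists: spots exactly one lattice spacing away (within a tolerance)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = np.abs(d - spacing) < tol * spacing
    return [np.flatnonzero(row) for row in adj]


coords = hex_lattice(5, 5)
neighbors = grid_neighbors(coords)
print(len(neighbors[12]))  # 6: interior spot (row 2, column 2)
print(len(neighbors[0]))   # 2: corner spot on the lattice edge
```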

Figure 6 — Visium: spatial clusters

On the Visium hexagonal grid, cluster boundaries directly reflect anatomical structure. The sharper compartmentalization here vs. the IMC data is expected: each Visium spot pools signal from several cells, and the spots sit on a regular grid, so spatial patterns emerge more cleanly at tissue scale.

Visium spatial clusters

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · 10x Genomics Visium

Figure 7 — Visium: gene expression overlay

Gene expression mapped onto spatial coordinates reveals domain-specific marker gradients. Unlike IMC (protein-level), Visium captures transcriptomic heterogeneity at spot resolution. Claude Science generated both the continuous expression overlay and the categorical cluster map in a single run.

Visium gene expression overlay

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · 10x Genomics Visium

Figure 8 — Visium: neighborhood enrichment

The neighborhood enrichment heatmap on Visium shows a clean tissue-layer organization: adjacent anatomical compartments co-enrich (positive z-scores), distant layers avoid each other (negative z-scores). You can trace these directly back to the cluster map in Figure 6.

Visium neighborhood enrichment

Generated with Claude Science AI Workbench (Anthropic, 2026) · squidpy 1.8.2 · 10x Genomics Visium

The full Visium pipeline — grid graph parameters, spatial autocorrelation, and multi-sample batch correction — is in Pack 1 notebooks 03, 06, and 06b.

Honest assessment: is it useful?

✓ Works well for

• Standard Squidpy pipelines
• Exploratory analysis on new datasets
• Generating a reproducible baseline notebook
• Researchers new to spatial omics
• Rapid figure drafts for lab meetings

✗ Limitations

• Requires manual approval for each tool call
• Can't customize beyond what you describe
• No Visium HD / large dataset chunking
• Figure aesthetics are functional, not polished
• Session restarts lose progress

Bottom line: For getting from raw data to interpretable spatial figures in one session without writing code, it genuinely works. For publication-level customization, you still need to go hands-on. Pack 1 covers exactly that layer.


SpatiaBio Pack 1

Squidpy Foundations — 16 Notebooks ($19)

Everything Claude Science did above — plus memory optimization for Visium HD, batch correction across samples, ligand-receptor analysis, and Nature-style publication figure templates.

Get it for $19 →

Sister blog

The biology behind the cells you're mapping

NeoantigenLab covers neoantigen biology, HLA typing, pVACseq, and cancer immunotherapy — the immunology context for what spatial analysis reveals.

Visit NeoantigenLab →

Tuesday, June 30, 2026

Visualizing Spatially Variable Genes on Tissue: sq.pl.spatial_scatter

In the previous post we ran Moran's I on the 10x Genomics mouse brain Visium dataset and ranked 500 genes by spatial autocorrelation. The top hit was Itpka at I = 0.674. But a number alone doesn't tell you much. This post puts those genes on the tissue.

The setup

Same preprocessed AnnData object from post 5 — 500 HVGs, normalized, log1p transformed, spatial neighbors computed. We pass the top 5 SVGs directly to sq.pl.spatial_scatter.

import squidpy as sq
import scanpy as sc

# Same preprocessing as the Moran's I post
adata = sc.read_h5ad("visium_hne_adata.h5ad")
sc.pp.highly_variable_genes(adata, n_top_genes=500, flavor="cell_ranger")
adata = adata[:, adata.var.highly_variable].copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sq.gr.spatial_neighbors(adata)

# Top 5 genes by Moran's I from the previous post
top5 = ["Itpka", "Fezf1", "Baiap3", "Shox2", "Slc30a3"]

# One panel per gene, three panels per row; save= writes the figure to a file
sq.pl.spatial_scatter(
    adata,
    color=top5,
    ncols=3,
    save="svg_top5_combined.png"
)

sq.pl.spatial_scatter reads spot coordinates from adata.obsm["spatial"] and colors each spot by expression value. The colormap runs from purple (low) to yellow (high), matching scanpy's default viridis scale.

Results

Top 5 spatially variable genes visualized on mouse brain Visium tissue

Top 5 SVGs by Moran's I score plotted on the coronal mouse brain section. Each dot represents one Visium spot (~55 µm diameter).

What each pattern tells us

Itpka (I = 0.674) — hippocampus arc
The high-expression spots cluster tightly in the upper-left region of the section, the hippocampal formation. Itpka encodes IP3 3-kinase A, which is highly enriched in hippocampal neurons and involved in calcium signaling. The spatial pattern closely matches known anatomy: a compact, curved structure that stands out from the surrounding cortex.

Fezf1 (I = 0.634) — deep cortical layer band
Expression is restricted to a narrow band along the lower edge of the cortical mantle. Fezf1 is a transcription factor required for the identity of layer 5 and layer 6 corticospinal neurons. The thin stripe visible here is consistent with cortical laminar organization — layers 5/6 run as a continuous band at the base of the neocortex.

Baiap3 (I = 0.622) — subcortical focal cluster
High expression concentrated in the lower-right, corresponding to a subcortical nucleus. Baiap3 (brain-specific angiogenesis inhibitor 1-associated protein 3) is a synaptic scaffolding protein expressed in specific subcortical populations. The focal, non-laminar pattern distinguishes it from the cortical layer markers above.

Shox2 (I = 0.611) — thalamic/hypothalamic domain
A broad cluster in the lower-left, covering what appears to be thalamic or hypothalamic territory. Shox2 marks specific nuclei in this region and is involved in the development of thalamic relay circuits. The expression domain is spatially coherent but larger than the single-nucleus pattern of Baiap3.

Slc30a3 (I = 0.600) — cortical band, broader than Fezf1
Expression forms a wider band than Fezf1 and extends further across the cortical surface. Slc30a3 encodes a zinc transporter enriched in excitatory cortical neurons. Unlike the deep-layer specificity of Fezf1, Slc30a3 appears to span multiple cortical layers, which explains the broader stripe.

Pattern categories

Looking across the five maps, two distinct spatial modes emerge:

Laminar patterns (Fezf1, Slc30a3): horizontal bands parallel to the cortical surface. These genes define layer identity. Moran's I is high because spots within a layer are neighbors of other spots in the same layer, and they all share expression of the same laminar marker.

Regional patterns (Itpka, Baiap3, Shox2) — compact clusters corresponding to anatomically defined structures. These genes are on in one region and off everywhere else. This on/off contrast across space is exactly what Moran's I detects: neighboring spots agree on expression, and that agreement is strong.

Both modes score high on Moran's I, but for slightly different reasons. Laminar genes are expressed along a long band of spots, so many neighboring pairs agree. Regional genes are expressed in fewer spots, but their contrast against the surrounding background is very high.

A note on the colormap

The viridis colormap scales independently per gene, so the yellow in Itpka (max ~80) is not comparable to yellow in Fezf1 (max ~14). This is the default behavior and is fine for qualitative comparison of spatial pattern, but misleading if you want to compare expression levels across genes. For cross-gene level comparison, fix the colormap range with vmax:

sq.pl.spatial_scatter(adata, color="Itpka", vmin=0, vmax=20)
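To see why independent scaling misleads, here is a minimal sketch of the normalization a colormap applies before picking colors. The spot values are hypothetical, apart from the ~80 and ~14 maxima mentioned above.

```python
import numpy as np


def to_unit_scale(values, vmin=None, vmax=None):
    """Map values to [0, 1] as a colormap does; defaults to the array's own min and max."""
    values = np.asarray(values, dtype=float)
    lo = values.min() if vmin is None else vmin
    hi = values.max() if vmax is None else vmax
    return np.clip((values - lo) / (hi - lo), 0.0, 1.0)


itpka = np.array([0.0, 20.0, 80.0])   # hypothetical spot values, max ~80
fezf1 = np.array([0.0, 7.0, 14.0])    # hypothetical spot values, max ~14

print(to_unit_scale(itpka).max(), to_unit_scale(fezf1).max())  # 1.0 1.0: both maxima look "yellow"
print(to_unit_scale(fezf1, vmin=0, vmax=80).max())             # 0.175 on a shared scale
print(to_unit_scale(itpka, vmin=0, vmax=20))                   # [0. 1. 1.]: spots above 20 saturate
```

The last line shows the trade-off of a fixed vmax. With vmin=0, vmax=20, every Itpka spot above 20 is drawn in the same top color, so choose vmax with the highest-expressed gene in mind.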

What's next

The next logical step is neighborhood enrichment analysis — looking at which cell types or clusters tend to be spatially adjacent. That's sq.gr.nhood_enrichment, and it answers a different question: not which genes vary across space, but which populations co-locate.

Full code at github.com/lociven/spatiabio-tutorials.

Get the complete pack

Squidpy Complete Analysis Pack — 10 Notebooks

All 10 notebooks from this series in one download — SVGs, neighborhood enrichment, co-occurrence, ligand-receptor, and the complete pipeline. Verified and ready to run on your own data.

Get it for $19 →

Spatially Variable Genes with Squidpy: Moran's I on Mouse Brain Visium Data

One of the first questions you ask after clustering a Visium dataset is: which genes actually vary across space? Not just between clusters — but continuously, across the tissue. That's what spatially variable gene (SVG) analysis answers.

In this post we run Moran's I via Squidpy on the 10x Genomics mouse brain Visium demo dataset and look at what comes out.

What is Moran's I?

Moran's I is a spatial autocorrelation statistic. For a given gene, it asks: are spots with high expression near other spots with high expression? A score near 1 means strong spatial clustering. Near 0 means random. Near -1 means a checkerboard pattern (rare in transcriptomics).

The formula is:

I = (N / W) * (Σᵢ Σⱼ wᵢⱼ(xᵢ - x̄)(xⱼ - x̄)) / Σᵢ(xᵢ - x̄)²

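Where N is the number of spots, wᵢⱼ is the spatial weight between spots i and j (1 if neighbors, 0 otherwise), W is the sum of all weights wᵢⱼ, x is normalized expression, and x̄ is its mean across spots. By default, Squidpy row-normalizes the weights first (transformation=True in sq.gr.spatial_autocorr).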

Squidpy wraps this with permutation-based p-values so you get significance alongside the score.
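Here is a minimal NumPy sketch of the formula above, plus a one-sided permutation p-value. It assumes a dense binary weight matrix. Squidpy's implementation works on sparse graphs and has its own p-value options, so these hypothetical helpers are for intuition only. The toy data are a 20-spot chain carrying two patterns: a smooth gradient, which scores close to 1, and an alternating pattern, which scores −1.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I for values x (length n) and a dense spatial weight matrix w (n x n)."""
    z = x - x.mean()  # deviations from the mean
    # (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def morans_i_perm_pvalue(x, w, n_perms=200, seed=0):
    """One-sided permutation p-value: shuffle values over fixed locations."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    null = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perms)])
    # +1 counts the observed value as one of the draws, so p is never exactly 0
    p = (np.count_nonzero(null >= observed) + 1) / (n_perms + 1)
    return observed, p


# Toy data: 20 spots on a line, each adjacent pair connected (binary weights)
n = 20
w = np.zeros((n, n))
w[np.arange(n - 1), np.arange(1, n)] = 1.0
w[np.arange(1, n), np.arange(n - 1)] = 1.0

gradient = np.arange(n, dtype=float)        # smooth spatial trend
alternating = np.tile([0.0, 1.0], n // 2)   # checkerboard-like pattern

i_grad, p_grad = morans_i_perm_pvalue(gradient, w)
print(round(i_grad, 3), p_grad)             # 0.895 and 1/201, the smallest possible p
print(round(morans_i(alternating, w), 3))   # -1.0: every neighbor pair disagrees
```

The +1 convention in the p-value also explains a floor: a test with 200 permutations cannot report anything below 1/201.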

The setup

Dataset: visium_hne_adata() from Squidpy's built-in demos — 2,688 spots, 18,078 genes, mouse brain coronal section.

We filtered to the top 500 highly variable genes (HVGs) before running Moran's I. Running on all 18k genes is possible but slow; HVG filtering keeps the analysis focused on genes that vary meaningfully across the dataset.

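import squidpy as sq
import scanpy as sc


def main():
    # Squidpy's demo object, saved locally (e.g. from sq.datasets.visium_hne_adata())
    adata = sc.read_h5ad("visium_hne_adata.h5ad")

    # Filter to top 500 HVGs
    # (note: scanpy's "cell_ranger" flavor expects log-normalized input)
    sc.pp.highly_variable_genes(adata, n_top_genes=500, flavor="cell_ranger")
    adata = adata[:, adata.var.highly_variable].copy()

    # Normalize
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Build spatial graph
    sq.gr.spatial_neighbors(adata)

    # Run Moran's I with permutation testing
    sq.gr.spatial_autocorr(adata, mode="moran", n_perms=200, n_jobs=1)

    # Results table: one row per gene
    svg_results = adata.uns["moranI"]
    top20 = svg_results.sort_values("I", ascending=False).head(20)
    print(top20)


# Guard the whole script, not just one call: spawned worker processes re-import this file
if __name__ == '__main__':
    main()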

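One note on if __name__ == '__main__': this guard matters because Squidpy's permutation step can run in worker processes. Under the spawn start method, each worker re-imports the script, and without the guard you get a RuntimeError about the bootstrapping phase. Spawn is the default on Windows and, since Python 3.8, on macOS. Python 3.14 also moved the Linux default from fork to forkserver, which has the same requirement. The safest habit is to use the guard on every platform and put the whole script under it, not just the one call.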

Results: top 20 SVGs

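All 500 tested genes returned p-value = 0.0 after FDR correction (Benjamini-Hochberg). That's expected when the spatial signal is this strong: the observed Moran's I lies far outside the null distribution. Read the 0.0 as "below what the test can resolve", not as literally zero. 200 permutations cannot resolve p-values much smaller than 1/200, and analytic p-values this small underflow to zero in floating point. The ranking by I score is what matters here.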

Rank | Gene | Moran's I | Known function
1 | Itpka | 0.674 | IP3 kinase; enriched in hippocampus and cortical layers
2 | Fezf1 | 0.634 | Transcription factor; deep cortical layer identity (L5/L6)
3 | Baiap3 | 0.622 | Synaptic scaffolding; expressed in specific subcortical regions
4 | Shox2 | 0.611 | Thalamic and hypothalamic marker
5 | Slc30a3 | 0.600 | Zinc transporter; cortex-specific, enriched in excitatory neurons
6 | Gal | 0.590 | Galanin neuropeptide; subcortical and hypothalamic
7 | Arhgap36 | 0.590 | RhoGAP family; cerebellum and thalamus enriched
8 | Foxg1 | 0.587 | Forebrain development TF; restricted to cortex and hippocampus
9 | Tbr1 | 0.584 | Layer 6 cortical excitatory neuron marker
10 | Icam5 | 0.580 | Telencephalin; forebrain-restricted adhesion molecule
11 | Kcnh3 | 0.564 | Voltage-gated K+ channel; cortical expression
12 | Nov | 0.561 | CCN3; expressed in specific cortical and subcortical layers
13 | Rprml | 0.555 | Reprimo-like; limited expression data, spatially restricted here
14 | Crym | 0.544 | mu-crystallin; striatum and layer 6 cortex marker
15 | Epop | 0.543 | Elongin BC and Polycomb repressive complex protein
16 | Kcnj4 | 0.543 | Inward-rectifier K+ channel; subcortical enriched
17 | AW551984 | 0.540 | Uncharacterized locus; strong spatial signal here
18 | Otp | 0.527 | Orthopedia homeobox; hypothalamic nuclei marker
19 | Igfbp6 | 0.525 | IGF binding protein; meningeal and vascular enriched
20 | Lamp5 | 0.523 | Interneuron subtype marker; layer 1-2 cortex

What the results tell us

The top SVGs split into recognizable categories:

Cortical layer markers — Fezf1, Tbr1, Foxg1, Slc30a3, Crym, Lamp5. These genes define cortical laminar identity. Moran's I picks them up because cortical layers are spatially coherent bands — spots in layer 6 cluster together, and Tbr1 expression clusters with them.

Subcortical region markers: Shox2, Gal, Otp, Arhgap36. These are thalamic or hypothalamic markers. A coronal section at this level captures both cortex and subcortical structures, and genes that switch on or off between these compartments score high on spatial autocorrelation.

Synaptic/ion channel genes — Itpka, Kcnh3, Kcnj4, Icam5. These are expressed in specific neuron subtypes that cluster spatially. Itpka topping the list is consistent with its known hippocampal enrichment — the hippocampus occupies a spatially coherent arc in these sections.

The uncharacterized locus AW551984 at rank 17 is interesting. A high Moran's I on a poorly annotated gene is a reasonable starting point for further investigation — it's spatially structured, which at minimum means it's not noise.

A note on HVG pre-filtering

We filtered to 500 HVGs before running Moran's I. This is a practical choice, not a methodological requirement. Running on all genes would likely surface similar top hits but would take proportionally longer. The tradeoff: genes that are spatially variable but not highly variable (e.g., lowly expressed region-specific markers) could be missed. For a complete SVG screen, running on all expressed genes with a minimum count filter is more thorough.

What's next

The natural follow-up is to visualize where these top SVGs are expressed on the tissue — Squidpy's sq.pl.spatial_scatter does this in a few lines. That'll be the next post.

The full code for this analysis is at github.com/lociven/spatiabio-tutorials.


CellChat on Spatial Data: A Step-by-Step Tutorial (With Real Errors Included)

If you've already got a clustered Visium object from Seurat (like the one we built in our Squidpy vs Seurat comparison), the natural next question is: which cell types are actually talking to each other, and does that change with physical distance? That's exactly what CellChat's spatial mode answers — and unlike our earlier posts, this one comes with three real errors we hit along the way, not just the happy path.

What CellChat Does

CellChat infers, analyzes, and visualizes cell-cell communication networks from single-cell or spatial transcriptomics data, using a curated ligand-receptor interaction database. The spatial mode adds physical distance as a constraint. Two cell types might both express a matching ligand-receptor pair, but if they're never actually near each other in the tissue, CellChat won't score that pair as a likely interaction.

We ran this on the exact same dataset from our Seurat post — the official stxBrain mouse brain Visium demo (2,696 spots, 15 clusters) — so this is a direct continuation, not a new dataset.

The Real Pipeline (With Real Errors)

library(Seurat)
library(SeuratData)
library(CellChat)

# Load, normalize and cluster the anterior mouse brain Visium section
brain <- LoadData("stxBrain", type = "anterior1")
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30, verbose = FALSE)
brain <- FindClusters(brain, verbose = FALSE)

# CellChat inputs: normalized expression, prefixed cluster labels (see Error 2), spot coordinates
data.input <- GetAssayData(brain, layer = "data", assay = "SCT")
meta <- data.frame(labels = factor(paste0("C", Idents(brain))), row.names = names(Idents(brain)))
spatial.locs <- as.matrix(GetTissueCoordinates(brain)[, c("x","y")])

# spatial.factors: ratio converts coordinates to microns (ratio = 1 keeps the raw units);
# tol is a tolerance used when comparing spot distances against interaction.range
cellchat <- createCellChat(object = data.input, meta = meta, group.by = "labels",
                            datatype = "spatial", coordinates = spatial.locs,
                            spatial.factors = data.frame(ratio = 1, tol = 5))
cellchat@DB <- CellChatDB.mouse                  # mouse ligand-receptor database
cellchat <- subsetData(cellchat)                 # keep only signaling genes
cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat)

# Communication probabilities, constrained by spot-to-spot distance
cellchat <- computeCommunProb(cellchat, type = "truncatedMean", trim = 0.1,
                               distance.use = TRUE, interaction.range = 250,
                               scale.distance = 0.01, contact.dependent = TRUE, contact.range = 100)
cellchat <- filterCommunication(cellchat, min.cells = 10)  # drop groups with < 10 spots
cellchat <- aggregateNet(cellchat)

This looks clean here because we already fixed it. The first time through, we hit three separate real errors:

Error 1: GetAssayData() with slot is defunct

Using slot = "data" (which is what most older tutorials still show) throws a hard error in current SeuratObject — it was deprecated in 5.0.0 and is now fully removed. Fix: use layer = "data" instead. If you're following any Seurat tutorial written before 2024, check for this.

Error 2: CellChat rejects cluster label "0"

Error in setIdent(object, ident.use = group.by) :
  Cell labels cannot contain `0`!

Seurat's default cluster identities are numeric starting at 0 — completely normal and fine for Seurat itself. But CellChat's internal identity handling breaks if any cluster is literally named "0". Fix: prefix cluster labels with a letter before handing them to CellChat, e.g. factor(paste0("C", Idents(brain))) turns cluster 0 into "C0". Easy to miss because the error message doesn't make the cause obvious.

Error 3: missing presto dependency

CellChat's identifyOverExpressedGenes() wants the presto package for a faster Wilcoxon test implementation and throws an error (not just a warning) if it's missing — unlike many R packages that silently fall back. Fix: devtools::install_github('immunogenomics/presto') before running CellChat, or pass do.fast = FALSE if you don't want the extra dependency.

Real Results

On this dataset, the full pipeline (preprocessing + spatial-distance-constrained communication probability calculation) took ~360 seconds (6 minutes) on a standard machine — almost all of it in computeCommunProb(), which is the step that actually factors in spatial distance between every pair of spots.

  • 13,110 significant ligand-receptor interactions identified across the 15 spatial clusters
  • Top signaling pathways by interaction count: Glutamate (4,787), GABA-A (1,361), LAMININ (874), WNT (538), COLLAGEN (489), GABA-B (396)
  • This makes biological sense without any cherry-picking: glutamate and GABA are the dominant excitatory and inhibitory neurotransmitter systems in the brain, so seeing them at the top for a mouse brain section is what a sane result should look like, not a red flag
CellChat circle plot showing cell-cell communication strength between all 15 clusters in mouse brain Visium tissue

Aggregated communication network across all signaling pathways — edge thickness reflects interaction strength between cluster pairs. Real output from the pipeline above.

CellChat circle plot showing Glutamate signaling specifically between clusters

Glutamate signaling specifically — the single largest signaling category in this dataset by interaction count.

Why the Spatial Constraint Matters

Without distance.use = TRUE, CellChat would just look at co-expression across the whole dataset, the same way it would for dissociated single-cell data. Any cluster pair with matching ligand-receptor expression would get flagged, regardless of whether the two clusters are ever physically close in the tissue. With the spatial constraint on, interactions only count if the cell types are within the specified interaction.range (250 here) or contact.range (100, for direct contact-dependent signaling). These distances are in the units set by spatial.factors; with ratio = 1, that means the raw coordinate units rather than microns. For a tissue like brain, where spatial organization is functionally meaningful, this is the difference between a biologically plausible result and a list of statistically matched but physically impossible interactions.
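To make the effect of a range cutoff concrete, here is a toy hard-cutoff check in NumPy. It is not how CellChat computes communication probabilities; CellChat's computation also involves scale.distance, contact.range and its expression-based probability model. The sketch only shows which cluster pairs a pure distance cutoff would rule out. The coordinates, labels and helper name are hypothetical.

```python
import numpy as np


def spatially_allowed_pairs(coords, labels, interaction_range):
    """For each pair of groups: is any spot of one within interaction_range of the other?

    A hard-cutoff simplification for illustration; not CellChat's computation.
    """
    groups = np.unique(labels)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    allowed = np.zeros((len(groups), len(groups)), dtype=bool)
    for ai, a in enumerate(groups):
        for bi, b in enumerate(groups):
            block = d[np.ix_(labels == a, labels == b)]  # distances from group a to group b
            allowed[ai, bi] = bool((block <= interaction_range).any())
    return groups, allowed


# Hypothetical clusters on a line: C0 and C1 close together, C2 far away
coords = np.array([[0, 0], [10, 0], [200, 0], [210, 0], [1000, 0], [1010, 0]], dtype=float)
labels = np.array(["C0", "C0", "C1", "C1", "C2", "C2"])
groups, allowed = spatially_allowed_pairs(coords, labels, interaction_range=250)
print(groups)
print(allowed)  # C0-C1 allowed; C2 paired with any other cluster is ruled out
```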

Bottom Line

CellChat's spatial mode works, and the three errors above are exactly the kind of thing that wastes an afternoon if you don't know they're coming. If you're already running Seurat on Visium data, adding CellChat on top is a relatively small additional step — the real cost is the ~6 minute runtime for computeCommunProb() on a dataset this size, which will scale up on larger tissue sections.
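Pairwise spatial terms grow quadratically with the number of spots. The back-of-the-envelope sketch below assumes one float64 value per spot pair in a dense distance matrix. That is an illustration, not a statement about CellChat's internals. The doubled section size is hypothetical.

```python
def spot_pair_stats(n_spots, bytes_per_value=8):
    """Unique spot pairs, and the size in MB of a dense float64 n x n distance matrix."""
    unique_pairs = n_spots * (n_spots - 1) // 2
    dense_matrix_mb = n_spots * n_spots * bytes_per_value / 1e6
    return unique_pairs, dense_matrix_mb


for n in (2_696, 5_392):  # this section, and a hypothetical section twice as large
    pairs, mb = spot_pair_stats(n)
    print(f"{n} spots: {pairs:,} pairs, ~{mb:.0f} MB dense")
```

Doubling the spot count roughly quadruples both numbers. If pairwise distances dominate the cost, as described above, runtime on larger sections can grow much faster than the spot count itself.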


Last updated: July 2026. Tested with CellChat (jinworks/CellChat, GitHub HEAD), Seurat 5.5.1, R 4.6.1, SeuratData stxBrain 0.1.2.


Spatio-DARLIN: The Paper That Maps Where Cells Have Been (Nature Methods, 2026)

Why this paper matters

If you work in spatial transcriptomics, you already know the field's biggest unsolved problem: spatial data tells you where a cell is, but not where it came from. Lineage tracing tells you ancestry, but classic methods strip away spatial context in the process. A paper published in Nature Methods on June 29, 2026 — "Spatio-DARLIN enables robust and efficient in situ lineage tracing in mice at single-cell resolution" (Gao, Zhang, Chen, Diao, Liu, Wang, Li — Westlake University) — tackles exactly this gap.

This post breaks down what Spatio-DARLIN actually does, in plain terms, and why it's worth paying attention to even if lineage tracing isn't your direct focus.

Diagram showing how Spatio-DARLIN combines spatial transcriptomics and DARLIN lineage barcoding, and the three key findings in gut, cortex/hippocampus, and hypothalamus

Original summary diagram of the Spatio-DARLIN approach and key findings, based on the published abstract — not a figure from the paper itself.

The core problem it solves

Spatially resolved lineage tracing is essential for understanding how clonal relationships shape tissue architecture — basically, which cells came from the same ancestor, and how that ancestry maps onto physical tissue structure. Until now, recovering this information in situ (directly in tissue, preserving spatial position) at single-cell resolution has been technically difficult.

Spatio-DARLIN combines spatial transcriptomics with DARLIN lineage-tracing mice (a CRISPR-based barcoding system used to record cell ancestry) to recover reliable lineage information directly from tissue sections.

What they actually found

According to the published abstract, the method recovers reliable clonal/lineage information from roughly 25–50% of cells across the organs and brain regions tested — a meaningful recovery rate for an in situ method, where signal loss is typically the limiting factor.

Specific findings reported in the paper:

  • Gut epithelium: the authors identified stereotyped clonal patterns — meaning related cells weren't randomly scattered, but organized in consistent, predictable spatial arrangements.
  • Brain (cortex and hippocampus): radial glia in these regions showed greater clonal expansion compared to other brain regions — i.e., these progenitor cell lineages produced more descendant cells that stayed spatially clustered.
  • Hypothalamus: the data suggest neural progenitor cells in this region were already spatially pre-determined by embryonic day E10 — implying that some positional "decisions" about where a cell's descendants will end up are made much earlier in development than previously appreciated for this region.

Why this matters beyond mouse brain studies

Even if you're not working on neurodevelopment specifically, this paper is a useful signal for the field:

  1. It pushes in situ lineage tracing closer to being a standard companion to spatial transcriptomics — not just a specialized niche technique. If recovery rates like this hold up across more tissue types, expect to see lineage information integrated into more standard spatial workflows over the next couple of years.
  2. It's a concrete demonstration that "where a cell ends up" and "where it came from" can both be captured from the same tissue section — which has been a major practical bottleneck for combining these two data types.
  3. For anyone designing spatial experiments in developmental biology, this is a relevant reference point for what recovery rates and clonal-pattern resolution are currently achievable with this class of method.

What we don't know yet

To be clear about the limits of what's in the public abstract: the ~25–50% recovery rate is described as occurring "across organs and brain" generally, but the abstract doesn't break down per-tissue recovery rates in detail, and we haven't read the full methods section ourselves. If you're considering using this approach, the full paper (DOI: 10.1038/s41592-026-03151-5) is the place to check protocol-level details before planning an experiment around it.

Bottom line

Spatio-DARLIN is a real, recently published (Nature Methods, June 2026) method combining spatial transcriptomics with CRISPR-based lineage barcoding to recover ancestry information directly in tissue, at meaningful recovery rates. If you're working anywhere near spatial transcriptomics + developmental biology, this is worth bookmarking — it's a strong sign that combined spatial + lineage workflows are becoming practically feasible rather than purely aspirational.


Source: Gao J, Zhang Z, Chen D, Diao S, Liu S, Wang SW, Li L. Spatio-DARLIN enables robust and efficient in situ lineage tracing in mice at single-cell resolution. Nat Methods. 2026 Jun 29. DOI: 10.1038/s41592-026-03151-5. Westlake Laboratory of Life Sciences and Biomedicine / School of Life Sciences, Westlake University, Hangzhou, China. Corresponding authors: Wang Shouwen, Li Li. No conflicts of interest declared by authors.

