SplisosmNP vs SplisosmFFT (Visium FFPE, probe)#
The relative small spot size (n_max = 4992) and regular hexagonal lattice of standard 10x Visium data make it an ideal test case for comparing the two SPLISOSM implementations: SplisosmNP (full-rank) and SplisosmFFT.
Class |
Input |
Spatial kernel |
|---|---|---|
|
|
Sparse CAR (k-NN graph) |
|
|
Block-circulant CAR (periodic) |
In this tutorial, we run spatial variability tests using both methods on a Visium v2 CytAssist FFPE dataset of adult mouse brain with probe-level quantification (Visium Mouse Transcriptome Probe Set v1.0), and verify that they yield concordant results.
Estimated runtime: 1 min.
Imports#
from __future__ import annotations
from pathlib import Path
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import spearmanr
from splisosm import SplisosmNP, SplisosmFFT
from splisosm.io import load_visium_probe
warnings.filterwarnings('ignore', category=FutureWarning)
plt.rcParams['figure.dpi'] = 120
visium_outs = Path("/path/to/visium_ffpe/outs")
Load probe-level data from Space Ranger output#
After running the Space Ranger (v4.0) count pipeline, we use load_visium_probe to read the raw_probe_bc_matrix.h5 file from the output directory.
It optionally returns either an AnnData or a SpatialData. Both preserve probe-level var metadata (gene_ids, probe_ids, gene_name, etc.).
%%time
import warnings
with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=UserWarning)
# AnnData mode — for SplisosmNP (default)
adata = load_visium_probe(visium_outs, return_type='anndata')
print(adata)
# SpatialData mode — for SplisosmFFT
sdata = load_visium_probe(visium_outs, return_type='spatialdata')
print(sdata)
AnnData object with n_obs × n_vars = 2310 × 21178
obs: 'filtered_barcodes', 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'probe_ids', 'feature_types', 'filtered_probes', 'gene_name', 'genome', 'probe_region'
uns: 'spatial'
obsm: 'spatial'
layers: 'counts'
SpatialData object
├── Images
│ ├── 'visium_ffpe_mouse_cbs_hires_image': DataArray[cyx] (3, 2000, 1872)
│ └── 'visium_ffpe_mouse_cbs_lowres_image': DataArray[cyx] (3, 600, 562)
├── Shapes
│ └── 'visium_ffpe_mouse_cbs': GeoDataFrame shape: (2310, 2) (2D shapes)
└── Tables
└── 'table': AnnData (2310, 21178)
with coordinate systems:
▸ 'visium_ffpe_mouse_cbs', with elements:
visium_ffpe_mouse_cbs_hires_image (Images), visium_ffpe_mouse_cbs_lowres_image (Images), visium_ffpe_mouse_cbs (Shapes)
▸ 'visium_ffpe_mouse_cbs_downscaled_hires', with elements:
visium_ffpe_mouse_cbs_hires_image (Images), visium_ffpe_mouse_cbs (Shapes)
▸ 'visium_ffpe_mouse_cbs_downscaled_lowres', with elements:
visium_ffpe_mouse_cbs_lowres_image (Images), visium_ffpe_mouse_cbs (Shapes)
CPU times: user 2.09 s, sys: 307 ms, total: 2.4 s
Wall time: 2.76 s
adata.var.head(3)
| gene_ids | probe_ids | feature_types | filtered_probes | gene_name | genome | probe_region | |
|---|---|---|---|---|---|---|---|
| Itgb2l|2ef1e7b | DEPRECATED_ENSMUSG00000000157 | DEPRECATED_ENSMUSG00000000157|Itgb2l|2ef1e7b | Gene Expression | False | DEPRECATED_ENSMUSG00000000157 | mm10 | none |
| Btn1a1|49a22c6 | DEPRECATED_ENSMUSG00000000706 | DEPRECATED_ENSMUSG00000000706|Btn1a1|49a22c6 | Gene Expression | False | DEPRECATED_ENSMUSG00000000706 | mm10 | none |
| Timp1|9dd7d08 | DEPRECATED_ENSMUSG00000001131 | DEPRECATED_ENSMUSG00000001131|Timp1|9dd7d08 | Gene Expression | False | DEPRECATED_ENSMUSG00000001131 | mm10 | none |
10x Visium spots are arranged in a hexagonal lattice, with grid indices array_row and array_col in adata.obs, and spatial coordinates of spots centroids in adata.obsm["spatial"].
array_row: the row index of the spot in the hexagonal grid (from 0 to 77)array_col: the column index of the spot in the hexagonal grid (from 0 to 127), even numbers for even rows and odd numbers for odd rowsobsm["spatial"]: the (x, y) coordinates of the spot (pxl_row_in_fullres,pxl_col_in_fullres) in the original image coordinate system (in pixels)
See 10x documentation for more details on the spatial metadata.
print(adata.obs[['array_row', 'array_col']].head(3))
print(adata.obsm['spatial'][:3,:])
array_row array_col
AACACTTGGCAAGGAA-1 47 71
AACAGGATTCATAGTT-1 49 43
AACAGGTTATTGCACC-1 28 86
[[16003.03063455 20105.36253564]
[21119.36714956 20734.51500256]
[13254.7453719 14068.95800173]]
In the following sections, we will use obsm["spatial"] to build the spatial graph for SplisosmNP, and use (array_row, array_col) to create a square grid for SplisosmFFT.
Running SplisosmNP for spatial variability in probe usage#
SplisosmNP works directly on AnnData. We use gene_ids to group probes by gene. After filtering, only 281 multi-probe genes were kept for spatial variability in probe usage.
%%time
model_np = SplisosmNP()
model_np.setup_data(
adata, layer='counts', group_iso_by='gene_ids', gene_names='gene_name',
spatial_key='spatial', min_counts=10, filter_single_iso_genes=True,
)
model_np
CPU times: user 1.11 s, sys: 37.4 ms, total: 1.15 s
Wall time: 307 ms
=== SplisosmNP
- Number of genes: 281
- Number of spots: 2310
- Number of covariates: 0
- Average isoforms per gene: 2.1
=== Model configurations
- Spatial kernel source: spatial_key='spatial'
- k_neighbors: 4, rho: 0.99
- Standardize spatial covariance: True
=== Test results
- Spatial variability: N/A
- Differential usage: N/A
Since the dataset is small, SV test will by default run in full-rank mode (no approximation) and will be fast.
%%time
# Run SV test to find genes with spatially variable probe usage
model_np.test_spatial_variability(
method='hsic-ir',
null_method='liu', # default
# null_configs={'n_probes': 60} # optional tuning for trace accuracy
)
results_np = model_np.get_formatted_test_results('sv', with_gene_summary=True)
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 7545.86it/s]
CPU times: user 1.09 s, sys: 247 ms, total: 1.34 s
Wall time: 1.03 s
print(f'SVP genes (FDR < 0.01): {(results_np["pvalue_adj"] < 0.01).sum()} / {len(results_np)}')
results_np.sort_values('pvalue_adj').head(5)
SVP genes (FDR < 0.01): 60 / 281
| gene | statistic | pvalue | pvalue_adj | n_isos | perplexity | pct_bin_on | count_avg | count_std | major_ratio_avg | |
|---|---|---|---|---|---|---|---|---|---|---|
| 219 | Kalrn | 0.000483 | 8.580020e-170 | 2.410985e-167 | 2 | 1.840045 | 0.957576 | 12.988312 | 12.532429 | 0.701263 |
| 66 | Gabbr1 | 0.000233 | 1.709264e-128 | 2.401516e-126 | 2 | 1.999731 | 0.991775 | 17.642424 | 14.411285 | 0.508196 |
| 132 | Map4 | 0.000320 | 4.906295e-122 | 4.595563e-120 | 2 | 1.934026 | 0.988312 | 9.772294 | 7.393756 | 0.628776 |
| 51 | Oxr1 | 0.000457 | 4.228255e-104 | 2.970349e-102 | 2 | 1.988226 | 0.898268 | 4.601732 | 3.779965 | 0.554280 |
| 87 | Gnas | 0.000014 | 3.554604e-102 | 1.997687e-100 | 5 | 1.411140 | 1.000000 | 48.459307 | 35.816461 | 0.917251 |
Let’s visualize the top significant spatially variably processed (SVP) genes.
%%time
import scanpy as sc
from splisosm.utils import add_ratio_layer
# Compute log1p counts
adata.layers['log1p'] = np.log1p(adata.layers['counts'])
# Add observed ratios to the anndata
add_ratio_layer(adata, layer='counts', group_iso_by='gene_ids', ratio_layer_key='ratios_obs')
# Spatially variably processed: Kalrn
_gene = "Kalrn"
_probes = adata.var.query(f'gene_name == @_gene').index.tolist()
_titles_log1p = [f"{name} (log1p)" for name in _probes]
_titles_ratio = [f"{name} (ratio)" for name in _probes]
with plt.rc_context({"figure.figsize": (3, 3)}):
sc.pl.spatial(
adata, color=_probes, title=_titles_log1p,
img_key=None, layer='log1p', ncols = len(_probes)
)
sc.pl.spatial(
adata, color=_probes, title=_titles_ratio,
img_key=None, layer='ratios_obs', ncols = len(_probes)
)
CPU times: user 683 ms, sys: 54.2 ms, total: 738 ms
Wall time: 753 ms
Running SplisosmFFT for spatial variability in probe usage#
Standard Visium uses a staggered hexagonal grid (col step = 2). When using the (array_row, array_col) indices for rasterization in spatialdata.rasterize_bins,
not indexed positions (i.e., odd number indices for even rows and even number indices for odd rows) are automatically filled with zeros (col step = 1).
%%time
from spatialdata import rasterize_bins
# Rasterize Visium spots into a rectangular grid
sdata.tables['table'].X = sdata.tables['table'].X.tocsc()
sdata['rasterized'] = rasterize_bins(
sdata,
bins='visium_ffpe_mouse_cbs',
table_name='table',
col_key='array_col',
row_key='array_row'
)
CPU times: user 176 ms, sys: 11.8 ms, total: 188 ms
Wall time: 189 ms
_, ny, nx = sdata['rasterized'].shape
n_spots = sdata.tables['table'].shape[0]
print(f'Raster grid: {ny} x {nx} = {ny * nx} cells')
print(f'Observed spots: {n_spots}')
print(f'Occupancy: {n_spots / (ny * nx) * 100:.1f}%')
Raster grid: 64 x 93 = 5952 cells
Observed spots: 2310
Occupancy: 38.8%
The resulting grid look like a checkerboard.
%%time
import spatialdata_plot
# Visualize a selected gene
_gene = 'Kalrn'
_probes = adata.var.query('gene_name == @_gene').index.tolist()
fig, axes = plt.subplots(1, len(_probes), figsize=(4*len(_probes), 4))
for i, probe in enumerate(_probes):
sdata.pl.render_images(f"rasterized", channel=probe).pl.show(
coordinate_systems=f"visium_ffpe_mouse_cbs_downscaled_lowres",
ax = axes[i],
title=f"Counts (rasterized): {probe}"
)
plt.subplots_adjust(wspace=0.7)
plt.show()
CPU times: user 314 ms, sys: 31.7 ms, total: 345 ms
Wall time: 428 ms
The empty cells increase the grid size by 2×, change spatial connectivity (the nearest neighbors for all spots are now empty bins) and also the physical scale of rasterized pixels (the new y-axis unit step is \(\sqrt{3}\) of the x-axis unit).
To reflect these changes, when running SplisosmFFT, we adjust for the following parameters:
neighbor_degreeto include the second-nearest neighbors in the hexagonal latticespacingto compensate for ny (array_row) spacing as compared to nx (array_col) spacing (\(\frac{1}{\sqrt{3}} \rightarrow 1\)) in the rasterized grid.
%%time
model_fft = SplisosmFFT(
rho=0.99,
neighbor_degree=2, # original nearest neighbors become second nearest due to padding
spacing=(np.sqrt(3), 1.0), # x-axis (array_col) is stretched in the rasterized image
)
model_fft.setup_data(
sdata, bins='visium_ffpe_mouse_cbs', table_name='table',
col_key='array_col', row_key='array_row', layer='counts',
group_iso_by='gene_ids', gene_names='gene_name',
min_counts=10, filter_single_iso_genes=True,
)
model_fft
CPU times: user 2.21 s, sys: 27.1 ms, total: 2.24 s
Wall time: 2.24 s
=== SplisosmFFT
- Number of genes: 281
- Number of spots: 2310
- Number of spots after rasterization: 5952
- Number of covariates: 0
- Average isoforms per gene: 2.1
=== Model configurations
- Neighborhood degree: 2
- Spatial autocorrelation rho: 0.99
=== Test results
- Spatial variability: N/A
- Differential usage: N/A
%%time
model_fft.test_spatial_variability(method='hsic-ir')
results_fft = model_fft.get_formatted_test_results('sv', with_gene_summary=True)
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 13599.28it/s]
CPU times: user 381 ms, sys: 189 ms, total: 569 ms
Wall time: 385 ms
print(f'SVP genes (FDR < 0.01): {(results_fft["pvalue_adj"] < 0.01).sum()} / {len(results_fft)}')
results_fft.sort_values('pvalue_adj').head(5)
SVP genes (FDR < 0.01): 74 / 281
| gene | statistic | pvalue | pvalue_adj | n_isos | perplexity | pct_bin_on | count_avg | count_std | major_ratio_avg | |
|---|---|---|---|---|---|---|---|---|---|---|
| 219 | Kalrn | 0.000096 | 4.989955e-198 | 1.402177e-195 | 2 | 1.840045 | 0.957576 | 12.988312 | 12.532429 | 0.701263 |
| 132 | Map4 | 0.000080 | 6.494533e-190 | 9.124818e-188 | 2 | 1.934026 | 0.988312 | 9.772294 | 7.393756 | 0.628776 |
| 66 | Gabbr1 | 0.000050 | 9.687196e-162 | 9.073674e-160 | 2 | 1.999731 | 0.991775 | 17.642424 | 14.411285 | 0.508196 |
| 87 | Gnas | 0.000003 | 6.290959e-127 | 4.419399e-125 | 5 | 1.411140 | 1.000000 | 48.459307 | 35.816461 | 0.917251 |
| 51 | Oxr1 | 0.000098 | 5.022497e-126 | 2.822643e-124 | 2 | 1.988226 | 0.898268 | 4.601732 | 3.779965 | 0.554280 |
Comparison: SplisosmNP vs SplisosmFFT#
Overall, the two approaches produce highly concordant test statistics and p-values.
r_np = results_np.set_index('gene')
r_fft = results_fft.set_index('gene')
common = r_np.index.intersection(r_fft.index)
df = pd.DataFrame({
'stat_np': r_np.loc[common, 'statistic'],
'stat_fft': r_fft.loc[common, 'statistic'],
'padj_np': r_np.loc[common, 'pvalue_adj'],
'padj_fft': r_fft.loc[common, 'pvalue_adj'],
})
rho_stat, _ = spearmanr(df['stat_np'], df['stat_fft'])
rho_pval, _ = spearmanr(df['padj_np'], df['padj_fft'])
sig_np = set(df[df['padj_np'] < 0.01].index)
sig_fft = set(df[df['padj_fft'] < 0.01].index)
overlap = sig_np & sig_fft
print(f'Common genes tested: {len(common)}')
print(f'Spearman rho (stat): {rho_stat:.4f}')
print(f'Spearman rho (pval-adj): {rho_pval:.4f}')
print(f'SVP genes — NP: {len(sig_np)}')
print(f'SVP genes — FFT: {len(sig_fft)}')
print(f'Overlap: {len(overlap)}')
Common genes tested: 281
Spearman rho (stat): 0.9969
Spearman rho (pval-adj): 0.9616
SVP genes — NP: 60
SVP genes — FFT: 74
Overlap: 59
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
# Test statistic
ax = axes[0]
ax.scatter(df['stat_np'], df['stat_fft'], s=8, alpha=0.4, linewidths=0)
lim = max(df['stat_np'].max(), df['stat_fft'].max()) * 1.05
ax.plot([0, lim], [0, lim], 'k--', lw=0.8, alpha=0.5)
ax.set_xlabel('HSIC-IR statistic (SplisosmNP)')
ax.set_ylabel('HSIC-IR statistic (SplisosmFFT)')
ax.set_title(f'Test statistic (Spearman $\\rho$ = {rho_stat:.3f})')
# P-value
ax = axes[1]; eps = 1e-300
x = -np.log10(df['padj_np'].clip(lower=eps))
y = -np.log10(df['padj_fft'].clip(lower=eps))
ax.scatter(x, y, s=8, alpha=0.4, linewidths=0)
lim = max(x.max(), y.max()) * 1.05
ax.plot([0, lim], [0, lim], 'k--', lw=0.8, alpha=0.5)
ax.set_xlabel(r'$-\log_{10}$(adj. p-value) — SplisosmNP')
ax.set_ylabel(r'$-\log_{10}$(adj. p-value) — SplisosmFFT')
ax.set_title(f'Adjusted p-values (Spearman $\\rho$ = {rho_pval:.3f})')
fig.tight_layout(); plt.show()
Differences in the test statistics can be attributed to sample size differences (4992 spots vs 16384 grid cells) and kernel differences. Both classes use a CAR kernel \(K = (I - \rho W)^{-1}\), but normalize \(W\) differently:
SplisosmNP: symmetric degree normalization \(D^{-1/2} W D^{-1/2}\)
SplisosmFFT: global scalar normalization \(W / \text{degree}\)
On a regular grid these are equivalent. On a hex grid embedded in a rectangular raster, the zero-padded cells create degree heterogeneity, causing the kernel eigenvalue spectrum to differ. The key diagnostic is the kernel trace per grid cell \(\mathrm{tr}(K)/n\).
tr_np = model_np.sp_kernel.trace().item()
tr_fft = model_fft.sp_kernel.trace()
print(f"{'Method':20s} {'n':>6s} {'tr(K)':>10s} {'tr(K)/n':>10s}")
print(f"{'NP (k-NN)':20s} {model_np.n_spots:6d} {tr_np:10.1f} {tr_np/model_np.n_spots:10.4f}")
print(f"{'FFT (raw hex)':20s} {model_fft.n_grid:6d} {tr_fft:10.1f} {tr_fft/model_fft.n_grid:10.4f}")
print()
ratio = df['stat_fft'] / df['stat_np'].clip(lower=1e-15)
print(f'Statistic ratio (FFT/NP): median={ratio.median():.2f}, mean={ratio.mean():.2f}')
Method n tr(K) tr(K)/n
NP (k-NN) 2310 2274.9 0.9848
FFT (raw hex) 5952 13146.6 2.2088
Statistic ratio (FFT/NP): median=0.33, mean=0.32
Finally, let’s compare the speed and memory usage of the two approaches. Since SplisosmFFT’s rasterization step is performed on the fly via spatialdata.rasterize_bins, it is expected to be slower than SplisosmNP on this dataset. Also, the larger grid (5952 vs 2310) created after rasterization makes the memory improvement not as significant. However, the \(O(n\log n)\) time and \(O(n)\) memory FFT complexities will pay off the overhead on larger datasets (e.g., 1M+ spots).
import time, tracemalloc
def benchmark(label, setup_fn, test_fn, n_runs=3):
tracemalloc.start()
t0 = time.perf_counter(); m = setup_fn(); t_setup = time.perf_counter() - t0
times = []
for _ in range(n_runs):
t0 = time.perf_counter(); test_fn(m); times.append(time.perf_counter() - t0)
_, peak = tracemalloc.get_traced_memory(); tracemalloc.stop()
print(f'{label:15s} setup={t_setup:.2f}s test={min(times):.2f}s '
f'total={t_setup+min(times):.2f}s peak_mem={peak/1e6:.1f}MB')
def np_setup():
m = SplisosmNP()
m.setup_data(adata, layer='counts', group_iso_by='gene_ids', gene_names='gene_name',
spatial_key='spatial', min_counts=10, filter_single_iso_genes=True)
return m
def fft_setup():
m = SplisosmFFT(
rho=0.99,
neighbor_degree=2, # original nearest neighbors become second nearest due to padding
spacing=(np.sqrt(3), 1.0), # x-axis (array_col) is stretched in the rasterized image
)
m.setup_data(
sdata, bins='visium_ffpe_mouse_cbs', table_name='table',
col_key='array_col', row_key='array_row', layer='counts',
group_iso_by='gene_ids', gene_names='gene_name',
min_counts=10, filter_single_iso_genes=True,
)
return m
benchmark('SplisosmNP', np_setup, lambda m: m.test_spatial_variability(method='hsic-ir'))
benchmark('SplisosmFFT', fft_setup, lambda m: m.test_spatial_variability(method='hsic-ir'))
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 8870.41it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9581.79it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9718.51it/s]
SplisosmNP setup=0.38s test=0.44s total=0.83s peak_mem=301.6MB
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9793.75it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 10439.06it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9962.72it/s]
SplisosmFFT setup=8.44s test=1.59s total=10.03s peak_mem=206.2MB
Summary#
SplisosmNPoperates on the native hexagonal spot geometry via a k-NN graph and is the recommended default for standard-resolution Visium, which is already fast.SplisosmFFTcan be applied to standard Visium using theSpatialDatainterface, but requires careful adjustment of theneighbor_degreeandspacingparameters to account for the zero-padded rasterization. SV test results are highly concordant withSplisosmNP.
For reproducibility#
import sys
from datetime import date
import splisosm
print("Last updated:", date.today())
print("Python:", sys.version.split()[0])
print("splisosm:", getattr(splisosm, "__version__", "unknown"))
Last updated: 2026-04-27
Python: 3.12.13
splisosm: 1.2.0rc2.dev4+g83955c42e.d20260427