SplisosmNP vs SplisosmFFT (Visium FFPE, probe)

SplisosmNP vs SplisosmFFT (Visium FFPE, probe)#

The relative small spot size (n_max = 4992) and regular hexagonal lattice of standard 10x Visium data make it an ideal test case for comparing the two SPLISOSM implementations: SplisosmNP (full-rank) and SplisosmFFT.

Class	Input	Spatial kernel
`SplisosmNP`	`AnnData` with `obsm["spatial"]`	Sparse CAR (k-NN graph)
`SplisosmFFT`	`SpatialData` with rasterized grid (hexagonal to square)	Block-circulant CAR (periodic)

In this tutorial, we run spatial variability tests using both methods on a Visium v2 CytAssist FFPE dataset of adult mouse brain with probe-level quantification (Visium Mouse Transcriptome Probe Set v1.0), and verify that they yield concordant results.

Estimated runtime: 1 min.

Imports#

from __future__ import annotations

from pathlib import Path
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

from splisosm import SplisosmNP, SplisosmFFT
from splisosm.io import load_visium_probe

warnings.filterwarnings('ignore', category=FutureWarning)
plt.rcParams['figure.dpi'] = 120

visium_outs = Path("/path/to/visium_ffpe/outs")

Load probe-level data from Space Ranger output#

After running the Space Ranger (v4.0) count pipeline, we use load_visium_probe to read the raw_probe_bc_matrix.h5 file from the output directory. It optionally returns either an AnnData or a SpatialData. Both preserve probe-level var metadata (gene_ids, probe_ids, gene_name, etc.).

%%time
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings('ignore', category=UserWarning)
    # AnnData mode — for SplisosmNP (default)
    adata = load_visium_probe(visium_outs, return_type='anndata')
    print(adata)

    # SpatialData mode — for SplisosmFFT
    sdata = load_visium_probe(visium_outs, return_type='spatialdata')
    print(sdata)

AnnData object with n_obs × n_vars = 2310 × 21178
    obs: 'filtered_barcodes', 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'probe_ids', 'feature_types', 'filtered_probes', 'gene_name', 'genome', 'probe_region'
    uns: 'spatial'
    obsm: 'spatial'
    layers: 'counts'
SpatialData object
├── Images
│     ├── 'visium_ffpe_mouse_cbs_hires_image': DataArray[cyx] (3, 2000, 1872)
│     └── 'visium_ffpe_mouse_cbs_lowres_image': DataArray[cyx] (3, 600, 562)
├── Shapes
│     └── 'visium_ffpe_mouse_cbs': GeoDataFrame shape: (2310, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (2310, 21178)
with coordinate systems:
    ▸ 'visium_ffpe_mouse_cbs', with elements:
        visium_ffpe_mouse_cbs_hires_image (Images), visium_ffpe_mouse_cbs_lowres_image (Images), visium_ffpe_mouse_cbs (Shapes)
    ▸ 'visium_ffpe_mouse_cbs_downscaled_hires', with elements:
        visium_ffpe_mouse_cbs_hires_image (Images), visium_ffpe_mouse_cbs (Shapes)
    ▸ 'visium_ffpe_mouse_cbs_downscaled_lowres', with elements:
        visium_ffpe_mouse_cbs_lowres_image (Images), visium_ffpe_mouse_cbs (Shapes)
CPU times: user 2.09 s, sys: 307 ms, total: 2.4 s
Wall time: 2.76 s

adata.var.head(3)

	gene_ids	probe_ids	feature_types	filtered_probes	gene_name	genome	probe_region
Itgb2l\|2ef1e7b	DEPRECATED_ENSMUSG00000000157	DEPRECATED_ENSMUSG00000000157\|Itgb2l\|2ef1e7b	Gene Expression	False	DEPRECATED_ENSMUSG00000000157	mm10	none
Btn1a1\|49a22c6	DEPRECATED_ENSMUSG00000000706	DEPRECATED_ENSMUSG00000000706\|Btn1a1\|49a22c6	Gene Expression	False	DEPRECATED_ENSMUSG00000000706	mm10	none
Timp1\|9dd7d08	DEPRECATED_ENSMUSG00000001131	DEPRECATED_ENSMUSG00000001131\|Timp1\|9dd7d08	Gene Expression	False	DEPRECATED_ENSMUSG00000001131	mm10	none

10x Visium spots are arranged in a hexagonal lattice, with grid indices array_row and array_col in adata.obs, and spatial coordinates of spots centroids in adata.obsm["spatial"].

array_row: the row index of the spot in the hexagonal grid (from 0 to 77)
array_col: the column index of the spot in the hexagonal grid (from 0 to 127), even numbers for even rows and odd numbers for odd rows
obsm["spatial"]: the (x, y) coordinates of the spot (pxl_row_in_fullres, pxl_col_in_fullres) in the original image coordinate system (in pixels)

See 10x documentation for more details on the spatial metadata.

print(adata.obs[['array_row', 'array_col']].head(3))
print(adata.obsm['spatial'][:3,:])

                    array_row  array_col
AACACTTGGCAAGGAA-1         47         71
AACAGGATTCATAGTT-1         49         43
AACAGGTTATTGCACC-1         28         86
[[16003.03063455 20105.36253564]
 [21119.36714956 20734.51500256]
 [13254.7453719  14068.95800173]]

In the following sections, we will use obsm["spatial"] to build the spatial graph for SplisosmNP, and use (array_row, array_col) to create a square grid for SplisosmFFT.

Running `SplisosmNP` for spatial variability in probe usage#

SplisosmNP works directly on AnnData. We use gene_ids to group probes by gene. After filtering, only 281 multi-probe genes were kept for spatial variability in probe usage.

%%time
model_np = SplisosmNP()
model_np.setup_data(
    adata, layer='counts', group_iso_by='gene_ids', gene_names='gene_name',
    spatial_key='spatial', min_counts=10, filter_single_iso_genes=True,
)
model_np

CPU times: user 1.11 s, sys: 37.4 ms, total: 1.15 s
Wall time: 307 ms

=== SplisosmNP
- Number of genes: 281
- Number of spots: 2310
- Number of covariates: 0
- Average isoforms per gene: 2.1
=== Model configurations
- Spatial kernel source: spatial_key='spatial'
- k_neighbors: 4, rho: 0.99
- Standardize spatial covariance: True
=== Test results
- Spatial variability: N/A
- Differential usage: N/A

Since the dataset is small, SV test will by default run in full-rank mode (no approximation) and will be fast.

%%time
# Run SV test to find genes with spatially variable probe usage
model_np.test_spatial_variability(
    method='hsic-ir', 
    null_method='liu', # default
    # null_configs={'n_probes': 60}  # optional tuning for trace accuracy
)
results_np = model_np.get_formatted_test_results('sv', with_gene_summary=True)

SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 7545.86it/s]

CPU times: user 1.09 s, sys: 247 ms, total: 1.34 s
Wall time: 1.03 s

print(f'SVP genes (FDR < 0.01): {(results_np["pvalue_adj"] < 0.01).sum()} / {len(results_np)}')
results_np.sort_values('pvalue_adj').head(5)

SVP genes (FDR < 0.01): 60 / 281

	gene	statistic	pvalue	pvalue_adj	n_isos	perplexity	pct_bin_on	count_avg	count_std	major_ratio_avg
219	Kalrn	0.000483	8.580020e-170	2.410985e-167	2	1.840045	0.957576	12.988312	12.532429	0.701263
66	Gabbr1	0.000233	1.709264e-128	2.401516e-126	2	1.999731	0.991775	17.642424	14.411285	0.508196
132	Map4	0.000320	4.906295e-122	4.595563e-120	2	1.934026	0.988312	9.772294	7.393756	0.628776
51	Oxr1	0.000457	4.228255e-104	2.970349e-102	2	1.988226	0.898268	4.601732	3.779965	0.554280
87	Gnas	0.000014	3.554604e-102	1.997687e-100	5	1.411140	1.000000	48.459307	35.816461	0.917251

Let’s visualize the top significant spatially variably processed (SVP) genes.

%%time
import scanpy as sc
from splisosm.utils import add_ratio_layer

# Compute log1p counts
adata.layers['log1p'] = np.log1p(adata.layers['counts'])

# Add observed ratios to the anndata
add_ratio_layer(adata, layer='counts', group_iso_by='gene_ids', ratio_layer_key='ratios_obs')

# Spatially variably processed: Kalrn
_gene = "Kalrn"
_probes = adata.var.query(f'gene_name == @_gene').index.tolist()
_titles_log1p = [f"{name} (log1p)" for name in _probes]
_titles_ratio = [f"{name} (ratio)" for name in _probes]
with plt.rc_context({"figure.figsize": (3, 3)}):
    sc.pl.spatial(
        adata, color=_probes, title=_titles_log1p,
        img_key=None, layer='log1p', ncols = len(_probes)
    )
    sc.pl.spatial(
        adata, color=_probes, title=_titles_ratio,
        img_key=None, layer='ratios_obs', ncols = len(_probes)
    )

../_images/0c3bde10cf0fcd59ad95c15a7f0546fdfd39f1a10aa7e3e37d79cfc4c61e1da4.png

../_images/e74617371d68ace1b4134372d8b659773fa97eb809c79cca80381c198e3920ca.png

CPU times: user 683 ms, sys: 54.2 ms, total: 738 ms
Wall time: 753 ms

Running `SplisosmFFT` for spatial variability in probe usage#

Standard Visium uses a staggered hexagonal grid (col step = 2). When using the (array_row, array_col) indices for rasterization in spatialdata.rasterize_bins, not indexed positions (i.e., odd number indices for even rows and even number indices for odd rows) are automatically filled with zeros (col step = 1).

%%time
from spatialdata import rasterize_bins
# Rasterize Visium spots into a rectangular grid
sdata.tables['table'].X = sdata.tables['table'].X.tocsc()
sdata['rasterized'] = rasterize_bins(
    sdata, 
    bins='visium_ffpe_mouse_cbs',
    table_name='table',
    col_key='array_col', 
    row_key='array_row'
)

CPU times: user 176 ms, sys: 11.8 ms, total: 188 ms
Wall time: 189 ms

_, ny, nx = sdata['rasterized'].shape
n_spots = sdata.tables['table'].shape[0]
print(f'Raster grid: {ny} x {nx} = {ny * nx} cells')
print(f'Observed spots: {n_spots}')
print(f'Occupancy: {n_spots / (ny * nx) * 100:.1f}%')

Raster grid: 64 x 93 = 5952 cells
Observed spots: 2310
Occupancy: 38.8%

The resulting grid look like a checkerboard.

%%time
import spatialdata_plot

# Visualize a selected gene
_gene = 'Kalrn'
_probes = adata.var.query('gene_name == @_gene').index.tolist()
fig, axes = plt.subplots(1, len(_probes), figsize=(4*len(_probes), 4))
for i, probe in enumerate(_probes):
    sdata.pl.render_images(f"rasterized", channel=probe).pl.show(
        coordinate_systems=f"visium_ffpe_mouse_cbs_downscaled_lowres",
        ax = axes[i],
        title=f"Counts (rasterized): {probe}"
    )

plt.subplots_adjust(wspace=0.7)
plt.show()

../_images/50f6e61793fcf5034433909efc42965ad03a1749e34a082653ec215e2045247b.png

CPU times: user 314 ms, sys: 31.7 ms, total: 345 ms
Wall time: 428 ms

The empty cells increase the grid size by 2×, change spatial connectivity (the nearest neighbors for all spots are now empty bins) and also the physical scale of rasterized pixels (the new y-axis unit step is \(\sqrt{3}\) of the x-axis unit). To reflect these changes, when running SplisosmFFT, we adjust for the following parameters:

neighbor_degree to include the second-nearest neighbors in the hexagonal lattice
spacing to compensate for ny (array_row) spacing as compared to nx (array_col) spacing (\(\frac{1}{\sqrt{3}} \rightarrow 1\)) in the rasterized grid.

%%time
model_fft = SplisosmFFT(
    rho=0.99, 
    neighbor_degree=2, # original nearest neighbors become second nearest due to padding
    spacing=(np.sqrt(3), 1.0),  # x-axis (array_col) is stretched in the rasterized image
)
model_fft.setup_data(
    sdata, bins='visium_ffpe_mouse_cbs', table_name='table',
    col_key='array_col', row_key='array_row', layer='counts',
    group_iso_by='gene_ids', gene_names='gene_name',
    min_counts=10, filter_single_iso_genes=True,
)
model_fft

CPU times: user 2.21 s, sys: 27.1 ms, total: 2.24 s
Wall time: 2.24 s

=== SplisosmFFT
- Number of genes: 281
- Number of spots: 2310
- Number of spots after rasterization: 5952
- Number of covariates: 0
- Average isoforms per gene: 2.1
=== Model configurations
- Neighborhood degree: 2
- Spatial autocorrelation rho: 0.99
=== Test results
- Spatial variability: N/A
- Differential usage: N/A

%%time
model_fft.test_spatial_variability(method='hsic-ir')
results_fft = model_fft.get_formatted_test_results('sv', with_gene_summary=True)

SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 13599.28it/s]

CPU times: user 381 ms, sys: 189 ms, total: 569 ms
Wall time: 385 ms

print(f'SVP genes (FDR < 0.01): {(results_fft["pvalue_adj"] < 0.01).sum()} / {len(results_fft)}')
results_fft.sort_values('pvalue_adj').head(5)

SVP genes (FDR < 0.01): 74 / 281

	gene	statistic	pvalue	pvalue_adj	n_isos	perplexity	pct_bin_on	count_avg	count_std	major_ratio_avg
219	Kalrn	0.000096	4.989955e-198	1.402177e-195	2	1.840045	0.957576	12.988312	12.532429	0.701263
132	Map4	0.000080	6.494533e-190	9.124818e-188	2	1.934026	0.988312	9.772294	7.393756	0.628776
66	Gabbr1	0.000050	9.687196e-162	9.073674e-160	2	1.999731	0.991775	17.642424	14.411285	0.508196
87	Gnas	0.000003	6.290959e-127	4.419399e-125	5	1.411140	1.000000	48.459307	35.816461	0.917251
51	Oxr1	0.000098	5.022497e-126	2.822643e-124	2	1.988226	0.898268	4.601732	3.779965	0.554280

Comparison: SplisosmNP vs SplisosmFFT#

Overall, the two approaches produce highly concordant test statistics and p-values.

r_np = results_np.set_index('gene')
r_fft = results_fft.set_index('gene')
common = r_np.index.intersection(r_fft.index)
df = pd.DataFrame({
    'stat_np': r_np.loc[common, 'statistic'],
    'stat_fft': r_fft.loc[common, 'statistic'],
    'padj_np': r_np.loc[common, 'pvalue_adj'],
    'padj_fft': r_fft.loc[common, 'pvalue_adj'],
})
rho_stat, _ = spearmanr(df['stat_np'], df['stat_fft'])
rho_pval, _ = spearmanr(df['padj_np'], df['padj_fft'])
sig_np = set(df[df['padj_np'] < 0.01].index)
sig_fft = set(df[df['padj_fft'] < 0.01].index)
overlap = sig_np & sig_fft
print(f'Common genes tested:     {len(common)}')
print(f'Spearman rho (stat):     {rho_stat:.4f}')
print(f'Spearman rho (pval-adj):    {rho_pval:.4f}')
print(f'SVP genes — NP:          {len(sig_np)}')
print(f'SVP genes — FFT:         {len(sig_fft)}')
print(f'Overlap:                 {len(overlap)}')

Common genes tested:     281
Spearman rho (stat):     0.9969
Spearman rho (pval-adj):    0.9616
SVP genes — NP:          60
SVP genes — FFT:         74
Overlap:                 59

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# Test statistic
ax = axes[0]
ax.scatter(df['stat_np'], df['stat_fft'], s=8, alpha=0.4, linewidths=0)
lim = max(df['stat_np'].max(), df['stat_fft'].max()) * 1.05
ax.plot([0, lim], [0, lim], 'k--', lw=0.8, alpha=0.5)
ax.set_xlabel('HSIC-IR statistic (SplisosmNP)')
ax.set_ylabel('HSIC-IR statistic (SplisosmFFT)')
ax.set_title(f'Test statistic (Spearman $\\rho$ = {rho_stat:.3f})')

# P-value
ax = axes[1]; eps = 1e-300
x = -np.log10(df['padj_np'].clip(lower=eps))
y = -np.log10(df['padj_fft'].clip(lower=eps))
ax.scatter(x, y, s=8, alpha=0.4, linewidths=0)
lim = max(x.max(), y.max()) * 1.05
ax.plot([0, lim], [0, lim], 'k--', lw=0.8, alpha=0.5)
ax.set_xlabel(r'$-\log_{10}$(adj. p-value) — SplisosmNP')
ax.set_ylabel(r'$-\log_{10}$(adj. p-value) — SplisosmFFT')
ax.set_title(f'Adjusted p-values (Spearman $\\rho$ = {rho_pval:.3f})')
fig.tight_layout(); plt.show()

../_images/3ec1942cc3bed56d89a5df51f49ce4523451fe691932c130d004295f1d0f5a0a.png

Differences in the test statistics can be attributed to sample size differences (4992 spots vs 16384 grid cells) and kernel differences. Both classes use a CAR kernel \(K = (I - \rho W)^{-1}\), but normalize \(W\) differently:

SplisosmNP: symmetric degree normalization \(D^{-1/2} W D^{-1/2}\)
SplisosmFFT: global scalar normalization \(W / \text{degree}\)

On a regular grid these are equivalent. On a hex grid embedded in a rectangular raster, the zero-padded cells create degree heterogeneity, causing the kernel eigenvalue spectrum to differ. The key diagnostic is the kernel trace per grid cell \(\mathrm{tr}(K)/n\).

tr_np = model_np.sp_kernel.trace().item()
tr_fft = model_fft.sp_kernel.trace()
print(f"{'Method':20s} {'n':>6s} {'tr(K)':>10s} {'tr(K)/n':>10s}")
print(f"{'NP (k-NN)':20s} {model_np.n_spots:6d} {tr_np:10.1f} {tr_np/model_np.n_spots:10.4f}")
print(f"{'FFT (raw hex)':20s} {model_fft.n_grid:6d} {tr_fft:10.1f} {tr_fft/model_fft.n_grid:10.4f}")
print()
ratio = df['stat_fft'] / df['stat_np'].clip(lower=1e-15)
print(f'Statistic ratio (FFT/NP): median={ratio.median():.2f}, mean={ratio.mean():.2f}')

Method                    n      tr(K)    tr(K)/n
NP (k-NN)              2310     2274.9     0.9848
FFT (raw hex)          5952    13146.6     2.2088

Statistic ratio (FFT/NP): median=0.33, mean=0.32

Finally, let’s compare the speed and memory usage of the two approaches. Since SplisosmFFT’s rasterization step is performed on the fly via spatialdata.rasterize_bins, it is expected to be slower than SplisosmNP on this dataset. Also, the larger grid (5952 vs 2310) created after rasterization makes the memory improvement not as significant. However, the \(O(n\log n)\) time and \(O(n)\) memory FFT complexities will pay off the overhead on larger datasets (e.g., 1M+ spots).

import time, tracemalloc
def benchmark(label, setup_fn, test_fn, n_runs=3):
    tracemalloc.start()
    t0 = time.perf_counter(); m = setup_fn(); t_setup = time.perf_counter() - t0
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter(); test_fn(m); times.append(time.perf_counter() - t0)
    _, peak = tracemalloc.get_traced_memory(); tracemalloc.stop()
    print(f'{label:15s}  setup={t_setup:.2f}s  test={min(times):.2f}s  '
          f'total={t_setup+min(times):.2f}s  peak_mem={peak/1e6:.1f}MB')
def np_setup():
    m = SplisosmNP()
    m.setup_data(adata, layer='counts', group_iso_by='gene_ids', gene_names='gene_name',
                 spatial_key='spatial', min_counts=10, filter_single_iso_genes=True)
    return m
def fft_setup():
    m = SplisosmFFT(
        rho=0.99, 
        neighbor_degree=2, # original nearest neighbors become second nearest due to padding
        spacing=(np.sqrt(3), 1.0),  # x-axis (array_col) is stretched in the rasterized image
    )
    m.setup_data(
        sdata, bins='visium_ffpe_mouse_cbs', table_name='table',
        col_key='array_col', row_key='array_row', layer='counts',
        group_iso_by='gene_ids', gene_names='gene_name',
        min_counts=10, filter_single_iso_genes=True,
    )
    return m
benchmark('SplisosmNP', np_setup, lambda m: m.test_spatial_variability(method='hsic-ir'))
benchmark('SplisosmFFT', fft_setup, lambda m: m.test_spatial_variability(method='hsic-ir'))

SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 8870.41it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9581.79it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9718.51it/s]

SplisosmNP       setup=0.38s  test=0.44s  total=0.83s  peak_mem=301.6MB

SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9793.75it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 10439.06it/s]
SV [hsic-ir]: 100%|██████████| 19/19 [00:00<00:00, 9962.72it/s]

SplisosmFFT      setup=8.44s  test=1.59s  total=10.03s  peak_mem=206.2MB

Summary#

SplisosmNP operates on the native hexagonal spot geometry via a k-NN graph and is the recommended default for standard-resolution Visium, which is already fast.
SplisosmFFT can be applied to standard Visium using the SpatialData interface, but requires careful adjustment of the neighbor_degree and spacing parameters to account for the zero-padded rasterization. SV test results are highly concordant with SplisosmNP.

For reproducibility#

import sys
from datetime import date
import splisosm

print("Last updated:", date.today())
print("Python:", sys.version.split()[0])
print("splisosm:", getattr(splisosm, "__version__", "unknown"))

Last updated: 2026-04-27
Python: 3.12.13
splisosm: 1.2.0rc2.dev4+g83955c42e.d20260427