Quick Start#
This guide walks through running SPLISOSM for spatial variability (SV) and differential isoform usage (DU) testing.
Choosing a model class#
SPLISOSM provides three model classes:
SplisosmNP— Non-parametric HSIC tests with a sparse CAR spatial kernel. Works on both regular and irregular geometries (e.g., segmented spatial data or even single-cell data).SplisosmFFT— Same tests asSplisosmNPbut with FFT-accelerated implementation for data on regular grids (e.g., Visium HD, Xenium binned data).SplisosmGLMM— Parametric Multinomial-based DU test with spatial random effects. The GLM model is similar to conventional (bulk) differential splicing analysis. The GLMM model is extended to handle cross-spot variability.
Use the decision tree below to pick the right class for your dataset. For methodological details of each test, see Statistical Methods.
Is your data on a REGULAR GRID (Visium HD, Xenium binned, etc.)?
├── YES → SplisosmFFT (fastest, requires SpatialData input)
└── NO (Slide-seq, single-cell segmented data, etc.)
│
Are you interested in parametric model fitting (effect sizes, covariates)?
├── YES → SplisosmGLMM (not calibrated for SV, but can be used for DU testing)
└── NO → SplisosmNP (the most general, supports both SV and DU testing)
Summary of class features:
Model class |
SV |
DU |
Input type |
Geometry |
Speed |
Low-rank approximation |
|---|---|---|---|---|---|---|
|
✓ |
✓ |
AnnData |
any |
fast |
optional via |
|
✓ |
✓ |
SpatialData |
grid only |
fastest |
N/A (FFT-based; no eigendecomposition) |
|
⚠️ |
✓ |
AnnData |
any |
slow |
optional via |
Note
SplisosmNP vs SplisosmFFT: same test, different assumptions, potentially different results.
Show details
While classes compute the same HSIC test statistic and use the same spectrum-based null, results can still differ because SplisosmFFT (i) assumes periodic boundaries (the grid wraps around, so edge bins become neighbours) and (ii) operates on the full raster grid, zero-padding unobserved bins. On the other hand, SplisosmNP works only on the observed spots with a k-NN graph that has no wrap-around. The discrepancy is small when the grid is densely observed and the tissue is far from the slide edges (i.e. boundaries are already all zero).Inputs and outputs#
SplisosmNP and SplisosmGLMM accept isoform-level quantification as an AnnData object.
model.setup_data(
adata, # AnnData of shape (n_spots, n_isoforms)
spatial_key="spatial", # key in adata.obsm for spatial coordinates
layer="counts", # layer containing raw isoform counts
group_iso_by="gene_symbol", # adata.var column grouping isoforms by gene
gene_names="gene_symbol", # adata.var column for gene names
# ----- Differential usage testing
design_mtx=covariates, # (n_spots, n_factors)
covariate_names=covariate_names, # list of covariate names
# ----- Spot filtering options (for removing disconnected tissue fragments, etc.)
min_component_size=1, # minimal size of a connected spatial k-NN graph component to keep
# ----- Feature filtering options (applied after spot filtering)
min_counts=10, # minimal total counts per isoform to keep
min_bin_pct=0.01, # minimal proportion of non-zero expression spots per isoform to keep
filter_single_iso_genes=True, # remove single-isoform genes (SplisosmNP/FFT only; GLMM always requires ≥2)
)
Note
Since v1.1.0, the approx_rank argument was moved out of setup_data:
SplisosmNP — pass
null_configs={"approx_rank": k}totest_spatial_variability()to cap the eigenvalue rank used for the HSIC null distribution.SplisosmGLMM — pass
approx_rank=ktoSplisosmGLMMat construction time to control the low-rank spatial kernel approximation.SplisosmFFT uses FFT-based convolutions instead of eigendecomposition and does not support
approx_rankornull_configs.
For SplisosmFFT, pass a SpatialData object with isoform-level counts to setup_data.
model.setup_data(
sdata, # SpatialData object
bins="Visium_HD_Mouse_Brain_square_016um", # SpatialData bin element name
table_name="square_016um", # adata containing isoform-level counts
col_key="array_col", # adata.obs column for array column indices
row_key="array_row", # adata.obs column for array row indices
layer="counts", # layer containing raw isoform counts
group_iso_by="gene_ids", # adata.var column grouping isoforms by gene
gene_names="gene_name", # adata.var column for gene names
min_counts=10,
min_bin_pct=0.01,
)
See Feature Quantification page for guidance on preparing input data from different spatial transcriptomics platforms. All model classes share the same output convention:
SV results: a
DataFramewith per-gene test statistics and p-values.DU results: a
DataFramewith test statistics and p-values for each gene-covariate pair.
Example data#
A demo Visium-ONT mouse olfactory bulb dataset (SiT-MOB) is available from Dropbox (~100 MB):
mob_ont_filtered_1107.h5ad: isoform quantification (AnnData).mob_visium_rbp_1107.h5ad: short-read RBP gene expression (AnnData).
import scanpy as sc
adata_ont = sc.read("mob_ont_filtered_1107.h5ad")
adata_rbp = sc.read("mob_visium_rbp_1107.h5ad")
# Align RBP data and extract spatially variable RBPs as covariates
adata_rbp = adata_rbp[adata_ont.obs_names, :].copy()
covariates = adata_rbp[:, adata_rbp.var['is_visium_sve']].layers['log1p'].toarray()
covariate_names = adata_rbp.var.loc[adata_rbp.var['is_visium_sve'], 'features']
Testing for spatial variability (SV)#
SPLISOSM tests for statistical independence between isoform expression and spatial location using the Hilbert-Schmidt Independence Criterion (HSIC). Three SV tests are available:
HSIC-IR — tests SV of relative isoform usage (spatially variably processed genes, SVP).
HSIC-GC — tests SV of total gene expression (spatially variably expressed genes, SVE). Drop-in replacement for SPARK-X with improved power.
HSIC-IC — tests SV of individual isoform counts; reflects joint changes in expression and usage.
SplisosmNP (any geometries — AnnData input)#
from splisosm import SplisosmNP
model = SplisosmNP()
model.setup_data(
adata_ont, spatial_key="spatial",
layer="counts", group_iso_by="gene_symbol"
)
model.test_spatial_variability(method="hsic-ir")
df_sv_res = model.get_formatted_test_results("sv")
Show full call with all defaults
model = SplisosmNP(
k_neighbors=4, # k-NN graph degree for CAR spatial kernel
rho=0.99, # spatial autocorrelation strength (0 < rho < 1)
standardize_cov=True, # set kernel diagonal to 1 (to downweight outliers)
)
model.setup_data(
adata=adata_ont,
spatial_key="spatial", # key in adata.obsm for 2-D coordinates
adj_key=None, # key in adata.obsp if using precomputed adjacency
layer="counts",
group_iso_by="gene_symbol",
min_counts=10, # min total counts per isoform to keep
min_bin_pct=0.0, # min fraction of spots expressing the isoform
filter_single_iso_genes=True,
min_component_size=1, # set > 1 to drop small tissue fragments
skip_spatial_kernel=False, # True → use IdentityKernel (DU-only mode)
)
model.test_spatial_variability(
method="hsic-ir", # 'hsic-ir' | 'hsic-gc' | 'hsic-ic' | 'spark-x'
ratio_transformation="none", # 'none' | 'clr' | 'ilr' | 'alr'
nan_filling="mean", # if 'none', use only non-zero spots per gene (slow)
null_method="eig", # 'eig' (Liu) | 'trace' (normal approx) | 'perm'
null_configs=None, # e.g. {"approx_rank": 20} to cap eigenvalues
n_jobs=-1, # gene-level parallelism; -1 = all CPUs
print_progress=True,
)
df_sv_res = model.get_formatted_test_results("sv")
SplisosmFFT (regular grids only — SpatialData input)#
from splisosm import SplisosmFFT
model = SplisosmFFT()
model.setup_data(
sdata,
bins="Visium_HD_Mouse_Brain_square_016um",
table_name="square_016um",
col_key="array_col",
row_key="array_row",
layer="counts",
group_iso_by="gene_ids",
)
model.test_spatial_variability(method="hsic-ir")
df_sv_res = model.get_formatted_test_results("sv")
Show full call with all defaults
model = SplisosmFFT(
neighbor_degree=1, # grid adjacency degree (1 = 4-connectivity, 2 = 8-connectivity)
rho=0.99, # spatial autocorrelation strength
workers=None, # scipy.fft thread count; None = all available CPUs
)
model.setup_data(
sdata,
bins="Visium_HD_Mouse_Brain_square_016um",
table_name="square_016um",
col_key="array_col",
row_key="array_row",
layer="counts",
group_iso_by="gene_ids",
gene_names="gene_name", # adata.var column to change gene display names in results
min_counts=10,
min_bin_pct=0.0,
filter_single_iso_genes=True,
)
model.test_spatial_variability(
method="hsic-ir", # 'hsic-ir' | 'hsic-gc' | 'hsic-ic'
ratio_transformation="none", # 'none' | 'clr' | 'ilr' | 'alr'
n_jobs=-1, # gene-level parallelism; -1 = all CPUs
print_progress=True,
)
df_sv_res = model.get_formatted_test_results("sv")
Testing for differential isoform usage (DU)#
DU tests identify genes whose isoform usage is associated with a covariate (e.g., spatial domain, RBP expression), conditioned on spatial correlation.
SplisosmNP (non-parametric, HSIC-based)#
Uses Gaussian process regression (GPR) to remove spatial autocorrelation.
from splisosm import SplisosmNP
# Focus on SVP genes from the SV step
svp_genes = df_sv_res.query("pvalue_adj < 0.05")['gene'].tolist()
adata_svp = adata_ont[:, adata_ont.var['gene_symbol'].isin(svp_genes)].copy()
model = SplisosmNP()
model.setup_data(
adata_svp, spatial_key="spatial",
layer="counts", group_iso_by="gene_symbol",
design_mtx=covariates, covariate_names=covariate_names,
skip_spatial_kernel=True, # DU-only; skip expensive CAR kernel construction
)
model.test_differential_usage(method="hsic-gp")
df_du_res = model.get_formatted_test_results("du")
Show full call with all defaults
model = SplisosmNP()
model.setup_data(
adata=adata_svp,
spatial_key="spatial",
adj_key=None, # key in adata.obsp if using precomputed adjacency
layer="counts",
group_iso_by="gene_symbol",
design_mtx=covariates, # (n_spots, n_factors) array/tensor/DataFrame
covariate_names=covariate_names, # list of covariate display names
skip_spatial_kernel=True, # skip CAR kernel — not needed for DU
min_counts=10,
min_bin_pct=0.0,
filter_single_iso_genes=True,
min_component_size=1, # set > 1 to drop small tissue fragments
)
model.test_differential_usage(
method="hsic-gp", # 'hsic' | 'hsic-gp' | 't-fisher' | 't-tippett'
ratio_transformation="none", # 'none' | 'clr' | 'ilr' | 'alr'
nan_filling="mean", # if 'none', use only non-zero spots per gene (slow)
residualize="cov_only", # 'cov_only' (faster) | 'both' (more conservative)
gpr_backend="sklearn", # 'sklearn' | 'gpytorch' (FITC sparse GP with GPU)
gpr_configs=None, # e.g. {"covariate": {"n_inducing": 500}}
n_jobs=-1, # gene-level parallelism; -1 = all CPUs
print_progress=True,
)
df_du_res = model.get_formatted_test_results("du")
SplisosmFFT (same tests but for regular grids only)#
Same test as SplisosmNP but with FFT-accelerated GPR for regular grid data.
from splisosm import SplisosmFFT
model = SplisosmFFT()
model.setup_data(
sdata,
bins="Visium_HD_Mouse_Brain_square_016um",
table_name="square_016um_svp",
design_mtx="square_016um_rbp_sve", # design stored as a separate sdata table
col_key="array_col", row_key="array_row",
layer="counts", group_iso_by="gene_ids",
)
model.test_differential_usage(method="hsic-gp")
df_du_res = model.get_formatted_test_results("du")
Show full call with all defaults
model = SplisosmFFT(
neighbor_degree=1,
rho=0.99,
workers=None, # scipy.fft thread count; None = all CPUs
)
model.setup_data(
sdata,
bins="Visium_HD_Mouse_Brain_square_016um",
table_name="square_016um_svp",
design_mtx="square_016um_rbp_sve",
col_key="array_col",
row_key="array_row",
layer="counts",
group_iso_by="gene_ids",
gene_names="gene_name",
min_counts=10,
min_bin_pct=0.0,
filter_single_iso_genes=True,
)
model.test_differential_usage(
method="hsic-gp", # 'hsic' | 'hsic-gp' | 't-fisher' | 't-tippett'
ratio_transformation="none", # 'none' | 'clr' | 'ilr' | 'alr'
residualize="cov_only", # 'cov_only' | 'both'
gpr_configs=None, # e.g. {"covariate": {"length_scale_bounds": "fixed"}}
n_jobs=-1, # gene-level parallelism
print_progress=True,
)
df_du_res = model.get_formatted_test_results("du")
SplisosmGLMM (parametric, GLM/GLMM-based)#
Fits a multinomial GLMM with a Gaussian random field spatial random effect.
Set var_fix_sigma=False to learn the total variance jointly.
With the default settings (var_fix_sigma=True, max_epochs=500), SV and DU
hypothesis tests are conservative (better false positive control but
reduced power).
from splisosm import SplisosmGLMM
# DU testing (score test conditioned on spatial random effects)
model = SplisosmGLMM(model_type="glmm-full")
model.setup_data(
adata_svp, spatial_key="spatial",
layer="counts", group_iso_by="gene_symbol",
group_gene_by_n_iso=True,
design_mtx=covariates, covariate_names=covariate_names,
)
model.fit(with_design_mtx=False)
model.test_differential_usage(method="score")
df_du_res = model.get_formatted_test_results("du")
Show full call with all defaults
model = SplisosmGLMM(
model_type="glmm-full", # 'glmm-full' | 'glmm-null' | 'glm'
var_fix_sigma=True, # True → conservative tests; False → more power but may inflate FPR
init_ratio="uniform", # 'uniform' | 'observed'
fitting_method="joint_gd", # 'joint_gd' | 'joint_newton' | 'marginal_gd' | 'marginal_newton'
fitting_configs={"max_epochs": 500},
device="cpu", # 'cpu' | 'cuda' (NVIDIA) | 'mps' (Apple Silicon)
# approx_rank: omit for auto (default) | None (force full rank) | int (fixed low-rank)
)
model.setup_data(
adata=adata_svp,
spatial_key="spatial",
adj_key=None, # key in adata.obsp if using precomputed adjacency
layer="counts",
group_iso_by="gene_symbol",
group_gene_by_n_iso=True, # required for batch_size > 1
design_mtx=covariates,
covariate_names=covariate_names,
min_counts=10, # min total counts per isoform to keep
min_bin_pct=0.0, # min fraction of spots expressing the isoform
min_component_size=1, # set > 1 to drop small tissue fragments
)
model.fit(
n_jobs=1, # parallel gene fitting (CPU only; auto-disabled on GPU)
batch_size=1, # genes per batch; > 1 requires group_gene_by_n_iso=True
with_design_mtx=False, # False → score test (recommended); True → Wald test
from_null=False, # if True, initialize from glmm-null model
print_progress=True,
)
model.test_differential_usage(
method="score", # 'score' | 'wald' (not recommended)
print_progress=True,
)
df_du_res = model.get_formatted_test_results("du")
Performance tuning and configuration reference#
Each model class has its own set of tuning knobs. All options have sensible defaults; adjust them only when you hit performance or accuracy limits.
SplisosmNP / run_hsic_gc — configuration options (click to expand)
Feature |
How to set |
Default |
Trade-off |
|---|---|---|---|
Ratio transformation |
|
|
log-based transformations ( |
SV null approximation |
|
|
|
SV null low-rank (when |
|
Auto: full rank when |
Restricts test statistic and null computation to the top-k eigenvalues; significant speedup for large n with potential power loss for high-frequency patterns (usually rare in real spatial data). |
Conditional DU test ( |
|
|
|
GPR backend (DU, |
|
|
|
GPR inducing points (DU, |
|
|
Reduces GP fitting cost from O(n*³) to O(*nM*²).
Accuracy degrades if *M is too small.
Set to |
Parallel jobs |
|
|
Number of joblib workers for gene-wise computation.
Uses |
Other GPR configuration |
|
See |
Adjust |
SplisosmFFT — configuration options (click to expand)
Feature |
How to set |
Default |
Trade-off |
|---|---|---|---|
Ratio transformation |
|
|
log-based transformations ( |
FFT thread count |
|
|
Controls the number of threads used for the FFT-based spatial
convolutions. Set to |
SV parallel jobs |
|
|
Number of joblib workers for gene-wise HSIC computation. Set to
|
DU parallel jobs |
|
|
Number of joblib workers for (gene, covariate) pair computation.
Same trade-off as for |
Conditional DU test ( |
|
|
Same semantics as |
Other GPR configuration |
|
See |
Adjust |
SplisosmGLMM — configuration options (click to expand)
Feature |
How to set |
Default |
Trade-off |
|---|---|---|---|
GPU acceleration |
|
|
Use |
Low-rank spatial kernel |
|
|
Fewer eigenvectors → faster fitting and lower memory (with accuracy loss).
Pass |
Fitting method |
|
|
|
Parallel gene fitting (CPU only) |
|
|
Near-linear speed-up with |
Model complexity |
|
|
|
Variance sharing across isoforms |
|
|
|
Fix total variance |
|
|
When |
Isoform ratio initialization |
|
|
|