Quick Start#

This guide walks through running SPLISOSM for spatial variability (SV) and differential isoform usage (DU) testing.

Choosing a model class#

SPLISOSM provides three model classes:

  • SplisosmNP — Non-parametric HSIC tests with a sparse CAR spatial kernel. Works on both regular and irregular geometries (e.g., segmented spatial data or even single-cell data).

  • SplisosmFFT — Same tests as SplisosmNP but with FFT-accelerated implementation for data on regular grids (e.g., Visium HD, Xenium binned data).

  • SplisosmGLMM — Parametric Multinomial-based DU test with spatial random effects. The GLM model is similar to conventional (bulk) differential splicing analysis. The GLMM model is extended to handle cross-spot variability.

Use the decision tree below to pick the right class for your dataset. For methodological details of each test, see Statistical Methods.

Is your data on a REGULAR GRID (Visium HD, Xenium binned, etc.)?
├── YES  →  SplisosmFFT  (fastest, requires SpatialData input)
└── NO  (Slide-seq, single-cell segmented data, etc.)
    │
    Are you interested in parametric model fitting (effect sizes, covariates)?
    ├── YES  →  SplisosmGLMM (not calibrated for SV, but can be used for DU testing)
    └── NO  →  SplisosmNP  (the most general, supports both SV and DU testing)

Summary of class features:

Model class

SV

DU

Input type

Geometry

Speed

Low-rank approximation

SplisosmNP

AnnData

any

fast

optional via null_configs={"approx_rank": k} for SV, and gpr_configs={"covariate": {"n_inducing": m}} for DU.

SplisosmFFT

SpatialData

grid only

fastest

N/A (FFT-based; no eigendecomposition)

SplisosmGLMM

⚠️

AnnData

any

slow

optional via approx_rank=k at SplisosmGLMM() construction time.

Note

SplisosmNP vs SplisosmFFT: same test, different assumptions, potentially different results.

Show details While classes compute the same HSIC test statistic and use the same spectrum-based null, results can still differ because SplisosmFFT (i) assumes periodic boundaries (the grid wraps around, so edge bins become neighbours) and (ii) operates on the full raster grid, zero-padding unobserved bins. On the other hand, SplisosmNP works only on the observed spots with a k-NN graph that has no wrap-around. The discrepancy is small when the grid is densely observed and the tissue is far from the slide edges (i.e. boundaries are already all zero).

Inputs and outputs#

SplisosmNP and SplisosmGLMM accept isoform-level quantification as an AnnData object.

model.setup_data(
    adata,                        # AnnData of shape (n_spots, n_isoforms)
    spatial_key="spatial",        # key in adata.obsm for spatial coordinates
    layer="counts",               # layer containing raw isoform counts
    group_iso_by="gene_symbol",   # adata.var column grouping isoforms by gene
    gene_names="gene_symbol",     # adata.var column for gene names
    # ----- Differential usage testing
    design_mtx=covariates,           # (n_spots, n_factors)
    covariate_names=covariate_names, # list of covariate names
    # ----- Spot filtering options (for removing disconnected tissue fragments, etc.)
    min_component_size=1,         # minimal size of a connected spatial k-NN graph component to keep
    # ----- Feature filtering options (applied after spot filtering)
    min_counts=10,                # minimal total counts per isoform to keep
    min_bin_pct=0.01,             # minimal proportion of non-zero expression spots per isoform to keep
    filter_single_iso_genes=True, # remove single-isoform genes (SplisosmNP/FFT only; GLMM always requires ≥2)
)

Note

Since v1.1.0, the approx_rank argument was moved out of setup_data:

  • SplisosmNP — pass null_configs={"approx_rank": k} to test_spatial_variability() to cap the eigenvalue rank used for the HSIC null distribution.

  • SplisosmGLMM — pass approx_rank=k to SplisosmGLMM at construction time to control the low-rank spatial kernel approximation.

  • SplisosmFFT uses FFT-based convolutions instead of eigendecomposition and does not support approx_rank or null_configs.

For SplisosmFFT, pass a SpatialData object with isoform-level counts to setup_data.

model.setup_data(
    sdata,                     # SpatialData object
    bins="Visium_HD_Mouse_Brain_square_016um",  # SpatialData bin element name
    table_name="square_016um", # adata containing isoform-level counts
    col_key="array_col",       # adata.obs column for array column indices
    row_key="array_row",       # adata.obs column for array row indices
    layer="counts",            # layer containing raw isoform counts
    group_iso_by="gene_ids",   # adata.var column grouping isoforms by gene
    gene_names="gene_name",    # adata.var column for gene names
    min_counts=10,
    min_bin_pct=0.01,
)

See Feature Quantification page for guidance on preparing input data from different spatial transcriptomics platforms. All model classes share the same output convention:

  • SV results: a DataFrame with per-gene test statistics and p-values.

  • DU results: a DataFrame with test statistics and p-values for each gene-covariate pair.

Example data#

A demo Visium-ONT mouse olfactory bulb dataset (SiT-MOB) is available from Dropbox (~100 MB):

  • mob_ont_filtered_1107.h5ad: isoform quantification (AnnData).

  • mob_visium_rbp_1107.h5ad: short-read RBP gene expression (AnnData).

import scanpy as sc

adata_ont = sc.read("mob_ont_filtered_1107.h5ad")
adata_rbp = sc.read("mob_visium_rbp_1107.h5ad")

# Align RBP data and extract spatially variable RBPs as covariates
adata_rbp = adata_rbp[adata_ont.obs_names, :].copy()
covariates = adata_rbp[:, adata_rbp.var['is_visium_sve']].layers['log1p'].toarray()
covariate_names = adata_rbp.var.loc[adata_rbp.var['is_visium_sve'], 'features']

Testing for spatial variability (SV)#

SPLISOSM tests for statistical independence between isoform expression and spatial location using the Hilbert-Schmidt Independence Criterion (HSIC). Three SV tests are available:

  1. HSIC-IR — tests SV of relative isoform usage (spatially variably processed genes, SVP).

  2. HSIC-GC — tests SV of total gene expression (spatially variably expressed genes, SVE). Drop-in replacement for SPARK-X with improved power.

  3. HSIC-IC — tests SV of individual isoform counts; reflects joint changes in expression and usage.

SplisosmNP (any geometries — AnnData input)#

from splisosm import SplisosmNP

model = SplisosmNP()
model.setup_data(
   adata_ont, spatial_key="spatial",
   layer="counts", group_iso_by="gene_symbol"
)
model.test_spatial_variability(method="hsic-ir")
df_sv_res = model.get_formatted_test_results("sv")
Show full call with all defaults
model = SplisosmNP(
    k_neighbors=4,        # k-NN graph degree for CAR spatial kernel
    rho=0.99,             # spatial autocorrelation strength (0 < rho < 1)
    standardize_cov=True, # set kernel diagonal to 1 (to downweight outliers)
)
model.setup_data(
    adata=adata_ont,
    spatial_key="spatial",    # key in adata.obsm for 2-D coordinates
    adj_key=None,             # key in adata.obsp if using precomputed adjacency
    layer="counts",
    group_iso_by="gene_symbol",
    min_counts=10,            # min total counts per isoform to keep
    min_bin_pct=0.0,          # min fraction of spots expressing the isoform
    filter_single_iso_genes=True,
    min_component_size=1,     # set > 1 to drop small tissue fragments
    skip_spatial_kernel=False,  # True → use IdentityKernel (DU-only mode)
)
model.test_spatial_variability(
    method="hsic-ir",          # 'hsic-ir' | 'hsic-gc' | 'hsic-ic' | 'spark-x'
    ratio_transformation="none",  # 'none' | 'clr' | 'ilr' | 'alr'
    nan_filling="mean",           # if 'none', use only non-zero spots per gene (slow)
    null_method="eig",            # 'eig' (Liu) | 'trace' (normal approx) | 'perm'
    null_configs=None,            # e.g. {"approx_rank": 20} to cap eigenvalues
    n_jobs=-1,                    # gene-level parallelism; -1 = all CPUs
    print_progress=True,
)
df_sv_res = model.get_formatted_test_results("sv")

SplisosmFFT (regular grids only — SpatialData input)#

from splisosm import SplisosmFFT

model = SplisosmFFT()
model.setup_data(
    sdata,
    bins="Visium_HD_Mouse_Brain_square_016um",
    table_name="square_016um",
    col_key="array_col",
    row_key="array_row",
    layer="counts",
    group_iso_by="gene_ids",
)
model.test_spatial_variability(method="hsic-ir")
df_sv_res = model.get_formatted_test_results("sv")
Show full call with all defaults
model = SplisosmFFT(
    neighbor_degree=1,   # grid adjacency degree (1 = 4-connectivity, 2 = 8-connectivity)
    rho=0.99,            # spatial autocorrelation strength
    workers=None,        # scipy.fft thread count; None = all available CPUs
)
model.setup_data(
    sdata,
    bins="Visium_HD_Mouse_Brain_square_016um",
    table_name="square_016um",
    col_key="array_col",
    row_key="array_row",
    layer="counts",
    group_iso_by="gene_ids",
    gene_names="gene_name",  # adata.var column to change gene display names in results
    min_counts=10,
    min_bin_pct=0.0,
    filter_single_iso_genes=True,
)
model.test_spatial_variability(
    method="hsic-ir",    # 'hsic-ir' | 'hsic-gc' | 'hsic-ic'
    ratio_transformation="none",  # 'none' | 'clr' | 'ilr' | 'alr'
    n_jobs=-1,           # gene-level parallelism; -1 = all CPUs
    print_progress=True,
)
df_sv_res = model.get_formatted_test_results("sv")

Testing for differential isoform usage (DU)#

DU tests identify genes whose isoform usage is associated with a covariate (e.g., spatial domain, RBP expression), conditioned on spatial correlation.

SplisosmNP (non-parametric, HSIC-based)#

Uses Gaussian process regression (GPR) to remove spatial autocorrelation.

from splisosm import SplisosmNP

# Focus on SVP genes from the SV step
svp_genes = df_sv_res.query("pvalue_adj < 0.05")['gene'].tolist()
adata_svp = adata_ont[:, adata_ont.var['gene_symbol'].isin(svp_genes)].copy()

model = SplisosmNP()
model.setup_data(
    adata_svp, spatial_key="spatial",
    layer="counts", group_iso_by="gene_symbol",
    design_mtx=covariates, covariate_names=covariate_names,
    skip_spatial_kernel=True,  # DU-only; skip expensive CAR kernel construction
)
model.test_differential_usage(method="hsic-gp")
df_du_res = model.get_formatted_test_results("du")
Show full call with all defaults
model = SplisosmNP()
model.setup_data(
    adata=adata_svp,
    spatial_key="spatial",
    adj_key=None,                    # key in adata.obsp if using precomputed adjacency
    layer="counts",
    group_iso_by="gene_symbol",
    design_mtx=covariates,           # (n_spots, n_factors) array/tensor/DataFrame
    covariate_names=covariate_names, # list of covariate display names
    skip_spatial_kernel=True,        # skip CAR kernel — not needed for DU
    min_counts=10,
    min_bin_pct=0.0,
    filter_single_iso_genes=True,
    min_component_size=1,     # set > 1 to drop small tissue fragments
)
model.test_differential_usage(
    method="hsic-gp",              # 'hsic' | 'hsic-gp' | 't-fisher' | 't-tippett'
    ratio_transformation="none",   # 'none' | 'clr' | 'ilr' | 'alr'
    nan_filling="mean",            # if 'none', use only non-zero spots per gene (slow)
    residualize="cov_only",        # 'cov_only' (faster) | 'both' (more conservative)
    gpr_backend="sklearn",         # 'sklearn' | 'gpytorch' (FITC sparse GP with GPU)
    gpr_configs=None,              # e.g. {"covariate": {"n_inducing": 500}}
    n_jobs=-1,                     # gene-level parallelism; -1 = all CPUs
    print_progress=True,
)
df_du_res = model.get_formatted_test_results("du")

SplisosmFFT (same tests but for regular grids only)#

Same test as SplisosmNP but with FFT-accelerated GPR for regular grid data.

from splisosm import SplisosmFFT

model = SplisosmFFT()
model.setup_data(
    sdata,
    bins="Visium_HD_Mouse_Brain_square_016um",
    table_name="square_016um_svp",
    design_mtx="square_016um_rbp_sve",  # design stored as a separate sdata table
    col_key="array_col", row_key="array_row",
    layer="counts", group_iso_by="gene_ids",
)
model.test_differential_usage(method="hsic-gp")
df_du_res = model.get_formatted_test_results("du")
Show full call with all defaults
model = SplisosmFFT(
    neighbor_degree=1,
    rho=0.99,
    workers=None,   # scipy.fft thread count; None = all CPUs
)
model.setup_data(
    sdata,
    bins="Visium_HD_Mouse_Brain_square_016um",
    table_name="square_016um_svp",
    design_mtx="square_016um_rbp_sve",
    col_key="array_col",
    row_key="array_row",
    layer="counts",
    group_iso_by="gene_ids",
    gene_names="gene_name",
    min_counts=10,
    min_bin_pct=0.0,
    filter_single_iso_genes=True,
)
model.test_differential_usage(
    method="hsic-gp",              # 'hsic' | 'hsic-gp' | 't-fisher' | 't-tippett'
    ratio_transformation="none",   # 'none' | 'clr' | 'ilr' | 'alr'
    residualize="cov_only",        # 'cov_only' | 'both'
    gpr_configs=None,              # e.g. {"covariate": {"length_scale_bounds": "fixed"}}
    n_jobs=-1,                     # gene-level parallelism
    print_progress=True,
)
df_du_res = model.get_formatted_test_results("du")

SplisosmGLMM (parametric, GLM/GLMM-based)#

Fits a multinomial GLMM with a Gaussian random field spatial random effect. Set var_fix_sigma=False to learn the total variance jointly. With the default settings (var_fix_sigma=True, max_epochs=500), SV and DU hypothesis tests are conservative (better false positive control but reduced power).

from splisosm import SplisosmGLMM

# DU testing (score test conditioned on spatial random effects)
model = SplisosmGLMM(model_type="glmm-full")
model.setup_data(
    adata_svp, spatial_key="spatial",
    layer="counts", group_iso_by="gene_symbol",
    group_gene_by_n_iso=True,
    design_mtx=covariates, covariate_names=covariate_names,
)
model.fit(with_design_mtx=False)
model.test_differential_usage(method="score")
df_du_res = model.get_formatted_test_results("du")
Show full call with all defaults
model = SplisosmGLMM(
    model_type="glmm-full",      # 'glmm-full' | 'glmm-null' | 'glm'
    var_fix_sigma=True,          # True → conservative tests; False → more power but may inflate FPR
    init_ratio="uniform",        # 'uniform' | 'observed'
    fitting_method="joint_gd",   # 'joint_gd' | 'joint_newton' | 'marginal_gd' | 'marginal_newton'
    fitting_configs={"max_epochs": 500},
    device="cpu",                # 'cpu' | 'cuda' (NVIDIA) | 'mps' (Apple Silicon)
    # approx_rank: omit for auto (default) | None (force full rank) | int (fixed low-rank)
)
model.setup_data(
    adata=adata_svp,
    spatial_key="spatial",
    adj_key=None,                   # key in adata.obsp if using precomputed adjacency
    layer="counts",
    group_iso_by="gene_symbol",
    group_gene_by_n_iso=True,       # required for batch_size > 1
    design_mtx=covariates,
    covariate_names=covariate_names,
    min_counts=10,            # min total counts per isoform to keep
    min_bin_pct=0.0,          # min fraction of spots expressing the isoform
    min_component_size=1,     # set > 1 to drop small tissue fragments
)
model.fit(
    n_jobs=1,               # parallel gene fitting (CPU only; auto-disabled on GPU)
    batch_size=1,           # genes per batch; > 1 requires group_gene_by_n_iso=True
    with_design_mtx=False,  # False → score test (recommended); True → Wald test
    from_null=False,        # if True, initialize from glmm-null model
    print_progress=True,
)
model.test_differential_usage(
    method="score",   # 'score' | 'wald' (not recommended)
    print_progress=True,
)
df_du_res = model.get_formatted_test_results("du")

Performance tuning and configuration reference#

Each model class has its own set of tuning knobs. All options have sensible defaults; adjust them only when you hit performance or accuracy limits.

SplisosmNP / run_hsic_gc — configuration options (click to expand)

Feature

How to set

Default

Trade-off

Ratio transformation

ratio_transformation= in test_spatial_variability / test_differential_usage

"none" (no compositional constraint)

log-based transformations ("clr", "ilr", "alr") require pseudocounts to handle zero ratios; "radial" is not calibrated. Performance of "none" is better in most cases.

SV null approximation

null_method= in test_spatial_variability / run_hsic_gc

"eig" (Liu’s chi-square mixture)

"trace" (moment-matching normal) avoids eigendecomposition entirely — fastest for very large n but p-values may be slightly less calibrated in the tails. "perm" uses permutation; slowest but assumption-free.

SV null low-rank (when null_method="eig")

null_configs={"approx_rank": k} in test_spatial_variability / run_hsic_gc

Auto: full rank when n_spots 5000, else ⌈4√n⌉

Restricts test statistic and null computation to the top-k eigenvalues; significant speedup for large n with potential power loss for high-frequency patterns (usually rare in real spatial data).

Conditional DU test (method="hsic-gp")

residualize= in test_differential_usage

"cov_only"

"cov_only" removes spatial effects from covariates only — fastest and recommended in most cases. "both" additionally residualizes isoform ratios, which is more conservative and can be more robust in case the covariate residualization is incomplete.

GPR backend (DU, method="hsic-gp")

gpr_backend= in test_differential_usage

"sklearn"

"gpytorch" uses the FITC sparse-GP approximation with GPU support.

Tends to be slower than "sklearn" on CPU due to overhead.

GPR inducing points (DU, method="hsic-gp")

gpr_configs={"covariate": {"n_inducing": M}} in test_differential_usage

5000 (subset-of-data for sklearn; FITC for gpytorch)

Reduces GP fitting cost from O(n*³) to O(*nM*²). Accuracy degrades if *M is too small. Set to None for exact GP (warns when n_spots > 10000).

Parallel jobs

n_jobs= in test_spatial_variability / test_differential_usage

-1 (all CPUs)

Number of joblib workers for gene-wise computation. Uses prefer="threads" to avoid pickling large shared objects. When gpr_backend="gpytorch" with device != "cpu", parallelism is automatically disabled (CUDA not thread-safe).

Other GPR configuration

gpr_configs= in test_differential_usage

See test_differential_usage docstring for details

Adjust constant_value_bounds or length_scale_bounds to tune the hyperparameter searching ranges.

SplisosmFFT — configuration options (click to expand)

Feature

How to set

Default

Trade-off

Ratio transformation

ratio_transformation= in test_spatial_variability / test_differential_usage

"none" (no compositional constraint)

log-based transformations ("clr", "ilr", "alr") require pseudocounts to handle zero ratios; "radial" is not calibrated. Performance of "none" is better in most cases.

FFT thread count

SplisosmFFT(workers=N)

None (all available CPUs, via scipy.fft)

Controls the number of threads used for the FFT-based spatial convolutions. Set to 1 to disable multi-threading (e.g. when already parallelising at the gene level with n_jobs).

SV parallel jobs

n_jobs= in test_spatial_variability

-1 (all CPUs)

Number of joblib workers for gene-wise HSIC computation. Set to 1 to disable parallelism (useful when workers > 1 already saturates the CPU).

DU parallel jobs

n_jobs= in test_differential_usage

-1 (all CPUs)

Number of joblib workers for (gene, covariate) pair computation. Same trade-off as for test_spatial_variability.

Conditional DU test (method="hsic-gp")

residualize= in test_differential_usage

"cov_only"

Same semantics as SplisosmNP: "cov_only" residualizes covariates only; "both" also residualizes isoform ratios.

Other GPR configuration

gpr_configs= in test_differential_usage

See test_differential_usage docstring for details

Adjust constant_value_bounds or length_scale_bounds to tune the hyperparameter searching ranges.

SplisosmGLMM — configuration options (click to expand)

Feature

How to set

Default

Trade-off

GPU acceleration

SplisosmGLMM(device=...)

"cpu"

Use "cuda" (NVIDIA) or "mps" (Apple Silicon) for a large per-gene speed-up. Parallel fitting (n_jobs > 1) is auto-disabled on non-CPU devices because fork-based workers cannot share GPU context.

Low-rank spatial kernel

SplisosmGLMM(approx_rank=k)

"auto": full rank when n_spots 5000, else ⌈4√n⌉

Fewer eigenvectors → faster fitting and lower memory (with accuracy loss). Pass None to force full rank regardless of n_spots; pass an integer to set a fixed rank.

Fitting method

SplisosmGLMM(fitting_method=...)

"joint_gd"

"joint_gd" maximises the joint likelihood by gradient descent — fastest. "joint_newton" adds a Newton step for the variance parameters. "marginal_gd" / "marginal_newton" integrate out the random effect via second-order Laplace approximation and give more accurate variance posteriors but are significantly more expensive (O(n³) Cholesky per epoch; see warning for n_spots > 300).

Parallel gene fitting (CPU only)

fit(n_jobs=N, batch_size=B)

n_jobs=1, batch_size=1

Near-linear speed-up with n_jobs on multi-core CPUs; disabled automatically when device != "cpu". batch_size > 1 requires group_gene_by_n_iso=True in setup_data, which batches genes with the same number of isoforms together to utilize PyTorch’s efficient batched operations.

Model complexity

SplisosmGLMM(model_type=...)

"glmm-full"

"glm" fits a GLM without any random effect — fastest but ignores spatial autocorrelation and may inflate FDR. Use for quick screening or as a baseline.

Variance sharing across isoforms

SplisosmGLMM(share_variance=False)

True (one shared σ)

False fits one variance component per isoform — more flexible but slower and prone to overfitting for genes with many isoforms.

Fix total variance

SplisosmGLMM(var_fix_sigma=...)

True (σ frozen at Fano-factor estimate)

When True, only the spatial proportion θ is learned, producing conservative tests (near-zero FPR). Set to False to learn σ jointly — may increase SV power but inflates FPR for both SV and DU tests.

Isoform ratio initialization

SplisosmGLMM(init_ratio=...)

"uniform"

"uniform" initialises isoform ratios to equal proportions — fast convergence. "observed" initialises from raw count ratios — slower but may improve ratio estimates for high-count data.