Changelog
v1.2.0 (2026-05-03)
pip install "splisosm[sdata,gp]"
Behavioral changes
SplisosmNP SV default: Liu + low-rank -> Liu + full-rank cumulants
SplisosmNP.test_spatial_variability() and run_hsic_gc() still default to null_method="liu" (previously named "eig"), but now estimate the cumulants for Liu's approximation from the full-rank spatial kernel, avoiding eigen-decomposition and low-rank approximation.
Dense/spectral kernels use exact traces when cheap; implicit kernels use Hutchinson Rademacher trace estimates. Use null_configs={"n_probes": m} to tune this stochastic trace budget. This is not a low-rank approximation.
To recover the previous low-rank behavior, set null_configs={"approx_rank": k}.
This changes the statistic and p-values for large datasets (n > 5,000) that previously used automatic rank truncation. FDR hit counts may change compared with v1.1.1, although gene rankings are usually similar. The old low-rank path can look more powerful because it prioritizes low-frequency structure, at the cost of zero sensitivity to local variation. To emphasize global patterns in v1.2.0, prefer a smoother full-rank kernel, for example rho=0.999, rather than returning to rank truncation.
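The Hutchinson Rademacher estimate mentioned above approximates tr(A) by averaging z^T A z over random ±1 probe vectors. It only needs matrix-vector products, which is why no eigen-direction of the kernel is discarded. The following is a minimal pure-Python sketch of the idea; hutchinson_trace and dense_matvec are illustrative names that are not part of splisosm, and the package's internal implementation may differ.

```python
import random


def hutchinson_trace(matvec, n, n_probes=60, seed=0):
    """Estimate tr(A) from matrix-vector products with Rademacher probes.

    `matvec` maps a length-n list x to A @ x. Illustrative helper only.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher probe
        az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, az))   # z^T A z
    return total / n_probes


# Toy symmetric matrix standing in for a spatial kernel.
A = [
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 0.5],
    [0.0, 0.5, 2.0],
]


def dense_matvec(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


exact = sum(A[i][i] for i in range(3))
estimate = hutchinson_trace(dense_matvec, n=3, n_probes=2000)

# For a diagonal matrix every probe gives the exact trace, since z_i**2 == 1.
diag_estimate = hutchinson_trace(lambda x: [2.0 * xi for xi in x], n=3, n_probes=5)
```

More probes reduce the variance of the estimate without changing which part of the spectrum contributes, which is the sense in which n_probes is a trace budget rather than a rank.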
Permutation p-values now use (1 + # null >= observed) / (B + 1). GLMM SV
permutation nulls are kept per gene instead of being pooled across genes.
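As a small illustration of the add-one permutation formula and of per-gene (rather than pooled) nulls, consider the sketch below. The function name, gene names, and numbers are made up for the example and are not taken from splisosm.

```python
def permutation_pvalue(observed, null_stats):
    """Add-one permutation p-value: (1 + #{null >= observed}) / (B + 1)."""
    n_perms = len(null_stats)
    n_extreme = sum(1 for s in null_stats if s >= observed)
    return (1 + n_extreme) / (n_perms + 1)


# Per-gene nulls: each gene is compared only with its own permutations.
observed = {"geneA": 5.2, "geneB": 0.7}
nulls = {
    "geneA": [0.9, 1.4, 2.0, 1.1],
    "geneB": [0.5, 0.9, 1.2, 0.6],
}
pvalues = {gene: permutation_pvalue(observed[gene], nulls[gene]) for gene in observed}
```

With B permutations the smallest attainable p-value is 1 / (B + 1), so the estimate can never be exactly zero.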
Performance and memory
Expect significant runtime and memory improvements for large datasets due to:
No eigensolver calls for the default NP SV null (10x or more speedup at 100K spots).
Memory-aware automatic feature chunking to reduce overhead and improve speed.
Sparse-preserving algebra for kernel operations.
Joblib-based parallelism for the standalone run_hsic_gc() function.
Estimated impact compared with v1.1.1; exact gains depend on sparsity, isoform
counts per gene, n_jobs, and the spatial kernel.
SplisosmNP SV:
  v1.1.1's default large-data Liu path cached a rank k = ceil(4 * sqrt(n_spots)) eigensummary. Storing both eigenvectors and the weighted low-rank factor costs about 2 * n_spots * k * 4 bytes: roughly 1.0 GB at 100K spots and 32 GB at 1M spots, before eigensolver work arrays.
  v1.2.0's default n_probes=60 cumulant path uses about 3 * n_spots * n_probes * 8 bytes for batched probe/result arrays: roughly 144 MB at 100K spots and 1.4 GB at 1M spots, plus sparse graph/precision storage. That is about 7x lower memory at 100K spots and more than 20x lower at 1M spots for the null-calibration state.
  Null setup replaces thousands of Lanczos eigenvectors at million-spot scale with 2 * n_probes kernel applications. The observed statistic is now full-rank, so per-gene work can be heavier than the old low-rank shortcut, but sparse reductions and 32-column chunks reduce dispatch and solver overhead. Kernel calls drop from one per gene to about ceil(total_response_columns / 32): up to 32x fewer calls for hsic-gc and usually about 8-16x fewer calls for 2-4 isoforms per gene.
SplisosmFFT SV:
  v1.1.1 formed a per-gene product spectrum lambda_spatial x lambda_response. This temporary costs about 8 * n_grid * rank(response) bytes per gene: about 24 MB at a 1M-cell grid for a 3-dimensional response, or 80 MB for a 10-dimensional response. v1.2.0 uses cumulants instead, so the null calculation is effectively constant memory per gene after the FFT spectrum is cached.
  FFT spatial statistics are packed by response channel. The number of FFT kernel calls drops from one per gene to about ceil(total_response_channels / 32), with the same 32x (hsic-gc) or 8-16x (typical multi-isoform genes) call-count reduction. The automatic chunk cap keeps live FFT work arrays under the 2 GiB per-worker budget; at a 1M-cell grid and 32 channels the estimate is about 1.5 GB.
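The memory and call-count figures above can be reproduced with a few lines of arithmetic. This sketch simply evaluates the formulas quoted in the text; the function names are illustrative, and the assumption that hsic-gc contributes one response column per gene is inferred from the "up to 32x" figure rather than stated by the package.

```python
import math


def lowrank_null_bytes(n_spots):
    """v1.1.1 estimate: 2 * n_spots * k * 4 with k = ceil(4 * sqrt(n_spots))."""
    k = math.ceil(4 * math.sqrt(n_spots))
    return 2 * n_spots * k * 4


def cumulant_null_bytes(n_spots, n_probes=60):
    """v1.2.0 estimate: 3 * n_spots * n_probes * 8."""
    return 3 * n_spots * n_probes * 8


def kernel_calls(total_response_columns, chunk=32):
    """Chunked kernel applications: ceil(total_response_columns / chunk)."""
    return math.ceil(total_response_columns / chunk)


def fft_product_spectrum_bytes(n_grid, response_rank):
    """v1.1.1 FFT temporary: 8 * n_grid * rank(response) bytes per gene."""
    return 8 * n_grid * response_rank


summary = []
for n_spots in (100_000, 1_000_000):
    old = lowrank_null_bytes(n_spots)
    new = cumulant_null_bytes(n_spots)
    summary.append((n_spots, old / 1e9, new / 1e9, old / new))

# 1,000 genes: one column each (assumed hsic-gc layout) vs. three isoforms each.
calls_gc = kernel_calls(1_000)
calls_three_isoforms = kernel_calls(3 * 1_000)
```

At 100K spots this gives about 1.01 GB versus 0.144 GB (roughly 7x), and at 1M spots 32 GB versus 1.44 GB (roughly 22x). For 1,000 genes the call count falls from 1,000 to 32 in the one-column case and to 94 with three isoforms per gene.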
New features
FINUFFT-backed NUFFT GP backend for SplisosmNP.test_differential_usage(method="hsic-gp") on large irregular 2-D coordinates. Faster, more memory-efficient, and more accurate than the default scikit-learn backend. Use gpr_backend="nufft" or "finufft".
New NUFFT controls: n_modes, max_auto_modes, and gpr_configs={"covariate": {"lml_approx_rank": r}}.
Sparse-aware, response-column chunked SV tests for SplisosmNP, SplisosmFFT, and run_hsic_gc; chunk_size="auto" caps NP and FFT chunks at 32 response columns/channels.
run_hsic_gc() now accepts n_jobs.
hsic-ir with nan_filling="none" uses a masked implicit spatial kernel instead of materializing dense per-gene spatial submatrices.
Deprecated null aliases are routed automatically: eig -> liu, and clt/trace -> welch.
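The alias routing in the last item can be pictured as a small lookup table. The sketch below is hypothetical: the table and function names are invented for illustration, and whether v1.2.0 emits a DeprecationWarning for every alias is an assumption, not something these notes confirm.

```python
import warnings

# Hypothetical alias table mirroring the routing described above.
_NULL_METHOD_ALIASES = {"eig": "liu", "clt": "welch", "trace": "welch"}


def resolve_null_method(name):
    """Return the canonical null-method name, warning on deprecated aliases."""
    canonical = _NULL_METHOD_ALIASES.get(name, name)
    if canonical != name:
        # Assumed behavior: warn, then continue with the canonical name.
        warnings.warn(
            f"null_method={name!r} is deprecated; use {canonical!r} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return canonical
```

Names that are not aliases, such as "liu", "welch", or "perm", pass through unchanged.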
Fixes
GLMM SV permutation calibration now compares each gene with its own permutation null.
FFT DU t-tests reject constant or all-NaN binary covariates during validation.
FFT n_jobs=0 now raises the shared input-validation error.
SpatialCovKernel and FFTKernel now require rho in [0, 1); invalid values raise ValueError instead of being silently clipped.
Sparse linear HSIC keeps null eigenvalues consistent with centering=False.
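The rho check described above amounts to a half-open interval test that fails loudly. A minimal sketch follows, with validate_rho as an illustrative name rather than the package's actual validator.

```python
def validate_rho(rho):
    """Require 0 <= rho < 1 and raise instead of silently clipping."""
    rho = float(rho)
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not (0.0 <= rho < 1.0):
        raise ValueError(f"rho must be in [0, 1), got {rho!r}")
    return rho
```

Raising rather than clipping means a typo such as rho=1.0 is reported to the caller instead of silently producing results for a different kernel.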
Docs and API
Quickstart, FAQ, installation, methods, README, API pages, and tutorial text now describe the full-rank cumulant SV default and clarify that n_probes is a trace-estimation control, not a low-rank approximation.
Added NUFFT GP methods/API documentation, including lml_approx_rank.
Reorganized API docs into Core API and Advanced Options.
Added a new SV hyperparameter optimization tutorial comparing kernel hyperparameters, full-rank NP cumulant Liu, smoother full-rank NP, and the legacy low-rank path.
Moved the package to a src/splisosm layout and grouped advanced helpers under splisosm.gpr, splisosm.utils, splisosm.io, splisosm.glmm, and splisosm.hyptest. Main imports such as from splisosm import SplisosmNP, SplisosmFFT, SplisosmGLMM are unchanged.
Tutorial notebooks were refreshed for v1.2.0.
Testing
Added and extended tests for cumulant Liu p-values, SV chunking and sparse paths, run_hsic_gc parallelism, NUFFT GP agreement, public API imports, and removal of old internal import paths.
Sphinx docs, local links, package build, and tutorial outputs were refreshed for v1.2.0.
v1.1.1 (2026-04-20)
Bug fixes
SplisosmFFT: missing kernel double-centring on FFT SV tests
Prior to v1.1.1, SplisosmFFT.setup_data() built the internal FFTKernel without passing centering=True, so the periodic CAR kernel retained its DC eigenvalue \(\lambda^K_{(0,0)} = 1/(1 - \rho)\) (≈ 100 at the default rho=0.99). This violates the standard HSIC double-centring convention and inflates p-values because the constant mode is included in the null mixture.
Impact of the fix (centering=True is now passed explicitly):
  Test statistic (tr(Y^T K Y)): unchanged because of existing column-centring.
  Gene ranking: unchanged (a monotone transformation of test statistics).
  P-values: systematically smaller (more significant) after the fix.
  FDR hits: expect more genes to pass a fixed BH threshold after upgrading; the previous v1.1.0 results were conservative. From tutorials:
    Visium FFPE: 68 → 74 SVP genes (FDR < 0.01).
    Visium HD FFPE: 192 → 196 SVP genes (FDR < 0.01).
    Visium HD 3': 501 → 506 SVP genes (FDR < 0.01).
    Visium HD ONT: 784 → 790 SVP genes (FDR < 0.01).
    Xenium Prime 5K binned: 2144 → 2158 SVP genes (FDR < 0.01).
SplisosmNP — per-gene null mismatch for hsic-ir + nan_filling='none'
In this branch the worker builds a per-gene double-centred kernel submatrix K_sp_gene (dropping spots whose isoform ratios are NaN),
but the trace / welch null reused tr(K_sp) / tr(K_sp²) from the global kernel, and the perm null applied the global K_sp to the
NaN-filtered y_batch (causing a shape mismatch and runtime error whenever filtering actually removed spots). All three null methods now
reference K_sp_gene consistently.
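To see why the uncentred DC mode in the SplisosmFFT fix above matters, it helps to look at a toy spectrum. The sketch assumes a 1-D ring with symmetric nearest-neighbour weights, where a CAR-type kernel (I - rho * W)^-1 has eigenvalues 1 / (1 - rho * cos(2*pi*k/n)); the real FFTKernel works on a 2-D grid and may be parameterized differently, so treat this only as an illustration of the 1 / (1 - rho) term.

```python
import math


def ring_car_eigenvalues(n, rho, centering=False):
    """Eigenvalues of (I - rho * W)^-1 on a 1-D ring (toy assumption)."""
    eig = [1.0 / (1.0 - rho * math.cos(2.0 * math.pi * k / n)) for k in range(n)]
    if centering:
        eig[0] = 0.0  # double-centring removes the constant (DC) mode
    return eig


raw = ring_car_eigenvalues(64, rho=0.99)
centred = ring_car_eigenvalues(64, rho=0.99, centering=True)

# Share of the total spectrum carried by the single DC mode before centring.
dc_share = raw[0] / sum(raw)
```

In this toy setting the DC eigenvalue is 1 / (1 - 0.99) = 100 and is the largest in the spectrum, so leaving it in the null mixture shifts the null distribution upward while the column-centred statistic never sees that mode.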
Other bug fixes with no user-visible numerical change
SpatialCovKernel implicit (LU-solve) path for n > 5000:
  _hutchinson_trace unconditionally returned tr(HKH) / tr((HKH)²); it now branches on the _centering flag so centering=False correctly returns tr(K) / tr(K²).
  xtKx_exact returned x^T K x regardless of _centering; it now applies H on both sides when centering=True.
  All current call sites build the implicit kernel with centering=True and pre-column-centre the input, so dense-mode and implicit-mode results continue to agree; these are latent API consistency fixes.
BED probe filtering in load_visium_probe: tightened substring matching to avoid spurious hits across gene name prefixes.
Renames (back-compat preserved)
null_method='trace' → null_method='clt' in SplisosmNP.test_spatial_variability and splisosm.utils.run_hsic_gc. The previous name 'trace' conflated the moment-matching normal (Central Limit Theorem) approximation with the Welch–Satterthwaite 'welch' path, which also uses the matrix traces tr(K) / tr(K²). 'trace' is still accepted and returns identical results, but emits a DeprecationWarning; please update call sites to 'clt'.
New features
null_method='welch' for SplisosmNP.test_spatial_variability and splisosm.utils.run_hsic_gc. Uses Welch–Satterthwaite moment matching (g·χ²_h with g = Var/(2E), h = 2E²/Var) from the same tr(K) and tr(K²) as null_method='clt'. It is typically close to the eig (Liu) reference and is recommended when eig is too slow.
Optional spatial_key when adj_key is provided (SplisosmNP, SplisosmGLMM, run_hsic_gc AnnData mode). Non-spatial AnnData (e.g. scRNA-seq with adata.obsp['connectivities'] from scanpy.pp.neighbors) can now be tested end-to-end without coordinates. method='spark-x' (SV) and method='hsic-gp' (DU) raise a targeted ValueError at call time when coordinates are absent.
IdentityKernel and FFTKernel now document a centering: bool = False constructor argument matching SpatialCovKernel; HSIC-based SV/DU workflows should always set centering=True (no direct impact).
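The Welch–Satterthwaite matching used by null_method='welch' picks a scale g and degrees of freedom h so that g·χ²_h has the same mean E and variance Var as the null. Below is a minimal sketch using the trace-based moments quoted in the Documentation notes for this release; the function names and the input numbers are illustrative, not the package's API.

```python
def trace_moments(tr_k, tr_k2, tr_yy, tr_yy2, n):
    """Null mean and variance from tr(K), tr(K^2), tr(Y^T Y), tr((Y^T Y)^2)."""
    mean = tr_k * tr_yy / n
    var = 2.0 * tr_k2 * tr_yy2 / n ** 2
    return mean, var


def welch_satterthwaite(mean, var):
    """Match g * chi2_h to a null with the given mean and variance."""
    g = var / (2.0 * mean)       # scale
    h = 2.0 * mean ** 2 / var    # degrees of freedom
    return g, h


# Made-up trace values for a toy problem with n = 100 spots.
mean, var = trace_moments(tr_k=100.0, tr_k2=400.0, tr_yy=50.0, tr_yy2=1000.0, n=100)
g, h = welch_satterthwaite(mean, var)
```

The match can be checked directly: g * h recovers the mean and 2 * g² * h recovers the variance. The p-value is then the upper tail of χ²_h at Q / g, which can be evaluated with, for example, scipy.stats.chi2.sf(Q / g, h).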
Documentation
methods.rst: mathematical corrections and notation sweep:
  Liu's chi-squared mixture null: the missing 1/n factor is now explicit, Q = tr(Y^T K Y) ≈ (1/n) Σ λ^K_i λ^Y_j Z_{ij}.
  Trace/Welch moments: μ₀ = (1/n) tr(K) tr(Y^T Y), σ₀² = (2/n²) tr(K²) tr((Y^T Y)²) (previous form had an incorrect 1/(n−1) scaling).
  FFT-DU: corrected the claim about the p-value source: covariate and response eigenvalues, not the FFT spatial spectrum.
  FFT convention: the 1/n prefactor in the inverse DFT identity F⁻¹ = (1/n) F* is now stated explicitly.
  GLMM model spec rewritten with explicit dimensions (β ∈ ℝ^{d×q}, U ∈ ℝ^{n×q}), reference-category multinomial-logit link, and a Kronecker-form random-effect covariance Σ = θK + (1−θ)I_n, vec(U) ~ N(0, σ² Σ ⊗ I_q).
  Notation unified across the page: i ∈ {1,…,n} indexes spots, j ∈ {1,…,p} indexes isoforms; K_{ii'} for spot-pair kernel entries.
quickstart.rst: Inputs-and-outputs section rewritten and consolidated with the "Expected input data format" spec moved in from txquant.rst. New subsection documenting the non-spatial / single-cell workflow via adj_key + targeted errors for operations that still require coordinates.
README: platform/feature table, model-class decision tree, non-spatial path, paper + preprint references, and badges.
All tutorial notebooks: updated to v1.1.1. Added a new visium_ffpe.ipynb demo for 10x Visium FFPE (v2, CytAssist) data and for SplisosmNP vs SplisosmFFT comparison.
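The corrected trace moments in the methods.rst notes follow directly from the mixture form with its explicit 1/n factor. The short derivation below assumes the Z_{ij} are independent χ²_1 variables (mean 1, variance 2) and takes λ^Y_j to be the eigenvalues of Y^T Y; both are the usual conventions for this kind of mixture but are assumptions here, not statements copied from methods.rst.

```latex
\begin{aligned}
Q &\approx \frac{1}{n}\sum_{i}\sum_{j} \lambda^K_i \lambda^Y_j Z_{ij},
  \qquad Z_{ij} \overset{\text{iid}}{\sim} \chi^2_1, \\
\mu_0 &= \mathbb{E}[Q]
  = \frac{1}{n}\Big(\sum_i \lambda^K_i\Big)\Big(\sum_j \lambda^Y_j\Big)
  = \frac{1}{n}\,\mathrm{tr}(K)\,\mathrm{tr}(Y^\top Y), \\
\sigma_0^2 &= \mathrm{Var}[Q]
  = \frac{2}{n^2}\Big(\sum_i (\lambda^K_i)^2\Big)\Big(\sum_j (\lambda^Y_j)^2\Big)
  = \frac{2}{n^2}\,\mathrm{tr}(K^2)\,\mathrm{tr}\big((Y^\top Y)^2\big).
\end{aligned}
```

The 1/n in the mixture therefore becomes 1/n in the mean and 1/n² in the variance, which is the scaling stated in the corrected documentation.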
Testing
New: test_sv_nan_filling_none_uses_per_gene_kernel_moments (tests/test_hyptest_np.py): regression for the per-gene null fix.
New: test_implicit_honors_centering_flag (tests/test_kernel.py): dense/implicit agreement for trace, square_trace, and xtKx_exact under both centering=True and centering=False.
Extended test_null_methods_agreement to include welch alongside eig / clt / perm (Spearman-ρ thresholds at ≥ 0.90 pairwise among the three asymptotic methods).
New: test_sv_null_method_trace_alias_deprecated (tests/test_hyptest_np.py) and test_matrix_mode_null_method_trace_alias_deprecated (tests/test_utils.py): assert the deprecated 'trace' alias returns identical results to 'clt' and emits a DeprecationWarning.
390 tests passing (up from 371 in v1.1.0), 4 GPU-skipped.
v1.1.0 (2026-04-08)
Breaking Changes
Behavioral changes:
counts_to_ratios with nan_filling='mean' now fills NaNs after ratio transformation instead of before. This does not affect the default no-transformation behavior.
SplisosmGLMM default fitting parameters changed:

| Parameter | v1.0.4 | v1.1.0 | Rationale |
| --- | --- | --- | --- |
| var_fix_sigma | False | True | Fix total variance to method-of-moments estimate; faster convergence |
| init_ratio | "observed" | "uniform" | More robust for sparse data |
| fitting_configs["max_epochs"] | -1 (10000) | 500 | Efficiency (still recommend large max_epochs if possible) |
| var_parameterization_sigma_theta | True (user choice) | always True (removed) | Simplify API |
Renamed attributes (all three classes):
n_isos → n_isos_per_gene in SplisosmNP and SplisosmFFT (aligns with SplisosmGLMM)
corr_sp / kernel → sp_kernel across all three classes (unified spatial kernel attribute)
sv_test_results / du_test_results → now private (_sv_test_results / _du_test_results); use get_formatted_test_results() instead
model_type / model_configs → now private in SplisosmGLMM (shown in __str__)
data / coordinates → now private in SplisosmNP and SplisosmGLMM
k_neighbors / rho / standardize_cov → now private in SplisosmNP (shown in __str__)
Removed methods:
SplisosmGLMM.fitting_results property: use get_fitted_models() or get_gene_model() instead
SplisosmGLMM.n_isos property alias: use n_isos_per_gene directly
Changed parameters and arguments:
SplisosmNP / run_hsic_gc:
  Moved approx_rank from setup_data to test_spatial_variability(null_configs={'approx_rank': ...}) for finer control over kernel construction. Setting 'approx_rank': None now overrides the default low-rank approximation and uses the full kernel for large datasets (n > 5000).
  Replaced use_perm_null / n_perms_per_gene in test_spatial_variability with null_method="perm" and the null_configs={'n_perms_per_gene': ...} dict.
SplisosmGLMM:
  PatienceLogger now always logs the training loss. Use store_param_history instead of diagnose to keep track of parameter trajectories.
  fit(quiet=True) now suppresses non-convergence warnings.
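For call sites that still pass the old permutation arguments to test_spatial_variability, the change above is a mechanical rewrite of keyword arguments. The helper below is hypothetical and not part of splisosm; it only rearranges a dict of keyword arguments using the parameter names listed above, and the example call values are made up.

```python
def migrate_sv_kwargs(legacy):
    """Translate pre-v1.1.0 SV keyword arguments to the v1.1.0 layout.

    Hypothetical helper for illustration; it is not part of splisosm.
    """
    kwargs = dict(legacy)
    null_configs = dict(kwargs.pop("null_configs", {}))

    # use_perm_null=True becomes null_method="perm".
    if kwargs.pop("use_perm_null", False):
        kwargs["null_method"] = "perm"
    # n_perms_per_gene moves into the null_configs dict.
    if "n_perms_per_gene" in kwargs:
        null_configs["n_perms_per_gene"] = kwargs.pop("n_perms_per_gene")

    if null_configs:
        kwargs["null_configs"] = null_configs
    return kwargs


old_call = {"method": "hsic-ir", "use_perm_null": True, "n_perms_per_gene": 1000}
new_call = migrate_sv_kwargs(old_call)
```

Arguments that were not renamed, such as method, pass through untouched, so the result can be splatted into the new call as test_spatial_variability(**new_call).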
Removed parameters:
setup_data() from SplisosmNP and SplisosmGLMM now only accepts AnnData as input; legacy array-based setup removed
filter_single_iso_genes removed from SplisosmGLMM.setup_data(): GLMM always requires ≥2 isoforms per gene; the parameter is still available in SplisosmNP and SplisosmFFT
var_parameterization_sigma_theta retired from MultinomGLMM: only the sigma/theta parameterization is supported
Deprecated functions:
extract_gene_level_statistics() → use compute_feature_summaries() instead (emits DeprecationWarning)
extract_counts_n_ratios → use add_ratio_layer instead (emits DeprecationWarning)
New Features
Parallelism and GPU support:
SplisosmNP.test_spatial_variability() and test_differential_usage() now accept an n_jobs parameter for joblib-based gene-level parallelism (prefer="threads")
GPU guard: parallelism automatically disabled when gpr_backend="gpytorch" with device != "cpu"
GPU support for SplisosmGLMM fitting via SplisosmGLMM(device=...) with "cpu", "cuda", or "mps" backends
FFT worker auto-coordination: workers = max(1, cpu_count() // n_jobs) prevents thread oversubscription
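The worker rule in the last item splits the available cores between gene-level jobs and per-call FFT threads. A minimal sketch follows, with fft_workers as an illustrative name; negative joblib-style n_jobs values are deliberately not handled here, and how splisosm treats them is not covered by these notes.

```python
import os


def fft_workers(n_jobs, cpu_count=None):
    """Threads per FFT call so that n_jobs * workers stays within the CPU count."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer in this sketch")
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cpu_count // n_jobs)
```

On a 16-core machine, n_jobs=4 leaves 4 threads per FFT call, while n_jobs=32 falls back to a single thread per call instead of rounding down to zero.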
Performance:
Low-rank approximation for GLMM fitting via SplisosmGLMM(approx_rank=...), defaulting to int(4*sqrt(n)) for n > 5000
Analytic sigma Hessian (_get_log_lik_hessian_sigma_expand_analytic) replaces torch.autograd.functional.jacobian: closed-form O(G·(p-1)·rank) computation
Single-allocation Hessian in _get_log_lik_hessian_nu: halves peak memory by filling MVN + multinomial blocks in-place
Identity covariance fast paths for glmm-null: _calc_log_prob_joint, _inv_cov, and _get_log_lik_hessian_nu bypass eigenvector operations when theta=0
Lean GLMM storage: per-gene models replaced with lightweight _FittedGeneState dataclasses after fitting; save() / load() no longer duplicate kernel eigenvectors across genes
Improved SplisosmGLMM default configs (see above)
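The default rank rule from the first Performance item is easy to evaluate by hand. In the sketch below, default_approx_rank is an illustrative name, and returning None (meaning full rank) at or below the threshold is an assumption about how "for n > 5000" is applied.

```python
import math


def default_approx_rank(n_spots, threshold=5000):
    """Default rank int(4 * sqrt(n)) for n > threshold, else None (full rank)."""
    if n_spots <= threshold:
        return None  # assumed: small datasets keep the full kernel
    return int(4 * math.sqrt(n_spots))
```

For example, 10,000 spots give a rank of 400, so the retained rank grows with the square root of the dataset size rather than linearly.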
Kernel module:
SpatialCovKernel refactored: handles sparse KNN graph construction internally (removing smoother-omics dependency), delays expensive eigen-decomposition until eigenvalues() is called
New adj_key parameter in SplisosmNP.setup_data() and run_hsic_gc(): supports custom adjacency matrices (e.g., expression-based k-NN graphs)
skip_spatial_kernel=True in SplisosmNP.setup_data(): uses IdentityKernel for DU-only workflows (no CAR kernel construction)
min_component_size parameter in SplisosmNP.setup_data() and run_hsic_gc(): filters small disconnected tissue fragments from the spatial graph
FFTKernel moved from hyptest_fft.py to kernel.py and now inherits from KernelABC
IdentityKernel added for identity-covariance and DU-only modes
Kernel configs exposed in API:
SplisosmNP(k_neighbors=..., rho=..., standardize_cov=...)
SplisosmGLMM(k_neighbors=..., rho=..., approx_rank=...)
Optional skipping of kernel construction in SplisosmNP.setup_data() for DU-only analyses (skip_spatial_kernel=True)
New SplisosmGLMM features:
SplisosmGLMM(k_neighbors=..., rho=..., approx_rank=...): kernel construction configs moved to constructor
SplisosmGLMM.load(path): static method to load saved models (new; v1.0.4 only had save())
get_gene_model(gene_name) / model[gene_name]: retrieve fitted per-gene model
get_training_summary(): per-gene convergence, loss, and timing DataFrame
get_fitted_ratios_anndata(): extract fitted isoform ratios as AnnData layer
API improvements:
get_formatted_test_results(with_gene_summary=True) appends gene-level summary statistics to results DataFrame
extract_feature_summary(level='gene'/'isoform') on all three classes: cached gene/isoform statistics
filtered_adata read-only property on SplisosmNP and SplisosmGLMM
compute_feature_summaries() shared utility function (gene + isoform level)
add_ratio_layer() utility for adding isoform ratio layers to AnnData
run_hsic_gc() now supports AnnData mode with integrated filtering
Unified __str__ / __repr__ across all three classes showing data summary, model config, and test status
Numerical safety:
Woodbury inverse correction clamped to [-1e6, 1e6]
Analytic Hessian eigenvalue derivatives clamped at min=1e-6
Single-isoform gene guards: HSIC-IR returns (0, 1.0), all DU methods return (zeros, ones)
Marginal mode warning when n_spots > 300
nan_filling="none" warning for expensive per-gene kernel path
I/O:
load_visium_sp_meta moved to splisosm.io module (backward-compatible import from utils)
Documentation:
New methods.rst sections: FFT-accelerated DU test, SplisosmNP vs SplisosmFFT differences
quickstart.rst: complete rewrite with full-defaults code blocks, configuration reference tables
New tutorial notebooks: sit_mob_demo.ipynb, xenium_sc_segmented.ipynb
Bug Fixes
Fix MultinomGLMM training:
  marginal_newton: fix epoch counter
  _update_joint_newton: used beta value instead of gradient for Newton step
  get_params_iter(): fix bug where default_collate was called on a dict
Fix ValueError when covariate_names is a numpy array (truthiness check on or)
Fix PyTorch deprecation warning in score test (linalg.solve output tensor resize)
Fix compute_feature_summaries assertion when gene display names differ from groupby keys
Fix SplisosmGLMM.save() to strip per-gene kernel buffers and raw adata, reducing file size
Fix run_hsic_gc AnnData mode: correct sparse matrix handling and filtering
Fix SpatialCovKernel eigenvalue caching: recompute when rank changes
Fix xtKx computation for implicit (sparse precision) kernels
Fix all warnings.warn calls to include stacklevel=2 for correct caller attribution
Fix AnnData obs index handling (ensure string indices throughout)
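The stacklevel=2 fix listed above changes which line a warning is attributed to: the caller's line instead of the line inside the library function. A self-contained illustration using only the standard library follows; deprecated_helper and user_code are made-up names for the example.

```python
import warnings


def deprecated_helper():
    # stacklevel=2 attributes the warning to the caller of this function.
    warnings.warn("deprecated_helper is deprecated", DeprecationWarning, stacklevel=2)


def user_code():
    deprecated_helper()  # the warning is reported at this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

reported_line = caught[0].lineno
# The call sits on the first line after the `def user_code():` line.
caller_line = user_code.__code__.co_firstlineno + 1
```

With the default stacklevel=1 the reported line would be the warnings.warn call itself, which tells users nothing about where in their own code the deprecated call was made.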
Testing
371 tests passing (up from ~250 in v1.0.4), 5 skipped (GPU-only)
New test classes: TestParallelNP, TestSigmaHessianAnalytic, TestMultinomGLMMLowRank, TestRunHsicGc
Added parallelism determinism tests (n_jobs=1 vs n_jobs=2) for all SV/DU methods in NP and FFT
Added analytic sigma Hessian tests (7 parameterization × prior combinations)
Added identity fast path equivalence tests (log-prob and Hessian)
Added with_gene_summary, filtered_adata, single-isoform edge case tests
Added sparse vs dense consistency tests for design matrix and count handling
Added save/load roundtrip tests with kernel re-linking verification
Added GPU device tests (CPU/CUDA/MPS) for SplisosmGLMM workflows