splisosm.hyptest_fft#
FFT-accelerated non-parametric hypothesis tests for SPLISOSM.
Classes#
SplisosmFFT | FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.
Module Contents#
- class splisosm.hyptest_fft.SplisosmFFT(rho=0.99, neighbor_degree=1, spacing=(1.0, 1.0), workers=None)#
FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.
The class follows the non-parametric SPLISOSM workflow but consumes a SpatialData table directly and rasterizes per-gene isoform counts on demand.
Examples
Spatial variability test:
>>> from splisosm import SplisosmFFT
>>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
>>> model.setup_data(
...     sdata=sdata,
...     bins="ID_square_016um",
...     table_name="square_016um",
...     col_key="array_col",
...     row_key="array_row",
...     layer="counts",
...     group_iso_by="gene_ids",
...     gene_names="gene_name",
...     min_counts=10,
...     min_bin_pct=0.0,
... )
>>> model.test_spatial_variability(method="hsic-ir")
>>> sv_results = model.get_formatted_test_results(test_type="sv")
Differential usage test:
>>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
>>> model.setup_data(
...     sdata=sdata,
...     bins="ID_square_016um",
...     table_name="square_016um_svp",
...     design_mtx="square_016um_rbp_sve",
...     col_key="array_col",
...     row_key="array_row",
...     layer="counts",
...     group_iso_by="gene_ids",
...     gene_names="gene_name",
...     min_counts=10,
...     min_bin_pct=0.0,
... )
>>> model.test_differential_usage(method="hsic-gp", residualize="cov_only")
>>> du_results = model.get_formatted_test_results("du")
- Parameters:
rho (float) – Spatial autocorrelation coefficient for CAR kernel.
neighbor_degree (int) – Neighbor ring degree for CAR graph construction.
spacing (tuple[float, float]) – Raster spacing (dy, dx).
workers (int | None) – Number of FFT workers.
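Why an FFT helps with a CAR-type kernel on a raster can be illustrated with a small sketch. It assumes, purely for illustration, a kernel of the form K = (I - rho * W)^-1 on a periodic (wrap-around) grid, where W averages the four nearest neighbours. The function names `car_kernel_eigenvalues` and `apply_car_kernel` are hypothetical, and the kernel definition, boundary handling, and normalization actually used by `SplisosmFFT` may differ.

```python
import numpy as np


def car_kernel_eigenvalues(ny, nx, rho):
    """Eigenvalues of K = (I - rho * W)^-1 on a periodic ny-by-nx grid.

    W averages the four nearest neighbours with wrap-around (an assumption
    made for this sketch), so it acts as a circular convolution and the 2-D
    DFT diagonalises it. Requires ny >= 3, nx >= 3 and |rho| < 1.
    """
    stencil = np.zeros((ny, nx))
    stencil[1, 0] = stencil[-1, 0] = 0.25  # neighbours above and below
    stencil[0, 1] = stencil[0, -1] = 0.25  # neighbours left and right
    w_eig = np.fft.fft2(stencil).real      # symmetric stencil -> real spectrum
    return 1.0 / (1.0 - rho * w_eig)


def apply_car_kernel(image, rho):
    """Compute K applied to `image` without forming the dense K matrix."""
    k_eig = car_kernel_eigenvalues(*image.shape, rho)
    return np.fft.ifft2(np.fft.fft2(image) * k_eig).real


rng = np.random.default_rng(0)
image = rng.normal(size=(32, 48))
smoothed = apply_car_kernel(image, rho=0.99)
print(smoothed.shape)  # (32, 48)
```

Under these assumptions the kernel is applied in O(n log n) time for n grid cells, whereas a dense solve against the n-by-n matrix would scale cubically.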
- extract_feature_summary(level='gene', print_progress=True)#
Compute filtered feature-level summary statistics.
Gene-level statistics are aggregated across all isoforms that passed the filters applied in setup_data(). Isoform-level statistics are computed per isoform and augmented onto the corresponding rows of adata.var.
Results are cached: repeated calls with the same level return the cached pandas.DataFrame without recomputation.
- Parameters:
level (Literal['gene', 'isoform']) – Summary granularity.
  - 'gene': one row per gene.
  - 'isoform': one row per isoform that passed filtering.
print_progress (bool) – Whether to show a progress bar.
- Returns:
For level='gene', the index is the gene display name and the columns are:
  - 'n_isos': int. Number of isoforms retained after filtering.
  - 'perplexity': float. Effective number of isoforms based on the marginal isoform usage entropy.
  - 'pct_bin_on': float. Fraction of bins with non-zero total gene counts.
  - 'count_avg': float. Mean per-spot total count for the gene.
  - 'count_std': float. Std of per-spot total count for the gene.
For level='isoform', the index is the isoform name (matching adata.var_names) and the columns are the original adata.var columns plus:
  - 'pct_bin_on': float. Fraction of bins with count > 0.
  - 'count_total': float. Total counts across all bins.
  - 'count_avg': float. Mean count per bin.
  - 'count_std': float. Std of count per bin.
  - 'ratio_total': float. Fraction of total gene counts attributable to this isoform.
  - 'ratio_avg': float. Mean per-bin isoform usage ratio (computed over bins with non-zero gene coverage).
  - 'ratio_std': float. Std of per-bin isoform usage ratio (computed over bins with non-zero gene coverage).
- Return type:
pandas.DataFrame
- Raises:
RuntimeError – If setup_data() has not been called.
ValueError – If level is not 'gene' or 'isoform'.
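The 'perplexity' column above is described as an effective number of isoforms derived from the entropy of marginal isoform usage. A minimal sketch of one common definition follows, assuming natural-log Shannon entropy exponentiated back to a count; the helper name `isoform_perplexity` is hypothetical and the exact formula used by the library is not stated here.

```python
import math


def isoform_perplexity(total_counts):
    """Effective number of isoforms: exp of the Shannon entropy of usage.

    `total_counts` holds one total count per isoform of a single gene.
    Isoforms with zero counts contribute nothing to the entropy.
    """
    total = sum(total_counts)
    if total <= 0:
        raise ValueError("at least one isoform must have a positive count")
    usage = [c / total for c in total_counts if c > 0]
    entropy = -sum(p * math.log(p) for p in usage)
    return math.exp(entropy)


# Two equally used isoforms behave like two effective isoforms,
# while a strongly dominant isoform pulls the value towards one.
balanced = isoform_perplexity([50, 50])
skewed = isoform_perplexity([99, 1])
print(balanced, skewed)
```

Under this definition the value ranges from 1 (a single isoform carries all counts) up to 'n_isos' (perfectly even usage).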
- get_formatted_test_results(test_type, with_gene_summary=False)#
Get formatted test results as a pandas DataFrame.
- Parameters:
test_type ({"sv", "du"}) – Test type: "sv" for spatial variability or "du" for differential usage.
with_gene_summary (bool, optional) – If True, append gene-level summary statistics from extract_feature_summary().
- Returns:
Formatted result table.
- Return type:
pandas.DataFrame
- setup_data(sdata, bins, table_name, col_key, row_key, layer='counts', group_iso_by='gene_symbol', gene_names=None, min_counts=10, min_bin_pct=0.0, filter_single_iso_genes=True, design_mtx=None, covariate_names=None)#
Setup SpatialData-backed isoform data for FFT-based testing.
(bins, table_name, col_key, row_key) are passed to spatialdata.rasterize_bins() to rasterize isoform counts.
- Parameters:
sdata (SpatialData) – SpatialData-like object with a tables mapping.
bins (str) – Name of the SpatialData bin geometry for rasterization.
table_name (str) – Key of the table in sdata.tables.
col_key (str) – Column index key in adata.obs for rasterization.
row_key (str) – Row index key in adata.obs for rasterization.
layer (str, optional) – AnnData layer that stores the isoform count matrix.
group_iso_by (str, optional) – Column in adata.var used to group isoforms by gene. The unique values of this column define the gene-level groups.
gene_names (str or None, optional) – Optional column name in adata.var whose values are used as display gene names in results. If None, the values of group_iso_by are used directly.
min_counts (int, optional) – Minimum total count (summed across all spots) required for an isoform to be retained. Isoforms below this threshold are excluded before gene grouping. Genes with fewer than two remaining isoforms after filtering are also excluded.
min_bin_pct (float, optional) – Minimum percentage of bins in which an isoform must be expressed (count greater than zero) to be retained. Values in [0, 1] are interpreted as fractions of bins, and values in (1, 100] are interpreted as percentages.
filter_single_iso_genes (bool, optional) – If True (default), genes with fewer than two isoforms passing QC filters are removed, since they cannot contribute to within-gene ratio tests. Set to False to keep single-isoform genes, e.g. when testing gene-level expression variability with test_spatial_variability(method="hsic-gc").
design_mtx (str, list[str], np.ndarray, scipy.sparse matrix, pd.DataFrame, or None) – Design matrix specification. Three input modes:
  1. Table name (str matching a key in sdata.tables): Use the existing AnnData table’s X as the design matrix. Must have the same number of observations as the isoform table.
  2. Obs column names (str or list[str] not matching a table): Extract the named columns from the isoform table’s adata.obs. Categorical columns are one-hot encoded automatically.
  3. Pre-computed matrix (ndarray, sparse, or DataFrame of shape (n_obs, n_factors)): Used as-is.
In cases 2 and 3, the design matrix will be stored as a new AnnData table inside sdata. The matrix is also rasterized via spatialdata.rasterize_bins() when test_differential_usage() is called.
covariate_names (list[str] or None, optional) – Factor names. Will override inferred names. If None, inferred from design_mtx column names when possible; otherwise auto-generated as ["factor_0", ...].
- Raises:
ValueError – If required table/layer/metadata is missing.
- Return type:
None
See also
splisosm.hyptest_np.SplisosmNP.setup_data() – AnnData-based setup for data with general geometry.
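The isoform QC rules described for min_counts, min_bin_pct, and filter_single_iso_genes can be sketched on a small dense count matrix. This is an illustration only: the helper `filter_isoforms` is hypothetical, the use of inclusive (>=) thresholds is an assumption, and the library applies its filters to the SpatialData table rather than to a plain array.

```python
import numpy as np


def filter_isoforms(counts, gene_ids, min_counts=10, min_bin_pct=0.0,
                    filter_single_iso_genes=True):
    """Return a boolean mask over isoforms (columns of `counts`).

    counts   : array of shape (n_bins, n_isoforms)
    gene_ids : one gene label per isoform
    """
    counts = np.asarray(counts)
    gene_ids = np.asarray(gene_ids)
    # Values in (1, 100] are read as percentages, values in [0, 1] as fractions.
    min_frac = min_bin_pct / 100.0 if min_bin_pct > 1 else min_bin_pct
    total = counts.sum(axis=0)              # total count per isoform
    frac_on = (counts > 0).mean(axis=0)     # fraction of bins with count > 0
    keep = (total >= min_counts) & (frac_on >= min_frac)
    if filter_single_iso_genes:
        # Drop genes left with fewer than two isoforms after the QC step.
        genes, n_kept = np.unique(gene_ids[keep], return_counts=True)
        keep &= np.isin(gene_ids, genes[n_kept >= 2])
    return keep


counts = np.array([
    [5, 3, 1, 10, 0],
    [5, 3, 1, 10, 5],
    [5, 3, 1, 10, 0],
    [5, 3, 0, 0, 0],
])
gene_ids = ["A", "A", "A", "B", "B"]
mask = filter_isoforms(counts, gene_ids, min_counts=10, min_bin_pct=50)
print(mask)
```

In this toy input gene B keeps only one isoform after the count filter, so it is removed entirely unless filter_single_iso_genes=False.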
- test_differential_usage(method='hsic-gp', ratio_transformation='none', gpr_configs=None, residualize='cov_only', n_jobs=-1, return_results=False, print_progress=True)#
Test for differential isoform usage against spatial covariate expression.
Before running this function, the design matrix must be set up using setup_data(). Each column of the design matrix corresponds to a covariate to test for differential association with the isoform usage ratios of each gene. Test statistics and p-values are computed separately for each (gene, covariate) pair.
Four test strategies are supported, all operating on rasterized grid data to avoid densifying the full isoform or covariate matrix in memory:
  - "hsic-gp" (default): spatially residualize covariates (and optionally isoform ratios) with FFTKernelGPR, then compute linear HSIC. Controlled by residualize.
  - "hsic": linear HSIC between raw centered isoform ratios and raw centered covariates, with no spatial residualization.
  - "t-fisher": per-isoform two-sample t-tests (binary covariates only) combined by Fisher’s method (chi-squared, df = 2 × n_isoforms).
  - "t-tippett": per-isoform two-sample t-tests (binary covariates only) combined by Tippett’s corrected minimum p-value.
Regardless of method, covariates are processed in chunks of at most 100 at a time and isoform data is loaded on-the-fly per gene so that neither the full covariate grid nor the full isoform matrix is held in memory simultaneously.
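The two p-value combination rules named for "t-fisher" and "t-tippett" can be sketched in a few lines of standard-library Python. The sketch assumes independent per-isoform p-values in (0, 1] and takes "corrected minimum" to mean the usual 1 - (1 - p_min)^k adjustment; the helper names are hypothetical and the library's exact implementation is not shown here.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-squared with 2k df."""
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # For even degrees of freedom (2k) the chi-squared survival function has
    # the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


def tippett_combine(pvalues):
    """Tippett's method: minimum p-value corrected for k independent tests."""
    k = len(pvalues)
    p_min = min(pvalues)
    return p_min, 1.0 - (1.0 - p_min) ** k


per_isoform_p = [0.01, 0.20, 0.65]   # made-up values for illustration
print(fisher_combine(per_isoform_p))
print(tippett_combine(per_isoform_p))
```

Fisher's rule pools evidence across all isoforms, while Tippett's rule reacts to the single strongest isoform, so the two can rank genes differently.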
- Parameters:
method (str, optional) – Method for association testing:
  - "hsic": Unconditional HSIC test (multivariate RV coefficient). For continuous factors, equivalent to the multivariate Pearson correlation test. For binary factors, equivalent to the two-sample Hotelling T² test.
  - "hsic-gp": Conditional HSIC test. Spatial effects are removed via Gaussian process regression before computing the HSIC statistic.
Or one of the t-tests (binary factors only):
  - "t-fisher", "t-tippett": two-sample t-test per isoform (binary covariates only; exactly two distinct non-NaN values required); p-values are combined gene-wise via Fisher’s chi-squared or Tippett’s corrected minimum method.
ratio_transformation (str, optional) – Compositional transformation for isoform ratios. One of 'none', 'clr', 'ilr', 'alr', 'radial' [PYPA22]. See splisosm.utils.counts_to_ratios().
gpr_configs (dict, optional) – Nested configuration dict for the GPR objects, with optional keys 'covariate' and/or 'isoform'. Each sub-dict is forwarded to splisosm.kernel_gpr.make_kernel_gpr(). Unspecified keys use the defaults from splisosm.kernel_gpr._DEFAULT_GPR_CONFIGS:
    {
        "covariate": {
            "constant_value": 1.0,
            "constant_value_bounds": (1e-3, 1e3),
            "length_scale": 1.0,
            "length_scale_bounds": "fixed",
        },
        "isoform": {
            "constant_value": 1.0,
            "constant_value_bounds": (1e-3, 1e3),
            "length_scale": 1.0,
            "length_scale_bounds": "fixed",
        },
    }
residualize ({"cov_only", "both"}, optional) – Controls which signals are spatially residualized when method="hsic-gp":
  - "cov_only" (default): residualize covariates only; test HSIC(Z_res, Y_raw). Fastest; calibration matches "both" when covariate GPR captures most spatial confounding.
  - "both": residualize both covariates and isoform ratios.
n_jobs (int, optional) – Number of parallel jobs. -1 uses all available CPUs.
print_progress (bool, optional) – Whether to show the progress bar. Defaults to True.
return_results (bool, optional) – Whether to return the test statistics and p-values. If False, the results are stored in self._du_test_results.
- Returns:
results – If return_results is True, returns dict with test statistics and p-values. Otherwise, returns None and stores results in self._du_test_results.
- Return type:
dict or None
- Raises:
RuntimeError – If setup_data() has not been called or the design_mtx argument has not been set.
ValueError – If method, residualize, or ratio_transformation is invalid.
See also
splisosm.hyptest_np.SplisosmNP.test_differential_usage() – Non-FFT version of this function for comparison.
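The linear HSIC statistic mentioned for the "hsic" and "hsic-gp" methods can be illustrated on dense arrays. With linear kernels, the trace form tr(K H L H) reduces to the squared Frobenius norm of the cross-product of the centered matrices. The helper `linear_hsic` is hypothetical, the 1/(n-1)² normalization is an assumption, and the library works on rasterized grids rather than on the flat (n_obs, n_features) arrays used here.

```python
import numpy as np


def linear_hsic(x, y):
    """Biased linear-kernel HSIC between x (n, p) and y (n, q).

    Equivalent to tr(K H L H) / (n - 1)^2 with K = x x^T, L = y y^T and
    H the centering matrix, but computed without any n-by-n matrix.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    xc = x - x.mean(axis=0)   # column-center isoform ratios
    yc = y - y.mean(axis=0)   # column-center covariates
    cross = xc.T @ yc         # (p, q) cross-product matrix
    return float(np.sum(cross ** 2)) / (n - 1) ** 2


rng = np.random.default_rng(0)
ratios = rng.random((200, 3))                                # made-up isoform ratios
covariate = ratios[:, [0]] + 0.1 * rng.normal(size=(200, 1))  # associated covariate
noise = rng.normal(size=(200, 1))                            # unrelated covariate
print(linear_hsic(ratios, covariate), linear_hsic(ratios, noise))
```

Avoiding the n-by-n kernel matrices keeps memory linear in the number of observations, which matters when n is the number of raster bins.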
- test_spatial_variability(method='hsic-ir', ratio_transformation='none', n_jobs=-1, return_results=False, print_progress=True)#
Test for spatial variability using FFT-accelerated HSIC.
- Parameters:
method ({"hsic-ir", "hsic-ic", "hsic-gc"}, optional) – One of "hsic-ir" (isoform ratios), "hsic-ic" (isoform counts), or "hsic-gc" (gene counts).
ratio_transformation ({"none", "clr", "ilr", "alr", "radial"}, optional) – Ratio transform used when method="hsic-ir".
n_jobs (int, optional) – Number of joblib workers. -1 uses all available CPUs.
return_results (bool, optional) – If True, return result dictionary.
print_progress (bool, optional) – Whether to show a progress bar.
- Returns:
Result dictionary when return_results=True; otherwise None.
- Return type:
dict or None
See also
splisosm.hyptest_np.SplisosmNP.test_spatial_variability() – Non-FFT version of this function for comparison.
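Of the ratio_transformation options, "clr" (centered log-ratio) is the simplest to write down: each log-ratio is shifted by the mean log-ratio of the composition. The sketch below is a generic illustration; the helper name `clr` and the small pseudo-count used to avoid log(0) are assumptions, and how splisosm.utils.counts_to_ratios() treats zero ratios is not specified here.

```python
import math


def clr(ratios, pseudo=1e-6):
    """Centered log-ratio transform of one bin's isoform usage ratios.

    A small pseudo-count is added and the vector is re-normalised so that
    zero ratios do not produce log(0). The output always sums to zero.
    """
    shifted = [r + pseudo for r in ratios]
    total = sum(shifted)
    logs = [math.log(s / total) for s in shifted]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


print(clr([0.7, 0.2, 0.1]))
print(clr([0.5, 0.5]))   # equal usage maps to values at (or very near) zero
```

Because the transformed values sum to zero, they live in a subspace with one fewer dimension than the number of isoforms.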
- design_mtx: Any | None#
Design matrix stored as an AnnData table inside sdata. None if no covariates.
- n_factors: int#
Number of covariates for differential usage testing.
- n_genes: int#
Number of genes after filtering.
- n_grid: int#
Total raster grid cells (ny * nx, including zero-padded positions).
- n_spots: int#
Number of observed spots (bins with non-zero data).
- sdata: Any | None#
Source SpatialData object; None before setup_data().
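The difference between n_grid and n_spots can be seen by rasterizing a handful of bins by hand. The sketch below places per-bin values on the bounding-box grid spanned by their integer row and column indices and fills unobserved cells with zeros. The helper `rasterize_counts` is hypothetical and much simpler than spatialdata.rasterize_bins(), which the library actually uses; unique (row, col) pairs are assumed.

```python
import numpy as np


def rasterize_counts(rows, cols, values):
    """Place per-bin values on a dense (ny, nx) grid, zero elsewhere."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    r0, c0 = rows.min(), cols.min()
    ny = rows.max() - r0 + 1
    nx = cols.max() - c0 + 1
    grid = np.zeros((ny, nx), dtype=float)
    grid[rows - r0, cols - c0] = values   # observed bins only
    return grid


# Three observed bins on what becomes a 3 x 3 raster.
rows = [0, 0, 2]
cols = [1, 3, 2]
values = [4.0, 7.0, 1.0]
grid = rasterize_counts(rows, cols, values)

n_spots = len(values)   # observed bins
n_grid = grid.size      # ny * nx, including zero-padded cells
print(n_spots, n_grid)  # 3 9
```

FFT-based computations run over all n_grid cells, so a sparse tissue outline inside a large bounding box increases cost even though n_spots stays small.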