splisosm.hyptest_fft

FFT-accelerated non-parametric hypothesis tests for SPLISOSM.

Classes

SplisosmFFT

FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.

Module Contents

class splisosm.hyptest_fft.SplisosmFFT(rho=0.99, neighbor_degree=1, spacing=(1.0, 1.0), workers=None)

FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.

The class follows the non-parametric SPLISOSM workflow but consumes a SpatialData table directly and rasterizes per-gene isoform counts on demand.

Examples

Spatial variability test:

>>> from splisosm import SplisosmFFT
>>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
>>> model.setup_data(
...     sdata=sdata,
...     bins="ID_square_016um",
...     table_name="square_016um",
...     col_key="array_col",
...     row_key="array_row",
...     layer="counts",
...     group_iso_by="gene_ids",
...     gene_names="gene_name",
...     min_counts=10,
...     min_bin_pct=0.0,
... )
>>> model.test_spatial_variability(method="hsic-ir")
>>> sv_results = model.get_formatted_test_results(test_type="sv")

Differential usage test:

>>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
>>> model.setup_data(
...     sdata=sdata,
...     bins="ID_square_016um",
...     table_name="square_016um_svp",
...     design_mtx="square_016um_rbp_sve",
...     col_key="array_col",
...     row_key="array_row",
...     layer="counts",
...     group_iso_by="gene_ids",
...     gene_names="gene_name",
...     min_counts=10,
...     min_bin_pct=0.0,
... )
>>> model.test_differential_usage(method="hsic-gp", residualize="cov_only")
>>> du_results = model.get_formatted_test_results("du")

Parameters:

rho (float) – Spatial autocorrelation coefficient for CAR kernel.
neighbor_degree (int) – Neighbor ring degree for CAR graph construction.
spacing (tuple[float, float]) – Raster spacing (dy, dx).
workers (int | None) – Number of FFT workers.
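
To illustrate why a CAR kernel on a regular raster pairs well with FFTs, here is a minimal NumPy sketch. It assumes a periodic (wrap-around) grid and a degree-normalized 4-neighbor graph, so that the kernel (I - rho * W)^-1 is diagonalized by the 2-D FFT. The function names are hypothetical, and the boundary handling, neighbor_degree, and spacing used by SplisosmFFT itself may differ.

```python
import numpy as np


def car_kernel_spectrum(shape, rho=0.99):
    """Eigenvalues of (I - rho * W)^-1 on a periodic grid (illustrative only)."""
    ny, nx = shape
    stencil = np.zeros(shape)
    # Degree-normalized 4-neighbor stencil; wrap-around makes W circulant.
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        stencil[dy % ny, dx % nx] += 0.25
    # The stencil is symmetric, so its FFT is real: the eigenvalues of W.
    w_eig = np.fft.fft2(stencil).real
    return 1.0 / (1.0 - rho * w_eig)


def apply_car_kernel(grid, rho=0.99):
    """Multiply a 2-D grid by the CAR kernel without forming a dense matrix."""
    spectrum = car_kernel_spectrum(grid.shape, rho)
    return np.fft.ifft2(spectrum * np.fft.fft2(grid)).real


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5))
smoothed = apply_car_kernel(x, rho=0.9)
```

Under these assumptions, a larger rho puts more weight on low spatial frequencies, and the cost per kernel product is that of two FFTs over the grid rather than a dense solve over all bins.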

extract_feature_summary(level='gene', print_progress=True)

Compute filtered feature-level summary statistics.

Gene-level statistics are aggregated across all isoforms that passed the filters applied in setup_data(). Isoform-level statistics are computed per isoform and augmented onto the corresponding rows of adata.var.

Results are cached: repeated calls with the same level return the cached pandas.DataFrame without recomputation.

Parameters:

level (Literal['gene', 'isoform']) – Summary granularity. 'gene': one row per gene. 'isoform': one row per isoform that passed filtering.
print_progress (bool) – Whether to show a progress bar.

Returns:

For level='gene', the index is the gene display name and the columns are:

'n_isos': int. Number of isoforms retained after filtering.
'perplexity': float. Effective number of isoforms based on the marginal isoform usage entropy.
'pct_bin_on': float. Fraction of bins with non-zero total gene counts.
'count_avg': float. Mean per-bin total count for the gene.
'count_std': float. Standard deviation of the per-bin total count for the gene.

For level='isoform', the index is the isoform name (matching adata.var_names) and the columns are the original adata.var columns plus:

'pct_bin_on': float. Fraction of bins with count > 0.
'count_total': float. Total counts across all bins.
'count_avg': float. Mean count per bin.
'count_std': float. Std of count per bin.
'ratio_total': float. Fraction of total gene counts attributable to this isoform.
'ratio_avg': float. Mean per-bin isoform usage ratio (computed over bins with non-zero gene coverage).
'ratio_std': float. Std of per-bin isoform usage ratio (computed over bins with non-zero gene coverage).

Return type:

DataFrame

Raises:

RuntimeError – If setup_data() has not been called.
ValueError – If level is not 'gene' or 'isoform'.
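
As a rough guide to how the gene-level columns relate to the raw counts, the sketch below recomputes them for a single gene from a small (n_bins, n_isoforms) array. It assumes that perplexity is the exponential of the natural-log entropy of the marginal isoform usage and that the standard deviation is the population one; the helper name is hypothetical and the library's exact conventions may differ.

```python
import numpy as np


def gene_summary(counts):
    """Gene-level summary for one gene from an (n_bins, n_isoforms) count array."""
    counts = np.asarray(counts, dtype=float)
    gene_total = counts.sum(axis=1)           # total gene count per bin
    usage = counts.sum(axis=0) / counts.sum() # marginal isoform usage
    nonzero = usage[usage > 0]
    # Assumption: perplexity = exp(Shannon entropy in nats).
    perplexity = float(np.exp(-(nonzero * np.log(nonzero)).sum()))
    return {
        "n_isos": counts.shape[1],
        "perplexity": perplexity,
        "pct_bin_on": float((gene_total > 0).mean()),
        "count_avg": float(gene_total.mean()),
        "count_std": float(gene_total.std()),  # population std (ddof=0), assumed
    }


# Two isoforms used equally, one empty bin out of four.
summary = gene_summary([[1, 1], [2, 2], [0, 0], [3, 3]])
```

With two equally used isoforms the perplexity is close to 2, which matches its reading as an effective number of isoforms.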

get_formatted_test_results(test_type, with_gene_summary=False)

Get formatted test results as a pandas DataFrame.

Parameters:

test_type ({"sv", "du"}) – Test type: "sv" for spatial variability or "du" for differential usage.
with_gene_summary (bool, optional) – If True, append gene-level summary statistics from extract_feature_summary().

Returns:

Formatted result table.

Return type:

DataFrame

setup_data(sdata, bins, table_name, col_key, row_key, layer='counts', group_iso_by='gene_symbol', gene_names=None, min_counts=10, min_bin_pct=0.0, filter_single_iso_genes=True, design_mtx=None, covariate_names=None)

Set up SpatialData-backed isoform data for FFT-based testing.

(bins, table_name, col_key, row_key) are passed to spatialdata.rasterize_bins() to rasterize isoform counts.

Parameters:

sdata (SpatialData) – SpatialData-like object that exposes a tables mapping.
bins (str) – Name of the SpatialData bin geometry for rasterization.
table_name (str) – Key of the table in sdata.tables.
col_key (str) – Column index key in adata.obs for rasterization.
row_key (str) – Row index key in adata.obs for rasterization.
layer (str, optional) – AnnData layer that stores isoform count matrix.
group_iso_by (str, optional) – Column in adata.var used to group isoforms by gene. The unique values of this column define the gene-level groups.
gene_names (str or None, optional) – Optional column name in adata.var whose values are used as display gene names in results. If None, the values of group_iso_by are used directly.
min_counts (int, optional) – Minimum total count (summed across all bins) required for an isoform to be retained. Isoforms below this threshold are excluded before gene grouping. Genes with fewer than two remaining isoforms after filtering are also excluded.
min_bin_pct (float, optional) – Minimum percentage of bins in which an isoform must be expressed (count greater than zero) to be retained. Values in [0, 1] are interpreted as fractions of bins, and values in (1, 100] are interpreted as percentages.
filter_single_iso_genes (bool, optional) – If True (default), genes with fewer than two isoforms passing QC filters are removed — they cannot contribute to within-gene ratio tests. Set to False to keep single-isoform genes, e.g. when testing gene-level expression variability with test_spatial_variability(method="hsic-gc").
design_mtx (str, list[str], np.ndarray, scipy.sparse matrix, pd.DataFrame, or None) –
Design matrix specification. Three input modes:
1. Table name (str matching a key in sdata.tables): Use the existing AnnData table’s X as the design matrix. Must have the same number of observations as the isoform table.
2. Obs column names (str or list[str] not matching a table): Extract the named columns from the isoform table’s adata.obs. Categorical columns are one-hot encoded automatically.
3. Pre-computed matrix (ndarray, sparse, or DataFrame of shape (n_obs, n_factors)): Used as-is.
In cases 2 and 3, the design matrix will be stored as a new AnnData table inside sdata. The matrix is also rasterized via spatialdata.rasterize_bins() when test_differential_usage() is called.
covariate_names (list[str] or None, optional) – Factor names. If given, they override any inferred names. If None, names are inferred from the design_mtx column names when possible; otherwise they are auto-generated as ["factor_0", ...].
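
The interaction of min_counts, min_bin_pct, and filter_single_iso_genes can be sketched on a plain count matrix. The helper names below are hypothetical, and the code only mirrors the parameter descriptions above; it is not the library's implementation.

```python
import numpy as np


def bin_fraction(min_bin_pct):
    """Map min_bin_pct to a fraction: [0, 1] as-is, (1, 100] divided by 100."""
    if not 0.0 <= min_bin_pct <= 100.0:
        raise ValueError("min_bin_pct must lie in [0, 100]")
    return min_bin_pct if min_bin_pct <= 1.0 else min_bin_pct / 100.0


def isoform_qc_mask(counts, gene_ids, min_counts=10, min_bin_pct=0.0,
                    filter_single_iso_genes=True):
    """Boolean mask over isoforms (columns) that pass the described filters."""
    counts = np.asarray(counts)
    gene_ids = np.asarray(gene_ids)
    # Isoform-level filters: total count and fraction of expressing bins.
    keep = counts.sum(axis=0) >= min_counts
    keep &= (counts > 0).mean(axis=0) >= bin_fraction(min_bin_pct)
    if filter_single_iso_genes:
        # Drop genes left with fewer than two isoforms after filtering.
        genes, n_kept = np.unique(gene_ids[keep], return_counts=True)
        keep &= np.isin(gene_ids, genes[n_kept >= 2])
    return keep


counts = np.array([[5, 6, 1], [6, 5, 0], [0, 3, 0], [4, 0, 0]])
gene_ids = ["g1", "g1", "g2"]
mask = isoform_qc_mask(counts, gene_ids, min_counts=10)
```

In this toy input the third isoform fails min_counts, and its gene would be dropped in any case because it has only one isoform.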

Raises:

ValueError – If required table/layer/metadata is missing.

Return type:

None

See also

splisosm.hyptest_np.SplisosmNP.setup_data(): AnnData-based setup for data with general geometry.
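
For design_mtx input mode 2, the one-hot encoding of a categorical obs column can be pictured as follows. This is a NumPy-only sketch with a hypothetical helper and a hypothetical column-naming scheme; the names SplisosmFFT assigns to encoded columns are not documented here.

```python
import numpy as np


def one_hot(labels, prefix="group"):
    """One-hot encode a 1-D array of category labels (illustrative only)."""
    labels = np.asarray(labels)
    categories, codes = np.unique(labels, return_inverse=True)
    matrix = np.eye(len(categories))[codes.ravel()]
    # Hypothetical naming scheme: "<prefix>_<category>".
    names = [f"{prefix}_{c}" for c in categories]
    return matrix, names


region = ["tumor", "stroma", "tumor", "immune"]
score = np.array([0.2, 1.5, 0.7, 0.1])  # a continuous covariate

encoded, names = one_hot(region, prefix="region")
# Stack encoded categorical columns next to continuous ones.
design = np.column_stack([encoded, score])
factor_names = names + ["score"]
```

Each row of the resulting design matrix corresponds to one observation, and each column is a factor that would be tested separately.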

test_differential_usage(method='hsic-gp', ratio_transformation='none', gpr_configs=None, residualize='cov_only', n_jobs=-1, return_results=False, print_progress=True)

Test for differential isoform usage against spatial covariate expression.

Before running this function, the design matrix must be set up using setup_data(). Each column of the design matrix corresponds to a covariate to test for differential association with the isoform usage ratios of each gene. Test statistics and p-values are computed per (gene, covariate) pair separately.
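
To make the per-(gene, covariate) computation concrete, here is a minimal NumPy sketch of a linear HSIC statistic for one gene against every column of a design matrix, processed in column chunks. The function name is hypothetical, the 1/(n-1)^2 normalization is one common convention rather than a documented choice, and p-value computation is omitted.

```python
import numpy as np


def linear_hsic_per_covariate(ratios, covariates, chunk_size=100):
    """Linear HSIC between one gene's isoform ratios and each covariate column.

    ratios: (n_bins, n_isoforms); covariates: (n_bins, n_covariates).
    Returns one statistic per covariate.
    """
    ratios = np.asarray(ratios, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    n = ratios.shape[0]
    y_centered = ratios - ratios.mean(axis=0)
    stats = []
    for start in range(0, covariates.shape[1], chunk_size):
        z = covariates[:, start:start + chunk_size]
        z_centered = z - z.mean(axis=0)
        # Cross-covariance between isoforms and the covariates in this chunk.
        cross = y_centered.T @ z_centered          # (n_isoforms, chunk)
        stats.append((cross ** 2).sum(axis=0) / (n - 1) ** 2)
    return np.concatenate(stats)


rng = np.random.default_rng(1)
ratios = rng.random((50, 3))
covs = rng.normal(size=(50, 250))
hsic_stats = linear_hsic_per_covariate(ratios, covs)
```

With linear kernels the statistic reduces to the squared Frobenius norm of the cross-covariance, so no n-by-n kernel matrix has to be formed.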

Four test strategies are supported, all operating on rasterized grid data to avoid densifying the full isoform or covariate matrix in memory:

"hsic-gp" (default): spatially residualize covariates (and optionally isoform ratios) with FFTKernelGPR, then compute linear HSIC. Controlled by residualize.
"hsic": linear HSIC between raw centered isoform ratios and raw centered covariates—no spatial residualization.
"t-fisher": per-isoform two-sample t-tests (binary covariates only) combined by Fisher’s method (chi-squared, df = 2 × n_isoforms).
"t-tippett": per-isoform two-sample t-tests (binary covariates only) combined by Tippett’s corrected minimum p-value.

Regardless of method, covariates are processed in chunks of at most 100 at a time and isoform data is loaded on-the-fly per gene so that neither the full covariate grid nor the full isoform matrix is held in memory simultaneously.
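
The two t-test strategies listed above combine per-isoform p-values into one gene-level p-value. The standard-library sketch below shows the textbook forms of both combinations. It assumes that "Tippett's corrected minimum" refers to the usual 1 - (1 - p_min)^k correction and that all p-values lie in (0, 1]; the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-squared survival function for even df, via its closed form."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i       # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) ~ chi-squared with df = 2 * k."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


def tippett_combine(pvalues):
    """Tippett's method: corrected minimum p-value, 1 - (1 - p_min)^k."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


per_isoform_p = [0.05, 0.05]
p_fisher = fisher_combine(per_isoform_p)
p_tippett = tippett_combine(per_isoform_p)
```

Fisher's method rewards consistent moderate evidence across isoforms, whereas Tippett's method responds to a single strongly associated isoform.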

Parameters:

method (str, optional) –
Method for association testing:
- "hsic": Unconditional HSIC test (multivariate RV coefficient). For continuous factors, equivalent to the multivariate Pearson correlation test. For binary factors, equivalent to the two-sample Hotelling T² test.
- "hsic-gp": Conditional HSIC test. Spatial effects are removed via Gaussian process regression before computing the HSIC statistic.
Or one of the t-tests (binary factors only):
- "t-fisher", "t-tippett": two-sample t-test per isoform (binary covariates only — exactly two distinct non-NaN values required); p-values are combined gene-wise via Fisher’s chi-squared or Tippett’s corrected minimum method.
ratio_transformation (str, optional) – Compositional transformation for isoform ratios. One of 'none', 'clr', 'ilr', 'alr', 'radial' [PYPA22]. See splisosm.utils.counts_to_ratios().

gpr_configs (dict, optional) –

Nested configuration dict for the GPR objects, with optional keys 'covariate' and/or 'isoform'. Each sub-dict is forwarded to splisosm.kernel_gpr.make_kernel_gpr(). Unspecified keys use the defaults from splisosm.kernel_gpr._DEFAULT_GPR_CONFIGS:

{
    "covariate": {
        "constant_value": 1.0,
        "constant_value_bounds": (1e-3, 1e3),
        "length_scale": 1.0,
        "length_scale_bounds": "fixed",
    },
    "isoform": {
        "constant_value": 1.0,
        "constant_value_bounds": (1e-3, 1e3),
        "length_scale": 1.0,
        "length_scale_bounds": "fixed",
    },
}

residualize ({"cov_only", "both"}, optional) –
Controls which signals are spatially residualized when method="hsic-gp":
- "cov_only" (default): residualize covariates only; test HSIC(Z_res, Y_raw). Fastest; calibration matches "both" when covariate GPR captures most spatial confounding.
- "both": residualize both covariates and isoform ratios.
n_jobs (int, optional) – Number of parallel jobs. -1 uses all available CPUs.
print_progress (bool, optional) – Whether to show the progress bar. Defaults to True.
return_results (bool, optional) – Whether to return the test statistics and p-values. If False, the results are stored in self._du_test_results.
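
For the ratio_transformation option, 'clr' conventionally denotes the centered log-ratio transform of compositional data. The sketch below shows that transform on per-bin isoform counts. The pseudocount used to avoid log(0) is an assumption made for this example; how splisosm.utils.counts_to_ratios() treats zero counts is not stated here.

```python
import numpy as np


def clr_ratios(counts, pseudocount=0.5):
    """Centered log-ratio of per-bin isoform ratios (illustrative only).

    counts: (n_bins, n_isoforms). The pseudocount is a hypothetical choice.
    """
    shifted = np.asarray(counts, dtype=float) + pseudocount
    ratios = shifted / shifted.sum(axis=1, keepdims=True)
    log_ratios = np.log(ratios)
    # Subtract the per-bin mean log-ratio so each row sums to zero.
    return log_ratios - log_ratios.mean(axis=1, keepdims=True)


transformed = clr_ratios([[1, 1, 1], [0, 5, 10]])
```

Each transformed row sums to zero, which removes the unit-sum constraint of raw ratios at the cost of a rank-deficient representation.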

Returns:

results – If return_results is True, returns dict with test statistics and p-values. Otherwise, returns None and stores results in self._du_test_results.

Return type:

dict or None

Raises:

RuntimeError – If setup_data() has not been called, or if it was called without a design_mtx argument.
ValueError – If method, residualize, or ratio_transformation is invalid.
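
The statement that unspecified gpr_configs keys fall back to the defaults can be pictured as a per-group dictionary merge. The sketch below copies the documented default values into a local constant and uses a hypothetical helper; the library performs its own merging internally, and its behavior for unknown keys may differ.

```python
import copy

# Local copy of the documented defaults, for illustration only.
DEFAULT_GPR_CONFIGS = {
    "covariate": {
        "constant_value": 1.0,
        "constant_value_bounds": (1e-3, 1e3),
        "length_scale": 1.0,
        "length_scale_bounds": "fixed",
    },
    "isoform": {
        "constant_value": 1.0,
        "constant_value_bounds": (1e-3, 1e3),
        "length_scale": 1.0,
        "length_scale_bounds": "fixed",
    },
}


def merge_gpr_configs(user_configs=None):
    """Overlay user-supplied sub-dicts on the defaults (hypothetical helper)."""
    merged = copy.deepcopy(DEFAULT_GPR_CONFIGS)
    for group, overrides in (user_configs or {}).items():
        if group not in merged:
            raise KeyError(f"unknown GPR group: {group!r}")
        merged[group].update(overrides)
    return merged


# Override only the covariate length scale; everything else stays default.
configs = merge_gpr_configs({"covariate": {"length_scale": 2.0}})
```

A partial dict such as {"covariate": {"length_scale": 2.0}} is therefore enough when only one setting needs to change.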