splisosm.hyptest_fft
FFT-accelerated non-parametric hypothesis tests for SPLISOSM.
Classes
- FFTKernel: FFT-based spatial kernel on a periodic 2D raster grid.
- SplisosmFFT: FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.
Module Contents
- class splisosm.hyptest_fft.FFTKernel(shape, spacing=(1.0, 1.0), rho=0.99, neighbor_degree=1, workers=None)
FFT-based spatial kernel on a periodic 2D raster grid.
This implementation currently supports only a CAR (conditional autoregressive) spatial kernel built on a periodic neighborhood graph.
- Parameters:
shape (tuple[int, int]) – Grid shape (ny, nx).
spacing (tuple[float, float]) – Physical spacing (dy, dx) between neighboring raster cells.
rho (float) – Spatial autocorrelation coefficient of the CAR kernel.
neighbor_degree (int) – Neighbor ring degree for graph construction; 1 uses nearest neighbors in the periodic metric.
workers (int | None) – Number of workers used by scipy.fft.fft2.
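To illustrate why a periodic CAR kernel can be handled with FFTs, here is a minimal NumPy sketch that computes the spectrum of one plausible CAR form, K = (I - rho * W)^{-1}, where W is the row-normalized adjacency of the 4-nearest-neighbor graph on a torus. That exact form and the helper name car_spectrum are assumptions for illustration. The library's construction may differ, for example with neighbor_degree > 1 or non-unit spacing.

```python
import numpy as np

def car_spectrum(shape, rho=0.99):
    """Eigenvalues of a CAR-style kernel on a periodic grid (hypothetical helper).

    Assumed form: K = (I - rho * W)^{-1}, with W the row-normalized adjacency
    of the 4-nearest-neighbor graph on a torus.
    """
    ny, nx = shape
    # Generating stencil of W: weight 1/4 on each periodic nearest neighbor.
    stencil = np.zeros((ny, nx))
    for dy, dx in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        stencil[dy % ny, dx % nx] += 0.25
    # A circulant operator is diagonalized by the 2D DFT, so the eigenvalues
    # of W are the FFT of its stencil (real, because the stencil is symmetric).
    w_eigs = np.fft.fft2(stencil).real
    # K shares eigenvectors with W; |w_eigs| <= 1 keeps K positive for rho < 1.
    return 1.0 / (1.0 - rho * w_eigs)
```

The zero-frequency entry equals 1 / (1 - rho), the largest eigenvalue. This is why rho close to 1 emphasizes smooth, large-scale spatial patterns.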
- apply_residual_op(x, epsilon)
Apply the kernel regression residual operator R = epsilon * (K + epsilon * I)**(-1).
Because K is diagonalized by the 2D FFT on the periodic grid, this is computed in O(N log N) as:
R @ x = IFFT2( epsilon / (lambda + epsilon) * FFT2(x) )
where lambda is the kernel spectrum (the eigenvalues of K) and N = ny * nx.
- Parameters:
x (ndarray) – Input with shape (ny, nx) or (ny, nx, m).
epsilon (float) – Regularization / noise level.
- Returns:
Residuals of the same shape as x.
- Return type:
np.ndarray
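The residual operator can be sketched directly from the formula above. This hypothetical apply_residual_op takes the kernel spectrum explicitly as an (ny, nx) array instead of reading it from an FFTKernel instance. It also uses numpy.fft rather than scipy.fft with workers. It illustrates the technique and is not the library's implementation.

```python
import numpy as np

def apply_residual_op(x, spectrum, epsilon):
    """Compute R @ x with R = epsilon * (K + epsilon * I)^{-1} for a circulant K.

    x: array of shape (ny, nx) or (ny, nx, m).
    spectrum: (ny, nx) array of eigenvalues of K (FFT of its generating stencil).
    """
    # Spectral filter; lies in (0, 1] for a positive semi-definite K.
    filt = epsilon / (spectrum + epsilon)
    if x.ndim == 3:
        filt = filt[..., None]  # broadcast over the m columns
    xf = np.fft.fft2(x, axes=(0, 1))
    # The imaginary part is numerical noise for real x and symmetric K.
    return np.fft.ifft2(filt * xf, axes=(0, 1)).real
```

Components along large-eigenvalue directions of K are shrunk most. For a CAR kernel these are the smooth, low-frequency patterns, which is how the operator removes the spatially smooth part of x.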
- eigenvalues(k=None)
Return kernel eigenvalues.
- Parameters:
k (int | None) – Number of leading eigenvalues to return. If None, return all.
- Returns:
Eigenvalues in descending order.
- Return type:
np.ndarray
- power_spectral_density_1d(bins=50)
Compute the 1D power spectral density (radial profile).
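A radial profile can be obtained by averaging a 2D spectrum over rings of constant frequency magnitude. The sketch below makes several assumptions. It treats the profile as a plain ring average of the kernel spectrum over bins equally spaced radial bins. It takes frequencies from numpy.fft.fftfreq scaled by the grid spacing. The helper radial_profile is hypothetical, and the library's binning and normalization may differ.

```python
import numpy as np

def radial_profile(spectrum, spacing=(1.0, 1.0), bins=50):
    """Ring-average a 2D spectrum by radial frequency |f| (hypothetical helper)."""
    ny, nx = spectrum.shape
    dy, dx = spacing
    # Frequencies in cycles per unit length along each axis.
    fy = np.fft.fftfreq(ny, d=dy)[:, None]
    fx = np.fft.fftfreq(nx, d=dx)[None, :]
    radius = np.sqrt(fy**2 + fx**2).ravel()
    edges = np.linspace(0.0, radius.max(), bins + 1)
    # Sum of spectrum values and number of frequencies falling in each ring.
    sums, _ = np.histogram(radius, bins=edges, weights=spectrum.ravel())
    counts, _ = np.histogram(radius, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore", divide="ignore"):
        profile = sums / counts  # NaN for rings that contain no frequency
    return centers, profile
```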
- square_trace()
Return trace(K^2).
- Return type:
float
- trace()
Return trace(K).
- Return type:
float
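Because K is diagonalized by the 2D FFT, eigenvalues(), trace() and square_trace() can all be read off the spectrum. trace(K) is the sum of the eigenvalues and trace(K^2) is the sum of their squares. The following is a minimal sketch with a hypothetical helper that takes the spectrum array as input.

```python
import numpy as np

def kernel_summaries(spectrum, k=None):
    """Leading eigenvalues, trace(K) and trace(K^2) from the kernel spectrum."""
    eigs = np.sort(np.asarray(spectrum).ravel())[::-1]  # descending order
    top = eigs if k is None else eigs[:k]
    trace_k = float(eigs.sum())          # trace(K) = sum of eigenvalues
    trace_k2 = float((eigs**2).sum())    # trace(K^2) = sum of squared eigenvalues
    return top, trace_k, trace_k2
```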
- xtKx(x)
Compute x^T K x in O(N log N) via FFT.
- Parameters:
x (ndarray) – Input with shape (ny, nx) or (ny, nx, m).
- Returns:
Scalar for 2D input, or an array of shape (m,) for 3D input.
- Return type:
float or np.ndarray
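The FFT route for xtKx follows from Parseval's theorem. With NumPy's unnormalized FFT, x^T K x = (1/N) * sum(lambda * |FFT2(x)|^2) for real x and a real symmetric circulant K. As in the sketches above, this illustration passes the spectrum explicitly, which is an assumption.

```python
import numpy as np

def xtKx(x, spectrum):
    """x^T K x for a circulant K via Parseval's theorem (hypothetical helper)."""
    n = spectrum.size
    # |FFT2(x)|^2 per frequency; for 3D input each of the m columns is transformed.
    power = np.abs(np.fft.fft2(x, axes=(0, 1))) ** 2
    if x.ndim == 2:
        return float((spectrum * power).sum() / n)
    return (spectrum[..., None] * power).sum(axis=(0, 1)) / n  # shape (m,)
```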
- n_grid
- neighbor_degree = 1
- rho
- spectrum
- workers = None
- class splisosm.hyptest_fft.SplisosmFFT(rho=0.99, neighbor_degree=1, spacing=(1.0, 1.0), workers=None)
FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.
The class follows the non-parametric SPLISOSM workflow but consumes a SpatialData table directly and rasterizes per-gene isoform counts on demand.
Examples
>>> from splisosm import SplisosmFFT
>>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
>>> model.setup_data(
...     sdata=sdata,
...     bins="ID_square_016um",
...     table_name="square_016um",
...     col_key="array_col",
...     row_key="array_row",
...     layer="counts",
...     group_iso_by="gene_ids",
...     gene_names="gene_name",
...     min_counts=10,
...     min_bin_pct=0.0,
... )
>>> model.test_spatial_variability(method="hsic-ir")
>>> sv_results = model.get_formatted_test_results(test_type="sv")
- Parameters:
rho (float) – Spatial autocorrelation coefficient of the CAR kernel.
neighbor_degree (int) – Neighbor ring degree for CAR graph construction.
spacing (tuple[float, float]) – Raster spacing (dy, dx).
workers (int | None) – Number of FFT workers.
- extract_feature_summary(level='gene', print_progress=True)
Compute filtered feature-level summary statistics.
Gene-level statistics are aggregated across all isoforms that passed the filters applied in setup_data(). Isoform-level statistics are computed per isoform and added to the corresponding rows of adata.var.
Results are cached: repeated calls with the same level return the cached pandas.DataFrame without recomputation.
- Parameters:
level (Literal['gene', 'isoform']) – Summary granularity: 'gene' gives one row per gene; 'isoform' gives one row per isoform that passed filtering.
print_progress (bool) – Whether to show a progress bar.
- Returns:
For level='gene', the index is the gene display name and the columns are:
  - 'n_isos': int. Number of isoforms retained after filtering.
  - 'perplexity': float. Effective number of isoforms, based on the entropy of the marginal isoform usage.
  - 'pct_bin_on': float. Fraction of bins with non-zero total gene counts.
  - 'count_avg': float. Mean per-spot total count for the gene.
  - 'count_std': float. Standard deviation of the per-spot total count for the gene.
For level='isoform', the index is the isoform name (matching adata.var_names) and the columns are the original adata.var columns plus:
  - 'pct_bin_on': float. Fraction of bins with count > 0.
  - 'count_total': float. Total counts across all bins.
  - 'count_avg': float. Mean count per bin.
  - 'count_std': float. Standard deviation of the count per bin.
  - 'ratio_total': float. Fraction of total gene counts attributable to this isoform.
  - 'ratio_avg': float. Mean per-bin isoform usage ratio (computed over bins with non-zero gene coverage).
  - 'ratio_std': float. Standard deviation of the per-bin isoform usage ratio (computed over bins with non-zero gene coverage).
- Return type:
pandas.DataFrame
- Raises:
RuntimeError – If setup_data() has not been called.
ValueError – If level is not 'gene' or 'isoform'.
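The gene-level columns above can be sketched for a single gene from its (n_spots, n_isos) count matrix. Two details are assumptions. Perplexity is taken as exp(entropy) of the isoform usage pooled over all spots, and standard deviations use NumPy's default ddof=0. The helper gene_summary is hypothetical.

```python
import numpy as np

def gene_summary(counts):
    """Gene-level summary for one gene; counts has shape (n_spots, n_isos)."""
    counts = np.asarray(counts, dtype=float)
    gene_total = counts.sum(axis=1)  # per-spot total gene count
    # Marginal isoform usage pooled over all spots.
    usage = counts.sum(axis=0) / counts.sum()
    nz = usage[usage > 0]
    entropy = -(nz * np.log(nz)).sum()
    return {
        "n_isos": counts.shape[1],
        "perplexity": float(np.exp(entropy)),  # effective number of isoforms
        "pct_bin_on": float((gene_total > 0).mean()),
        "count_avg": float(gene_total.mean()),
        "count_std": float(gene_total.std()),
    }
```

For example, two isoforms used equally give a perplexity of 2, while one dominant isoform pushes it toward 1.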
- get_formatted_test_results(test_type)
Get formatted test results as a pandas DataFrame.
- Parameters:
test_type (Literal['sv', 'du']) – Test type: "sv" for spatial variability or "du" for differential usage.
- Returns:
Formatted result table.
- Return type:
pandas.DataFrame
- setup_data(sdata, bins, table_name, col_key, row_key, layer='counts', group_iso_by='gene_symbol', gene_names=None, min_counts=10, min_bin_pct=0.0, filter_single_iso_genes=True, design_mtx=None, covariate_names=None)
Set up SpatialData-backed isoform data for FFT-based testing.
The arguments (bins, table_name, col_key, row_key) are passed to spatialdata.rasterize_bins() to rasterize isoform counts.
- Parameters:
sdata (Any) – SpatialData-like object with a tables mapping.
bins (str) – Name of the SpatialData bin geometry used for rasterization.
table_name (str) – Key of the table in sdata.tables.
col_key (str) – Column index key in adata.obs for rasterization.
row_key (str) – Row index key in adata.obs for rasterization.
layer (str) – AnnData layer that stores the isoform count matrix.
group_iso_by (str) – Column in adata.var used to group isoforms by gene. The unique values of this column define the gene-level groups.
gene_names (Optional[str]) – Optional column in adata.var whose values are used as display gene names in results. If None, the values of group_iso_by are used directly.
min_counts (int) – Minimum total count (summed across all spots) required for an isoform to be retained. Isoforms below this threshold are excluded before gene grouping. Genes left with fewer than two isoforms after filtering are also excluded when filter_single_iso_genes=True.
min_bin_pct (float) – Minimum percentage of bins in which an isoform must be expressed (count greater than zero) to be retained. Values in [0, 1] are interpreted as fractions of bins, and values in (1, 100] as percentages.
filter_single_iso_genes (bool, optional) – If True (default), genes with fewer than two isoforms passing the QC filters are removed, since they cannot contribute to within-gene ratio tests. Set to False to keep single-isoform genes, e.g. when testing gene-level expression variability with test_spatial_variability(method="hsic-gc").
design_mtx (str, list[str], np.ndarray, scipy.sparse matrix, pd.DataFrame, or None) – Design matrix specification. Three input modes:
  1. Table name (a str matching a key in sdata.tables): use the existing AnnData table's X as the design matrix. It must have the same number of observations as the isoform table.
  2. Obs column names (a str or list[str] not matching a table): extract the named columns from the isoform table's adata.obs. Categorical columns are one-hot encoded automatically.
  3. Pre-computed matrix (ndarray, sparse matrix, or DataFrame of shape (n_obs, n_factors)): used as-is.
  In modes 2 and 3, the design matrix is stored as a new AnnData table inside sdata. The matrix is rasterized via spatialdata.rasterize_bins() when test_differential_usage() is called.
covariate_names (list[str] or None, optional) – Factor names. If given, they override the inferred names. If None, names are inferred from the design_mtx column names when possible; otherwise they are auto-generated as ["factor_0", ...].
- Raises:
ValueError – If required table/layer/metadata is missing.
- Return type:
None
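For design-matrix mode 2, the following pandas sketch shows one way to one-hot encode categorical obs columns while passing numeric columns through. The helper obs_to_design is hypothetical. Two choices are assumptions: every non-numeric column is treated as categorical, and all levels are kept (no reference level is dropped). The library may make either choice differently.

```python
import pandas as pd

def obs_to_design(obs, columns):
    """Build a numeric design matrix from obs columns (hypothetical helper)."""
    if isinstance(columns, str):
        columns = [columns]
    df = obs[columns]
    # Treat non-numeric columns (strings, categoricals) as categorical factors.
    cat_cols = [c for c in columns if not pd.api.types.is_numeric_dtype(df[c])]
    if cat_cols:
        # One indicator column per level, named "<column>_<level>".
        df = pd.get_dummies(df, columns=cat_cols, dtype=float)
    return df.astype(float), [str(c) for c in df.columns]
```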
See also
splisosm.hyptest_np.SplisosmNP.setup_data() – AnnData-based setup for data with general geometry.
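The QC filters of setup_data() can be summarized as a boolean mask over isoforms. This sketch rests on stated assumptions: thresholds are inclusive (>=), min_bin_pct values above 1 are divided by 100, and counts is a dense (n_spots, n_isos) array. The helper filter_isoforms is hypothetical.

```python
import numpy as np

def filter_isoforms(counts, gene_of_iso, min_counts=10, min_bin_pct=0.0,
                    filter_single_iso_genes=True):
    """Return a boolean mask of isoforms (columns of counts) passing QC."""
    counts = np.asarray(counts)
    gene_of_iso = np.asarray(gene_of_iso)
    # Values in [0, 1] are fractions; values in (1, 100] are percentages.
    frac = min_bin_pct / 100.0 if min_bin_pct > 1 else min_bin_pct
    keep = (counts.sum(axis=0) >= min_counts) & ((counts > 0).mean(axis=0) >= frac)
    if filter_single_iso_genes:
        # Drop genes left with fewer than two isoforms after filtering.
        for g in np.unique(gene_of_iso[keep]):
            members = keep & (gene_of_iso == g)
            if members.sum() < 2:
                keep &= ~members
    return keep
```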
- test_differential_usage(method='hsic-gp', ratio_transformation='none', gpr_configs=None, residualize='cov_only', n_jobs=-1, return_results=False, print_progress=True)
Test for differential isoform usage against spatial covariate expression.
Before running this function, the design matrix must be set up using setup_data(). Each column of the design matrix is a covariate tested for association with the isoform usage ratios of each gene. Test statistics and p-values are computed separately for each (gene, covariate) pair.
Four test strategies are supported. All of them operate on rasterized grid data to avoid densifying the full isoform or covariate matrix in memory:
  - "hsic-gp" (default): spatially residualize covariates (and optionally isoform ratios) with FFTKernelGPR, then compute linear HSIC. Controlled by residualize.
  - "hsic": linear HSIC between raw centered isoform ratios and raw centered covariates, with no spatial residualization.
  - "t-fisher": per-isoform two-sample t-tests (binary covariates only) combined by Fisher's method (chi-squared, df = 2 × n_isoforms).
  - "t-tippett": per-isoform two-sample t-tests (binary covariates only) combined by Tippett's corrected minimum p-value.
Regardless of method, covariates are processed in chunks of at most 100. Isoform data is loaded on the fly per gene, so neither the full covariate grid nor the full isoform matrix is held in memory at once.
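Linear HSIC between centered isoform ratios Y (n x p) and covariates Z (n x q) reduces to a squared Frobenius norm, so no n x n kernel matrix needs to be formed. The sketch below uses a 1/n^2 scaling, which is an assumption. The library's normalization and null distribution are not shown, and the helper linear_hsic is hypothetical.

```python
import numpy as np

def linear_hsic(Y, Z):
    """Linear-kernel HSIC between Y (n x p) and Z (n x q), scaled by 1/n^2."""
    Yc = Y - Y.mean(axis=0)  # column-center both blocks
    Zc = Z - Z.mean(axis=0)
    # trace(Yc Yc^T Zc Zc^T) = ||Yc^T Zc||_F^2, which costs O(n p q).
    return float(np.sum((Yc.T @ Zc) ** 2)) / Y.shape[0] ** 2
```

For "hsic-gp", the same statistic would be computed after replacing Z (and, with residualize="both", also Y) by its spatial residuals.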
- Parameters:
method (str, optional) – Method for association testing:
  - "hsic": unconditional HSIC test (multivariate RV coefficient). For continuous factors, equivalent to the multivariate Pearson correlation test; for binary factors, equivalent to the two-sample Hotelling T² test.
  - "hsic-gp": conditional HSIC test. Spatial effects are removed via Gaussian process regression before computing the HSIC statistic.
  - "t-fisher", "t-tippett" (binary factors only, i.e. exactly two distinct non-NaN values): two-sample t-test per isoform; p-values are combined gene-wise via Fisher's chi-squared method or Tippett's corrected minimum method.
ratio_transformation (str, optional) – Compositional transformation for isoform ratios. One of 'none', 'clr', 'ilr', 'alr', 'radial' [PYPA22]. See splisosm.utils.counts_to_ratios().
gpr_configs (dict, optional) – Nested configuration dict for the GPR objects, with optional keys 'covariate' and/or 'isoform'. Each sub-dict is forwarded to splisosm.kernel_gpr.make_kernel_gpr(). Unspecified keys use the defaults from splisosm.kernel_gpr._DEFAULT_GPR_CONFIGS:
  {
      "covariate": {
          "constant_value": 1.0,
          "constant_value_bounds": (1e-3, 1e3),
          "length_scale": 1.0,
          "length_scale_bounds": "fixed",
      },
      "isoform": {
          "constant_value": 1.0,
          "constant_value_bounds": (1e-3, 1e3),
          "length_scale": 1.0,
          "length_scale_bounds": "fixed",
      },
  }
residualize ({"cov_only", "both"}, optional) – Controls which signals are spatially residualized when method="hsic-gp":
  - "cov_only" (default): residualize covariates only and test HSIC(Z_res, Y_raw). This is the fastest option, and its calibration matches "both" when the covariate GPR captures most of the spatial confounding.
  - "both": residualize both covariates and isoform ratios.
n_jobs (int, optional) – Number of parallel jobs. -1 uses all available CPUs.
print_progress (bool, optional) – Whether to show the progress bar. Defaults to True.
return_results (bool, optional) – Whether to return the test statistics and p-values. If False, the results are stored in self.du_test_results.
- Returns:
results – If return_results is True, a dict with test statistics and p-values. Otherwise None, and the results are stored in self.du_test_results.
- Return type:
dict or None
- Raises:
RuntimeError – If setup_data() has not been called or no design_mtx was provided.
ValueError – If method, residualize, or ratio_transformation is invalid.
See also
splisosm.hyptest_np.SplisosmNP.test_differential_usage() – Non-FFT version of this function for comparison.
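For the t-test strategies, the per-isoform p-values of one gene can be combined as described. Fisher's statistic -2 * sum(log p) is compared with a chi-squared distribution with 2 × n_isoforms degrees of freedom. Tippett's corrected minimum is 1 - (1 - min p)^n_isoforms. This SciPy sketch assumes Student's (equal-variance) t-test, and the helper t_combined_pvalues is hypothetical.

```python
import numpy as np
from scipy import stats

def t_combined_pvalues(ratios, covariate):
    """Fisher and Tippett combined p-values for one gene and one binary covariate.

    ratios: (n_spots, n_isos) isoform ratios; covariate: (n_spots,) with
    exactly two distinct non-NaN values.
    """
    covariate = np.asarray(covariate, dtype=float)
    ok = ~np.isnan(covariate)
    levels = np.unique(covariate[ok])
    if levels.size != 2:
        raise ValueError("covariate must have exactly two distinct non-NaN values")
    g1 = ratios[ok & (covariate == levels[0])]
    g2 = ratios[ok & (covariate == levels[1])]
    p = stats.ttest_ind(g1, g2, axis=0).pvalue  # one p-value per isoform
    k = p.size
    fisher = stats.chi2.sf(-2.0 * np.log(p).sum(), df=2 * k)
    tippett = 1.0 - (1.0 - p.min()) ** k  # corrected minimum p-value
    return float(fisher), float(tippett)
```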
- test_spatial_variability(method='hsic-ir', ratio_transformation='none', n_jobs=-1, return_results=False, print_progress=True)
Test for spatial variability using FFT-accelerated HSIC.
- Parameters:
method (Literal['hsic-ir', 'hsic-ic', 'hsic-gc']) – One of "hsic-ir" (isoform ratios), "hsic-ic" (isoform counts), or "hsic-gc" (gene counts).
ratio_transformation (Literal['none', 'clr', 'ilr', 'alr', 'radial']) – Ratio transformation used when method="hsic-ir".
n_jobs (int) – Number of joblib workers. -1 uses all available CPUs.
return_results (bool) – If True, return the result dictionary.
print_progress (bool) – Whether to show a progress bar.
- Returns:
Result dictionary when return_results=True; otherwise None.
- Return type:
dict or None
See also
splisosm.hyptest_np.SplisosmNP.test_spatial_variability() – Non-FFT version of this function for comparison.
- covariate_names: list[str] = []
- design_mtx: Any | None = None
- du_test_results: dict
Dictionary storing the differential usage test results after running test_differential_usage(). It contains the following keys:
  - 'method': str, the method used for the test.
  - 'statistic': numpy.ndarray of shape (n_genes, n_covariates), the test statistic for each gene and covariate.
  - 'pvalue': numpy.ndarray of shape (n_genes, n_covariates), the p-value for each gene and covariate.
  - 'pvalue_adj': numpy.ndarray of shape (n_genes, n_covariates), the Benjamini-Hochberg (BH) adjusted p-value for each gene and covariate. Each column (covariate) is adjusted separately.
- gene_names: list[str]
List of gene names corresponding to the genes in the model after filtering.
- n_factors: int = 0
- n_genes: int
Number of genes after filtering.
- n_grid: int
Number of raster grid bins (including padding): n_grid = ny * nx.
- n_isos: list[int]
Number of isoforms for each gene after filtering.
- n_spots: int
Number of observed spots (bins).
- sdata: Any | None
SpatialData object containing the input data.
- sv_test_results: dict
Dictionary storing the spatial variability test results after running test_spatial_variability(). It contains the following keys:
  - 'method': str, the method used for the test.
  - 'statistic': numpy.ndarray of shape (n_genes,), the test statistic for each gene.
  - 'pvalue': numpy.ndarray of shape (n_genes,), the p-value for each gene.
  - 'pvalue_adj': numpy.ndarray of shape (n_genes,), the BH-adjusted p-value for each gene.
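The 'pvalue_adj' arrays are described as BH-adjusted, with each covariate column of du_test_results adjusted separately. Below is a NumPy sketch of column-wise Benjamini-Hochberg adjustment. It assumes the inputs contain no NaNs, since the library's NaN handling is not documented here, and the helper bh_adjust is hypothetical.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values along axis 0 (each column separately)."""
    p = np.asarray(pvals, dtype=float)
    squeeze = p.ndim == 1
    if squeeze:
        p = p[:, None]
    n = p.shape[0]
    order = np.argsort(p, axis=0)
    ranked = np.take_along_axis(p, order, axis=0)
    ranks = np.arange(1, n + 1)[:, None]
    # p_(i) * n / i, made monotone by a running minimum from the largest p down.
    adj = np.minimum.accumulate((ranked * n / ranks)[::-1], axis=0)[::-1]
    adj = np.minimum(adj, 1.0)
    # Scatter the adjusted values back to the original row order.
    out = np.empty_like(adj)
    np.put_along_axis(out, order, adj, axis=0)
    return out[:, 0] if squeeze else out
```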