splisosm.utils
General utilities for preprocessing and statistical helpers.
Functions
counts_to_ratios | Convert isoform counts to proportions.
extract_counts_n_ratios | Extract per-gene lists of isoform counts and ratios from anndata.
extract_gene_level_statistics | Extract gene-level metadata from isoform-level counts anndata.
false_discovery_control | Adjust p-values to control the false discovery rate.
get_cov_sp | Wrapper function to get the spatial covariance matrix from spatial coordinates.
load_visium_sp_meta | Helper function to load Visium spatial metadata.
prepare_inputs_from_anndata | Extract and filter isoform count tensors from an AnnData object.
run_hsic_gc | Function to compute HSIC-GC statistic for gene-level counts.
run_sparkx | Wrapper for running the SPARK-X test for spatial gene expression variability.
Module Contents
- splisosm.utils.counts_to_ratios(counts, transformation='none', nan_filling='mean')#
Convert isoform counts to proportions.
By default, isoform ratios at zero-coverage spots are filled with the mean ratio per isoform across all spots. After conversion, the isoform ratios can be further transformed using log-ratio-based transformations (clr, ilr, alr) or radial transformation [PYPA22].
- Parameters:
counts (ndarray | Tensor) – Shape (n_spots, n_isos). Isoform counts.
transformation (Literal['none', 'clr', 'ilr', 'alr', 'radial']) – Transformation applied to the proportions. Can be one of the following:
'none': no transformation, return isoform ratios.
'clr': centered log-ratio transformation.
'ilr': isometric log-ratio transformation.
'alr': additive log-ratio transformation.
'radial': radial transformation [PYPA22].
nan_filling (Literal['mean', 'none']) – Method to fill all-zero rows.
'mean': fill all-zero rows with the per-isoform (column) mean ratio before transformation.
'none': do not fill rows and return NaNs at all-zero rows.
- Returns:
ratios – Shape (n_spots, n_isos) or (n_spots, n_isos - 1) if ilr or alr transformation is used.
- Return type:
Notes
Log-ratio-based transformations (clr, ilr, alr) are implemented via
scikit-bio, with a pseudocount of 1% of the global mean per isoform to avoid zeros in the ratio.
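The conversion described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper names `ratios_sketch` and `clr_sketch` are hypothetical, and the fixed pseudocount below is an assumption standing in for the "1% of the global mean per isoform" rule, which is applied through scikit-bio in the actual function.

```python
import numpy as np


def ratios_sketch(counts, nan_filling="mean"):
    """Illustrative only: row-wise proportions of an (n_spots, n_isos) count matrix."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        ratios = counts / totals  # all-zero rows become NaN
    if nan_filling == "mean":
        # Mean ratio per isoform, computed over spots that have coverage.
        col_mean = np.nanmean(ratios, axis=0)
        empty = (totals == 0).ravel()
        ratios[empty] = col_mean
    return ratios


def clr_sketch(ratios, pseudocount=1e-6):
    """Centered log-ratio with a fixed, hypothetical pseudocount (an assumption)."""
    logp = np.log(np.asarray(ratios, dtype=float) + pseudocount)
    return logp - logp.mean(axis=1, keepdims=True)


counts = np.array([[8, 2], [0, 0], [1, 3]])
ratios = ratios_sketch(counts)   # the zero-coverage row is filled with the column means
clr_values = clr_sketch(ratios)  # each row sums to zero after centering
```

With `nan_filling="none"` the zero-coverage row would stay NaN, which matches the behaviour documented for that option.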
- splisosm.utils.extract_counts_n_ratios(adata, layer='counts', group_iso_by='gene_symbol', return_sparse=False, filter_single_iso_genes=True)
Extract per-gene lists of isoform counts and ratios from anndata.
- Parameters:
adata (AnnData) – Annotated data matrix.
layer (str) – Layer to extract isoform counts (adata.layers[layer]).
group_iso_by (str) – Gene index in adata.var to group isoforms by.
return_sparse (bool) – Whether to return sparse torch tensors for counts_list. If True, ratios_list will be empty and ratio_obs_merged will be None.
filter_single_iso_genes (bool) – Whether to filter out genes with only one isoform. Defaults to True for compatibility with splisosm models.
- Returns:
counts_list (list[torch.Tensor]) – Isoform counts per gene, each of shape (n_spots, n_isos).
ratios_list (list[torch.Tensor]) – Isoform ratios per gene, each of shape (n_spots, n_isos).
gene_name_list (list[str]) – Gene names.
ratio_obs_merged (np.ndarray | None) – Observed isoform ratios, shape (n_spots, n_isos_total), or None if return_sparse is True.
- Return type:
tuple[list[Tensor], list[Tensor], list[str], Optional[ndarray]]
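The grouping step can be sketched without AnnData. The example below assumes a dense count matrix and a plain list of gene labels (one per isoform column) in place of `adata.layers[layer]` and `adata.var[group_iso_by]`. The helper name `split_by_gene` and the first-seen gene ordering are assumptions, not the library's behaviour.

```python
import numpy as np


def split_by_gene(counts, gene_labels, filter_single_iso_genes=True):
    """Illustrative only: split isoform columns into per-gene count matrices."""
    counts = np.asarray(counts)
    gene_labels = np.asarray(gene_labels)
    counts_list, gene_name_list = [], []
    # dict.fromkeys keeps genes in first-seen order (an assumption of this sketch).
    for gene in dict.fromkeys(gene_labels.tolist()):
        cols = np.flatnonzero(gene_labels == gene)
        if filter_single_iso_genes and cols.size < 2:
            continue  # a single-isoform gene carries no isoform-usage signal
        counts_list.append(counts[:, cols])
        gene_name_list.append(gene)
    return counts_list, gene_name_list


counts = np.arange(15).reshape(3, 5)   # 3 spots, 5 isoforms
labels = ["A", "A", "B", "C", "C"]     # gene "B" has only one isoform
counts_list, gene_name_list = split_by_gene(counts, labels)
```

Here gene "B" is dropped, and each remaining entry of `counts_list` has shape (n_spots, n_isos) for its gene.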
- splisosm.utils.extract_gene_level_statistics(adata, layer='counts', group_iso_by='gene_symbol')
Extract gene-level metadata from isoform-level counts anndata.
- Parameters:
adata (AnnData) – Annotated data matrix.
layer (str) – Layer to extract isoform counts (adata.layers[layer]).
group_iso_by (str) – Gene index in adata.var to group isoforms by.
- Returns:
Gene-level metadata with columns:
'n_iso': int. Number of isoforms per gene.
'pct_spot_on': float. Percentage of spots with non-zero counts.
'count_avg': float. Average counts per gene.
'count_std': float. Standard deviation of counts per gene.
'perplexity': float. Expression-based effective number of isoforms.
'major_ratio_avg': float. Average ratio of the major isoform.
- Return type:
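One common way to obtain an "effective number of isoforms" is the perplexity of the isoform proportions, that is, the exponential of their Shannon entropy. The sketch below assumes the proportions are pooled over all spots. Whether the library pools counts this way is not stated here, so treat both the formula and the helper name `perplexity_sketch` as assumptions.

```python
import numpy as np


def perplexity_sketch(counts):
    """Illustrative only: exp(entropy) of pooled isoform proportions for one gene."""
    counts = np.asarray(counts, dtype=float)
    share = counts.sum(axis=0) / counts.sum()  # pooled proportion per isoform
    share = share[share > 0]                   # treat 0 * log(0) as 0
    entropy = -(share * np.log(share)).sum()
    return float(np.exp(entropy))


balanced = perplexity_sketch([[5, 5], [5, 5]])   # two equally used isoforms
dominated = perplexity_sketch([[3, 0], [4, 0]])  # one isoform carries all counts
```

Under this definition the value ranges from 1 (a single expressed isoform) up to `n_iso` (all isoforms equally expressed).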
- splisosm.utils.false_discovery_control(ps, *, axis=0, method='bh')
Adjust p-values to control the false discovery rate.
The false discovery rate (FDR) is the expected proportion of rejected null hypotheses that are actually true. If the null hypothesis is rejected when the adjusted p-value falls below a specified level, the false discovery rate is controlled at that level.
- Parameters:
ps (numpy.typing.ArrayLike) – The p-values to adjust. Elements must be real numbers between 0 and 1.
axis (Optional[int]) – The axis along which to perform the adjustment. The adjustment is performed independently along each axis-slice. If axis is None, ps is raveled before performing the adjustment.
method (Literal['bh', 'by']) – The false discovery rate control procedure to apply: 'bh' is for Benjamini-Hochberg [BH95] (Eq. 1), 'by' is for Benjamini-Yekutieli [BY01] (Theorem 1.3). The latter is more conservative, but it is guaranteed to control the FDR even when the p-values are not from independent tests.
- Returns:
ps_adjusted – The adjusted p-values. If the null hypothesis is rejected where these fall below a specified level, the false discovery rate is controlled at that level.
- Return type:
Notes
From scipy.stats.false_discovery_control in SciPy v1.13.1. See scipy/scipy.
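For intuition, the Benjamini-Hochberg adjustment can be written in a few lines of NumPy. This is a one-dimensional sketch of the standard step-up procedure, not the vendored SciPy code, and the name `bh_adjust` is hypothetical. It ignores the `axis` argument and the 'by' variant.

```python
import numpy as np


def bh_adjust(ps):
    """Illustrative only: Benjamini-Hochberg adjusted p-values for a 1-D array."""
    ps = np.asarray(ps, dtype=float)
    m = ps.size
    order = np.argsort(ps)
    # Scale the i-th smallest p-value by m / i.
    scaled = ps[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)  # restore the input order
    return adjusted


ps = np.array([0.01, 0.04, 0.03, 0.20])
ps_adjusted = bh_adjust(ps)
```

Rejecting every hypothesis whose adjusted p-value falls below a level such as 0.05 then controls the FDR at that level, under the conditions of the BH procedure.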
- splisosm.utils.get_cov_sp(coords, k=4, rho=0.99)
Wrapper function to get the spatial covariance matrix from spatial coordinates.
It first constructs a mutual k-nearest-neighbor graph from the Euclidean spatial coordinates, then converts the adjacency matrix to a standardized spatial covariance matrix using the intrinsic conditional autoregressive (ICAR) model with spatial autocorrelation coefficient rho. See [SRF+23] for details.
- Parameters:
- Returns:
cov_sp – Shape (n_spots, n_spots). Spatial covariance matrix with standardized variance (== 1).
- Return type:
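The two steps (mutual k-NN graph, then a CAR-type covariance) can be sketched densely in NumPy. The precision matrix used below, I - rho * D^(-1/2) W D^(-1/2), is one possible parameterization chosen so that the matrix stays invertible for rho < 1. It is an assumption of this sketch and may differ from what the library does. The helper names are hypothetical as well.

```python
import numpy as np


def mutual_knn_adjacency(coords, k=4):
    """Illustrative only: symmetric 0/1 adjacency of the mutual k-NN graph."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    directed = np.zeros((n, n), dtype=bool)
    directed[np.repeat(np.arange(n), nearest.shape[1]), nearest.ravel()] = True
    return (directed & directed.T).astype(float)  # keep mutual edges only


def icar_cov_sketch(coords, k=4, rho=0.99):
    """Illustrative only: unit-variance covariance from a CAR-type precision."""
    w = mutual_knn_adjacency(coords, k)
    deg = np.maximum(w.sum(axis=1), 1.0)  # isolated spots treated as degree 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    w_norm = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    precision = np.eye(len(w)) - rho * w_norm  # assumed parameterization
    cov = np.linalg.inv(precision)
    scale = 1.0 / np.sqrt(np.diag(cov))
    return cov * scale[:, None] * scale[None, :]  # standardize variances to 1


xs, ys = np.meshgrid(np.arange(5), np.arange(5))
coords = np.column_stack([xs.ravel(), ys.ravel()])  # 5 x 5 grid of spots
cov_sp = icar_cov_sketch(coords, k=4, rho=0.99)
```

The dense inverse costs O(n_spots^3), so this form is only practical for small examples.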
- splisosm.utils.load_visium_sp_meta(adata, path_to_spatial, library_id=None)
Helper function to load Visium spatial metadata.
- Parameters:
adata (AnnData) – Annotated data matrix to store the spatial metadata.
path_to_spatial (str | pathlib.Path) – Path to the spatial folder generated by Space Ranger.
library_id (Optional[str]) – Library ID of the spatial data.
- Returns:
anndata – AnnData with spatial metadata.
- Return type:
- splisosm.utils.prepare_inputs_from_anndata(adata, layer, group_iso_by, spatial_key, min_counts, min_bin_pct, filter_single_iso_genes, gene_names, design_mtx, covariate_names)
Extract and filter isoform count tensors from an AnnData object.
Shared helper used by both splisosm.hyptest_np.SplisosmNP and splisosm.hyptest_glmm.SplisosmGLMM to prepare legacy-compatible tensors from an AnnData input. Feature filtering, sparse/dense handling, coordinate extraction, and design-matrix resolution are all performed here.
- Parameters:
adata (AnnData) – Annotated data matrix.
layer (str) – Key in adata.layers containing raw isoform counts.
group_iso_by (str) – Column in adata.var used to group isoforms by gene.
spatial_key (str) – Key in adata.obsm for spatial coordinates.
min_counts (int) – Minimum total isoform count across spots required to retain an isoform.
min_bin_pct (float) – Minimum fraction/percentage of spots with non-zero expression for an isoform. Values in [0, 1] are treated as fractions; values in (1, 100] are treated as percentages.
filter_single_iso_genes (bool) – Whether to discard genes with fewer than two retained isoforms.
gene_names (Optional[str]) – Column name in adata.var used as display names for grouped genes. If None, the grouped gene IDs are used.
design_mtx (Optional[Any]) – Design matrix for differential-usage tests. Accepts a tensor/array/dataframe of shape (n_spots, n_factors), a single obs-column name (str), or a list of obs-column names.
covariate_names (Optional[list[str]]) – Explicit covariate names. When design_mtx is given as column name(s) and this is None, the column names are used automatically.
- Returns:
counts_list (list[torch.Tensor]) – Per-gene isoform count tensors, each of shape (n_spots, n_isos). Sparse adata.layers[layer] input yields sparse COO tensors.
coordinates (torch.Tensor) – Shape (n_spots, 2) spatial coordinates, dtype float32.
resolved_gene_names (list[str]) – Display names for each gene in counts_list.
resolved_design (np.ndarray or tensor or None) – Resolved design matrix, or the original object if it was already array-like; None when design_mtx is None.
resolved_covariates (list[str] or None) – Resolved covariate names, or None when design_mtx is None.
- Raises:
ValueError – If required fields are missing from adata, no isoforms survive filtering, or argument values are out of range.
- Return type:
tuple[list[Tensor], Tensor, list[str], Optional[Any], Optional[list[str]]]
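The dual reading of min_bin_pct (fraction versus percentage) and the two isoform-level filters can be sketched as follows. The helper names are hypothetical, and the inclusive comparisons (`>=`) are an assumption, since the text above does not say whether the thresholds are strict.

```python
import numpy as np


def as_fraction(min_bin_pct):
    """Illustrative only: map a value in [0, 1] or (1, 100] to a fraction."""
    if not 0 <= min_bin_pct <= 100:
        raise ValueError("min_bin_pct must lie in [0, 100]")
    return min_bin_pct if min_bin_pct <= 1 else min_bin_pct / 100.0


def keep_isoforms(counts, min_counts, min_bin_pct):
    """Illustrative only: boolean mask of isoform columns passing both filters."""
    counts = np.asarray(counts)
    enough_counts = counts.sum(axis=0) >= min_counts                      # total count filter
    enough_spots = (counts > 0).mean(axis=0) >= as_fraction(min_bin_pct)  # coverage filter
    return enough_counts & enough_spots


counts = np.array([[5, 0, 1], [3, 0, 0], [4, 1, 0], [6, 0, 0]])  # 4 spots, 3 isoforms
mask = keep_isoforms(counts, min_counts=2, min_bin_pct=50)       # 50 is read as 50%
```

Passing `min_bin_pct=0.5` gives the same mask, because values in [0, 1] are read as fractions.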
- splisosm.utils.run_hsic_gc(counts_gene, coordinates, approx_rank=None, **spatial_kernel_kwargs)
Function to compute HSIC-GC statistic for gene-level counts.
This function is designed to be a drop-in replacement for SPARK-X.
- Parameters:
counts_gene (ndarray | Tensor) – Shape (n_spots, n_genes). Gene counts.
coordinates (ndarray | Tensor) – Shape (n_spots, 2). Spatial coordinates of spots.
approx_rank (Optional[int]) – Approximate rank of the spatial kernel matrix.
**spatial_kernel_kwargs (Any) – Additional arguments for SpatialCovKernel.
- Returns:
Results of the HSIC-GC spatial variability test with keys:
'statistic': np.ndarray of shape (n_genes,). HSIC-GC statistics.
'pvalue': np.ndarray of shape (n_genes,). P-values.
'pvalue_adj': np.ndarray of shape (n_genes,). Adjusted p-values.
'method': str. Method name “hsic-gc”.
- Return type:
dict
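To convey the idea behind an HSIC-type spatial variability statistic, the sketch below computes the empirical HSIC between one gene's counts (linear kernel) and the spot coordinates. It uses a Gaussian kernel on the coordinates, which is an assumption made for illustration. The actual function builds its kernel through SpatialCovKernel, supports a low-rank approximation, and also returns p-values, none of which is reproduced here.

```python
import numpy as np


def hsic_linear_sketch(y, coords, length_scale=1.0):
    """Illustrative only: HSIC between a linear kernel on y and a Gaussian spatial kernel."""
    y = np.asarray(y, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = y.shape[0]
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    k_sp = np.exp(-sq_dist / (2.0 * length_scale**2))  # assumed spatial kernel
    h = np.eye(n) - np.full((n, n), 1.0 / n)           # centering matrix
    k_centered = h @ k_sp @ h
    y_centered = y - y.mean()
    # With a linear kernel on y, tr(K_sp H K_y H) reduces to a quadratic form.
    return float(y_centered @ k_centered @ y_centered) / (n - 1) ** 2


xs, ys = np.meshgrid(np.arange(6), np.arange(6))
coords = np.column_stack([xs.ravel(), ys.ravel()])      # 6 x 6 grid of spots
gradient = xs.ravel().astype(float)                     # smooth spatial trend
checkerboard = ((xs + ys) % 2).ravel().astype(float)    # alternating pattern
stat_gradient = hsic_linear_sketch(gradient, coords)
stat_checkerboard = hsic_linear_sketch(checkerboard, coords)
```

The smooth gradient yields a much larger statistic than the checkerboard, reflecting that this kernel rewards patterns in which nearby spots have similar expression.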
- splisosm.utils.run_sparkx(counts_gene, coordinates)
Wrapper for running the SPARK-X test for spatial gene expression variability.
It runs the R package SPARK [ZSZ21] via rpy2.
- Parameters:
- Returns:
Results of the SPARK-X spatial variability test with keys:
'statistic': np.ndarray of shape (n_genes,). Mean SPARK-X statistics.
'pvalue': np.ndarray of shape (n_genes,). Combined p-values.
'pvalue_adj': np.ndarray of shape (n_genes,). Adjusted combined p-values.
'method': str. Method name “spark-x”.
- Return type:
dict