Building Blocks

Building Blocks#

These lower-level classes and helpers are useful for method development, diagnostics, and reproducing implementation details discussed in Statistical Methods.

Kernels#

`splisosm.kernel.SpatialCovKernel`	Graph-based spatial covariance kernel.
`splisosm.kernel.FFTKernel`	FFT-based spatial kernel on a periodic 2D raster grid.

class splisosm.kernel.SpatialCovKernel(coords=None, *, adj_matrix=None, k_neighbors=4, rho=0.99, standardize_cov=True, centering=False)#

Bases: Kernel

Graph-based spatial covariance kernel.

Construct from spot coordinates (from_coordinates() or the coords positional argument) or from a pre-built k-NN adjacency / weight matrix (from_adjacency() or the adj_matrix keyword argument). Exactly one of the two must be provided.

The kernel is stored in one of two modes depending on dataset size:

Dense mode (n ≤ DENSE_THRESHOLD): the (n, n) covariance matrix K_sp is materialised, with optional variance standardisation and double-centring applied at construction time.
Implicit mode (n > DENSE_THRESHOLD): only the sparse precision inv_cov = K⁻¹ is retained; K x is computed via a cached sparse LU factorisation.

Eigenvalue decomposition and the low-rank Q factor are computed lazily on the first call to eigenvalues().

Parameters:

coords (np.ndarray | Tensor | None)
adj_matrix (spmatrix | np.ndarray | None)
k_neighbors (int)
rho (float)
standardize_cov (bool)
centering (bool)

classmethod from_coordinates(coords, k_neighbors=4, rho=0.99, standardize_cov=True, centering=False)#

Build a CAR spatial kernel from spot coordinates.

Parameters:

coords (ndarray | Tensor) – Spot coordinates of shape (n_spots, n_dim).
k_neighbors (int) – Number of nearest neighbours for the spatial graph.
rho (float) – CAR spatial autocorrelation coefficient (0 <= rho < 1).
standardize_cov (bool) – Scale covariance to unit marginal variance.
centering (bool) – Apply double-centring H K H.

Return type:

SpatialCovKernel

Kx(x)#

Apply K or H K H to x.

Parameters:: x (ndarray | Tensor | spmatrix) – Vector or matrix with first dimension n_spots. Dense NumPy, dense/sparse torch, and scipy sparse inputs are supported.
Returns:: Kernel product. SciPy sparse inputs return a dense np.ndarray; torch inputs return a dense torch.Tensor on the original device.
Return type:: np.ndarray or Tensor

xtKx(x)#

Compute x^T K x using the cached low-rank factor when available.

When Q has been populated (by a prior eigenvalues() call) the result is x^T Q Q^T x, i.e. consistent with the approximation K ≈ Q Q^T:

Dense mode: eigendecomposition() produces a full-rank Q so Q Q^T = K exactly.
Implicit mode: eigenvalues() produces a rank-k Q; the result is therefore a rank-k approximation of x^T K x.

Before eigenvalues() is called (Q is None) this falls back to xtKx_exact(). Use xtKx_exact() directly to always obtain the exact quadratic form regardless of whether Q has been computed.

Parameters:: x (ndarray | Tensor | spmatrix) – Input matrix of shape (n_spots, d). Dense NumPy arrays, dense/sparse torch tensors, and scipy sparse matrices are supported.
Returns:: Matrix of shape (d, d).
Return type:: Tensor

GLMM internals#

`splisosm.glmm.MultinomGLM`	The Multinomial Generalized Linear Model for spatial isoform expression.
`splisosm.glmm.MultinomGLMM`	The Multinomial Generalized Linear Mixed Model for spatial isoform expression.

class splisosm.glmm.MultinomGLM(fitting_method='iwls', fitting_configs=None)#

Bases: BaseModel, Module

The Multinomial Generalized Linear Model for spatial isoform expression.

Compared to MultinomGLMM, this model does not have a random effect term:

Y ~ Multinomial(alpha, Y.sum(1))
eta = multinomial-logit(alpha) = X @ beta + bias_eta

Given isoform counts of a gene Y (n_spots, n_isos) and design matrix X (n_spots, n_factors), MultinomGLM.fit will find the MAP estimates of the following learnable parameters:

beta: (n_factors, n_isos - 1) covariate coefficients of the fixed effect term.
bias_eta: (n_isos - 1) intercepts of the fixed effect term.

Inference is performed by maximizing the log likelihood using fitting_method (default: 'iwls')

Example

>>> from splisosm.glmm import MultinomGLM
>>> import torch
>>> # Generate synthetic data
>>> counts = torch.randint(0, 10, (5, 100, 3))  # 5 genes, 100 spots, each 3 isoforms
>>> # Fit the GLM model
>>> model = MultinomGLM(fitting_method='iwls')
>>> model.setup_data(counts, design_mtx=None)
>>> model.fit()
>>> print(model)
>>> # Extract the fitted isoform ratios
>>> isoform_ratios = model.get_isoform_ratio()  # shape (5, 100, 3)
>>> # Fitted parameters
>>> print(model.beta.shape)  # shape (5, 0, 2)
>>> print(model.bias_eta.shape)  # shape (5, 2)

Parameters:

fitting_method (Literal['iwls', 'newton', 'gd'])
fitting_configs (dict)

setup_data(counts, design_mtx=None, device='cpu')#

Set up the data for the model.

Parameters:

counts (Tensor) – Shape (n_genes, n_spots, n_isos) or (n_spots, n_isos). For batched calculations, all genes in the batch must have the same number of isoforms.
design_mtx (Tensor | None) – Shape (n_spots, n_factors). Design matrix of spatial covariates. If None, an intercept-only design matrix will be used.
device (Literal['cpu', 'cuda', 'mps']) – ‘cpu’, ‘cuda’, or ‘mps’ (torch v2.11.0+).

Return type:

None

forward()#

Calculate log probability given data.

Returns:: log_prob – Shape (n_genes,), the log probability for each gene.
Return type:: Tensor

fit(store_param_history=False, verbose=False, quiet=False, random_seed=None, diagnose=None)#

Fit the model using all data.

Parameters:

store_param_history (bool) – Whether to store per-epoch parameter snapshots during training. Loss history is always recorded via model.logger.loss_history regardless of this flag.
verbose (bool) – Whether to print verbose information during fitting.
quiet (bool) – Whether to suppress output during fitting.
random_seed (int | None) – Random seed for reproducibility.
diagnose (bool | None) – Deprecated alias for store_param_history.

Returns:

params_iter – If store_param_history=True, returns a dictionary of parameter snapshots during training. Otherwise returns None.

Return type:

dict or None

get_isoform_ratio()#

Extract the fitted isoform ratio across space.

Returns:: ratio – Shape (n_genes, n_spots, n_isos), the fitted isoform ratio across space.
Return type:: Tensor

class splisosm.glmm.MultinomGLMM(share_variance=True, var_fix_sigma=True, var_prior_model='none', var_prior_model_params=None, init_ratio='uniform', fitting_method='joint_gd', fitting_configs=None)#

Bases: MultinomGLM, BaseModel, Module

The Multinomial Generalized Linear Mixed Model for spatial isoform expression.

The model is defined as follows:

Y ~ Multinomial(alpha, Y.sum(1))
eta = multinomial-logit(alpha) = X @ beta + bias_eta + nu
nu ~ MVN(0, sigma^2 * (theta * V_sp + (1-theta) * I))

Given isoform counts of a gene Y (n_spots, n_isos), design matrix X (n_spots, n_factors), and spatial covariance matrix V_sp (n_spots, n_spots), the model estimates the isoform usage ratio alpha (n_spots, n_isos) across space. Specifically, MultinomGLMM.fit will find the MAP estimates of the following learnable parameters:

beta: (n_factors, n_isos - 1) covariate coefficients of the fixed effect term.
bias_eta: (n_isos - 1) intercepts of the fixed effect term.
nu: (n_spots, n_isos - 1) the random effect term.
variance components: (sigma, theta_logit), each of length n_isos - 1 (or 1 if share_variance is True), representing total variance and logit of spatial variance proportion theta.

Inference algorithms can be categorized into two types based on the optimization objective:

Joint: Maximize the joint likelihood (with the random effect nu). This is equivalent to the first-order Laplace approximation of the marginal likelihood.
Marginal: Maximize the marginal likelihood (with the random effect nu integrated out). The integral is approximated by a second-order Laplace approximation.

Methods implemented:

'joint_gd': Maximize the joint likelihood using gradient descent.
'joint_newton': Maximize the joint likelihood using Newton’s method.
'marginal_gd': Maximize the marginal likelihood using gradient descent.
'marginal_newton': Maximize the marginal likelihood using Newton’s method. In this method, nu is first updated using Newton’s method every 'update_nu_every_k' iterations, and beta, bias_eta, and variance components are updated using gradient descent.

Notes

It is also possible to implement held-out likelihood for model selection.

Example

>>> from splisosm.glmm import MultinomGLMM
>>> from splisosm.utils.preprocessing import get_cov_sp
>>> import torch
>>> # Generate synthetic data
>>> counts = torch.randint(0, 10, (5, 100, 3))  # 5 genes, 100 spots, each 3 isoforms
>>> coords = torch.rand(100, 2)  # 100 spots with 2D coordinates
>>> K_sp = get_cov_sp(coords, k=4, rho=0.9) # spatial covariance matrix of shape (100, 100)
>>> # Fit the GLMM model
>>> model = MultinomGLMM(fitting_method='joint_gd')
>>> model.setup_data(counts, corr_sp=K_sp, design_mtx=None)
>>> model.fit()
>>> print(model)
>>> # Extract the fitted isoform ratios
>>> isoform_ratios = model.get_isoform_ratio()  # shape (5, 100, 3)
>>> # Fitted parameters
>>> print(model.beta.shape)  # shape (5, 0, 2)
>>> print(model.bias_eta.shape)  # shape (5, 2)
>>> print(model.nu.shape)  # shape (5, 100, 2)
>>> print(model.sigma.shape)  # shape (5, 1)
>>> print(model.theta_logit.shape)  # shape (5, 1)

Parameters:

share_variance (bool)
var_fix_sigma (bool)
var_prior_model (Literal['none', 'gamma', 'inv_gamma'])
var_prior_model_params (dict)
init_ratio (Literal['observed', 'uniform'])
fitting_method (Literal['joint_gd', 'joint_newton', 'marginal_gd', 'marginal_newton'])
fitting_configs (dict)

setup_data(counts, corr_sp=None, design_mtx=None, device='cpu', corr_sp_eigvals=None, corr_sp_eigvecs=None)#

Set up the data for the model.

Parameters:

counts (Tensor) – Shape (n_genes, n_spots, n_isoforms) or (n_spots, n_isoforms). For batched calculations, all genes in the batch must have the same number of isoforms.
corr_sp (Tensor | None) – Shape (n_spots, n_spots), spatial covariance matrix. If None, the eigendecomposition of the spatial covariance matrix must be provided.
design_mtx (Tensor | None) – Shape (n_spots, n_factors). Design matrix of spatial covariates. If None, an intercept-only design matrix will be used.
device (Literal['cpu', 'cuda', 'mps']) – ‘cpu’, ‘cuda’, or ‘mps’ (torch v2.11.0+).
corr_sp_eigvals (Tensor | None) – Shape (n_spots,), eigenvalues of spatial covariance. If None, the spatial covariance matrix corr_sp must be provided.
corr_sp_eigvecs (Tensor | None) – Shape (n_spots, n_spots), eigenvectors of spatial covariance. If None, the spatial covariance matrix corr_sp must be provided.

Return type:

None

forward()#

Calculate the log-likelihood or log-marginal-likelihood of the model.

Returns:: log_prob – Shape (n_genes,), the log probability for each gene.
Return type:: Tensor

fit(store_param_history=False, verbose=False, quiet=False, random_seed=None, diagnose=None)#

Fit the model using all data.

Parameters:

store_param_history (bool) – Whether to store per-epoch parameter snapshots during training. Loss history is always recorded via model.logger.loss_history regardless of this flag.
verbose (bool) – Whether to print verbose information during fitting.
quiet (bool) – Whether to suppress output during fitting.
random_seed (int | None) – Random seed for reproducibility.
diagnose (bool | None) – Deprecated alias for store_param_history.

Returns:

params_iter – If store_param_history=True, returns a dictionary of parameter snapshots during training. Otherwise returns None.

Return type:

dict or None

get_isoform_ratio()#

Extract the fitted isoform ratio across space.

Returns:: ratio – Shape (n_genes, n_spots, n_isos), the fitted isoform ratio across space.
Return type:: Tensor

Simulation#

`splisosm.utils.simulation.simulate_isoform_counts`	Generate isoform-by-spot count matrix for multiple genes.
`splisosm.utils.simulation.simulate_isoform_counts_single_gene`	Generate isoform-by-spot count matrix for a given gene.

splisosm.utils.simulation.simulate_isoform_counts(n_genes=1, grid_size=(30, 30), n_isos=3, total_counts_expected=100, var_sp=0, var_nsp=1, rho=0.99, design_mtx=None, beta_true=None, return_params=True)#

Generate isoform-by-spot count matrix for multiple genes.

Each gene is simulated independently. For simplicity we constrain all genes to have the same number of isoforms. (In fact in the current implementation all genes have the exact same distribution.)

Parameters:

n_genes (int) – Number of genes to simulate.
grid_size (tuple[int, int]) – Shape of the spatial grid.
n_isos (int) – Number of isoforms.
total_counts_expected (int | Tensor) – Expected total gene counts per spot.
var_sp (float) – Variance explained by spatial structure.
var_nsp (float) – Unstructured variance (white noise).
rho (float) – Spatial autocorrelation parameter.
design_mtx (Tensor | None) – Shape (n_spots, n_factors). Design matrix for the isoform usage ratio.
beta_true (Tensor | None) – Shape (n_factors, n_isoforms - 1). Factor coefficients for the design matrix.
return_params (bool) – Whether to return simulation parameters.

Returns:

If return_params is False, returns datadict with 'counts', 'coords', 'cov_sp', 'design_mtx'. If return_params is True, returns (data_dict, params_dict).

Return type:

dict or tuple[dict]

splisosm.utils.simulation.simulate_isoform_counts_single_gene(grid_size=(30, 30), n_isos=3, total_counts_expected=100, var_sp=0, var_nsp=1, rho=0.99, design_mtx=None, beta_true=None, return_params=True)#

Generate isoform-by-spot count matrix for a given gene.

Each gene is simulated independently. For simplicity we constrain all genes to have the same number of isoforms. (In fact in the current implementation all genes have the exact same distribution.)

Parameters:

grid_size (tuple[int, int]) – Shape of the spatial grid.
n_isos (int) – Number of isoforms.
total_counts_expected (int | Tensor) – Expected total gene counts per spot.
var_sp (float) – Variance explained by spatial structure.
var_nsp (float) – Unstructured variance (white noise).
rho (float) – Spatial autocorrelation parameter.
design_mtx (Tensor | None) – Shape (n_spots, n_factors). Design matrix for the isoform usage ratio.
beta_true (Tensor | None) – Shape (n_factors, n_isoforms - 1). Factor coefficients for the design matrix.
return_params (bool) – Whether to return simulation parameters.

Returns:

If return_params is False, returns datadict with 'counts', 'coords', 'cov_sp', 'design_mtx'. If return_params is True, returns (data_dict, params_dict).

Return type:

dict or tuple[dict]

Building Blocks

Contents

Building Blocks#

Kernels#

GLMM internals#

Simulation#