splisosm.kernel_gpr

splisosm.kernel_gpr#

Kernel Gaussian Process Regression backends for SPLISOSM spatial residualization.

This module provides an abstract interface and concrete implementations for kernel-based conditional independence testing (via spatial residualization) used by the SPLISOSM differential-usage tests.

GPR-residualizer hierarchy#

KernelGPR (abstract):

SklearnKernelGPR - sklearn backend; dense; suitable for n <= ~10,000.

GPyTorchKernelGPR - GPyTorch backend (optional dep); lazy tensors.

FFTKernelGPR - FFT-based kernel; suitable for regular grids.

Classes#

`FFTKernelGPR`	FFT-based GPR residualizer for regular (ny, nx) grids using an RBF kernel.
`GPyTorchKernelGPR`	GPyTorch-based GP residualizer.
`KernelGPR`	Abstract GPR residualizer: learns a spatial kernel and returns residuals.
`SklearnKernelGPR`	sklearn-based GPR residualizer.

Functions#

`fit_kernel_gpr`(X, Y[, normalize_x, normalize_y, ...])	Fit a Gaussian process regression and optionally return residuals.
`linear_hsic_test`(X, Y[, centering])	The linear HSIC test (multivariate RV coefficient).
`make_kernel_gpr`([backend])	Construct a KernelGPR from a backend name.

Module Contents#

class splisosm.kernel_gpr.FFTKernelGPR(constant_value=1.0, constant_value_bounds=(0.001, 1000.0), length_scale=1.0, length_scale_bounds='fixed', epsilon_bounds=(1e-05, 10.0), workers=None)#

Bases: KernelGPR

FFT-based GPR residualizer for regular (ny, nx) grids using an RBF kernel.

Accepts the same hyperparameter interface as SklearnKernelGPR. Kernel eigenvalues are computed via FFTKernelOp; all linear-algebra is O(N log N) with no n x n matrix formed.

Notes

Hyperparameter optimization strategy

The spectral log-marginal-likelihood (LML) for Y (n = nx * ny observations, m channels) is:

-2 LML = m * sum_k log(lambda_k + eps)
         + (1/n) * sum_k ||Y_hat_k||^2 / (lambda_k + eps)

where lambda_k = sigma^2 * s_k(l) are the kernel eigenvalues with sigma being the constant value (signal variance) and s_k(l) the unit-spectrum eigenvalues at length scale l, and Y_hat = FFT2(Y_cube) (unnormalized 2-D DFT).

Four optimization cases based on which bounds (constant_value_bounds, length_scale_bounds) are "fixed":

Both fixed → 1-D bounded scalar search over log(eps) only. Fastest; unit spectrum cached across repeated calls.
cv free, ls fixed → 2-D L-BFGS-B over (log_sigma2, log_eps). Unit spectrum cached.
cv fixed, ls free → 2-D L-BFGS-B over (log_l, log_eps). A fresh FFTKernelOp is built at each optimizer step.
Both free → 3-D L-BFGS-B over (log_sigma2, log_l, log_eps). A fresh FFTKernelOp is built at each optimizer step.

Parameters:

constant_value (float) – Initial signal variance sigma^2.
constant_value_bounds (tuple or "fixed") – Bounds for sigma^2. "fixed" pins sigma^2.
length_scale (float) – Initial RBF length scale.
length_scale_bounds (tuple or "fixed") – Bounds for the RBF length scale. "fixed" pins it; a 2-tuple (lo, hi) of positive floats enables length-scale optimization.
epsilon_bounds (tuple[float, float]) – Search bounds for the white-noise / regularization level.
workers (int or None) – Number of scipy.fft workers.

fit_residuals(coords, Y)#

Fit GP and return spatial residuals.

Auto-detects the regular grid from coords, rasterizes Y, optimizes hyperparameters via spectral LML, then de-rasterizes.

Parameters:

coords (Tensor, shape (n_obs, 2)) – Spatial coordinates forming a regular grid.
Y (Tensor, shape (n_obs, m)) – Target values (may contain NaN rows).

Returns:

Spatial residuals. NaN rows are preserved.

Return type:

Tensor, shape (n_obs, m)

fit_residuals_batch(coords, Y_list)#

Residualize a list of targets, amortizing grid detection and unit spectrum.

When signal_bounds_fixed is True the unit-spectrum eigenvalues are shared; only eps is re-optimised per target (1-D search). Otherwise each target gets full 2-D optimisation.

Parameters:

coords (Tensor, shape (n_obs, 2))
Y_list (list of Tensor, each (n_obs, m_i))

Return type:

list of Tensor, each (n_obs, m_i)

fit_residuals_cube(data_cube, spacing=(1.0, 1.0), epsilon=None)#

Residualize a raster cube in O(N log N).

Kept for backward compatibility with SplisosmFFT.

Parameters:

data_cube (np.ndarray, shape (ny, nx) or (ny, nx, m))
spacing ((dy, dx)) – Physical grid spacing.
epsilon (float or None) – If given, skip optimization and use this value directly.

Returns:

residuals (np.ndarray, same shape as data_cube)
epsilon (float) – Regularization level used.

Return type:

tuple[ndarray, float]

get_kernel_op(coords=None)#

Return the cached unit-spectrum FFTKernelOp.

Raises:: RuntimeError – If called before any residualization.
Parameters:: coords (Optional[Tensor])
Return type:: FFTKernelOp

property n: int | None#

Total grid cells (ny * nx).

Return type:: Optional[int]

property nx: int | None#

Grid width (set after first residualization).

Return type:: Optional[int]

property ny: int | None#

Grid height (set after first residualization).

Return type:: Optional[int]

property signal_bounds_fixed: bool#

True when BOTH constant_value_bounds and length_scale_bounds are "fixed".

Return type:: bool

class splisosm.kernel_gpr.GPyTorchKernelGPR(constant_value=1.0, constant_value_bounds=(0.001, 1000.0), length_scale=1.0, length_scale_bounds='fixed', n_inducing=None, n_iter=100, lr=0.01, device='cpu')#

Bases: KernelGPR

GPyTorch-based GP residualizer.

Uses GPyTorch’s lazy-tensor infrastructure so that all linear-system solves during training use Lanczos-based conjugate gradients rather than dense Cholesky. Hyperparameters are optimized with Adam on the marginal log-likelihood.

For most datasets (n <= ~10,000) the exact-GP path is used. Passing n_inducing > 0 enables the FITC sparse GP approximation via InducingPointKernel.

Parameters:

constant_value (float) – Initial signal amplitude (outputscale).
constant_value_bounds (tuple or "fixed") – Search bounds for the signal amplitude. "fixed" disables optimization of this parameter.
length_scale (float) – Initial RBF length scale.
length_scale_bounds (tuple or "fixed") – Search bounds for the length scale.
n_inducing (int or None) – Number of inducing points for FITC sparse GP approximation. When set, memory scales as O(n x n_inducing) rather than O(n²). Set to None to use exact GP.
n_iter (int) – Adam optimizer iterations for hyperparameter learning.
lr (float) – Adam learning rate.
device (str) – PyTorch device string, e.g. "cpu" or "cuda".

Notes

Requires gpytorch (optional dependency):

pip install gpytorch

Hyperparameter search uses the mean of all target columns as the training signal, then applies the fitted kernel to every column. This matches the sklearn backend behavior and amortizes O(n^3) GP fitting across outputs.

The noise (epsilon) search is handled by GPyTorch’s GaussianLikelihood, which is always optimized regardless of the signal bounds setting.

fit_residuals(coords, Y)#

Fit GP residuals using GPyTorch’s exact-GP backend.

Hyperparameters are optimized on the mean of all target columns. The fitted kernel and noise level are then applied to every column of Y to compute residuals.

NaN rows are excluded from fitting and reinserted as NaN in output.

Parameters:

coords (Tensor, shape (n_obs, d)) – Normalized spatial coordinates.
Y (Tensor, shape (n_obs, m)) – Target values; may contain NaN rows.

Returns:

Spatial residuals epsilon * (K + epsilon * I)**(-1) @ Y. NaN rows from Y are preserved as NaN.

Return type:

Tensor, shape (n_obs, m)

device = 'cpu'#

lr = 0.01#

n_inducing = None#

n_iter = 100#

property signal_bounds_fixed: bool#

True when both constant_value and length_scale bounds are "fixed".

Return type:: bool

class splisosm.kernel_gpr.KernelGPR#

Bases: abc.ABC

Abstract GPR residualizer: learns a spatial kernel and returns residuals.

Subclasses determine how hyperparameters are optimized (MLE, variational, fixed) and whether the kernel matrix is materialized (dense) or kept implicit.

All subclasses expose the same fit_residuals interface so that the DU-test pipeline in SplisosmNP and SplisosmFFT is backend-agnostic.

abstractmethod fit_residuals(coords, Y)#

Fit GP to (coords, Y) and return spatial residuals.

Parameters:

coords (Tensor, shape (n_obs, d)) – Spatial coordinates (should be pre-normalized by the caller).
Y (Tensor, shape (n_obs, m)) – Target values (may contain NaN rows, which are handled internally).

Returns:

residuals – Residuals Y - GP_smooth(Y | coords). NaN rows in Y are preserved as NaN.

Return type:

Tensor, shape (n_obs, m)

fit_residuals_batch(coords, Y_list)#

Fit residuals for many targets sharing the same coordinates.

The default implementation calls fit_residuals() sequentially. Subclasses may override this to amortize work across targets (e.g. SklearnKernelGPR precomputes a shared eigendecomposition when signal bounds are fixed).

Parameters:

coords (Tensor, shape (n_obs, d)) – Spatial coordinates.
Y_list (list of Tensor) – List of target matrices, each shape (n_obs, m_i).

Returns:

List of residual matrices, each shape (n_obs, m_i).

Return type:

list of Tensor

abstractmethod get_kernel_op(coords)#

Build the kernel operator for given coordinates.

Returns a SpatialKernelOp whose hyperparameters have been set (but not yet optimized for a specific target Y). Useful when the caller needs the operator for its own purposes (e.g. computing eigenvalues for the HSIC null distribution).

Not all backends support this method.

Parameters:: coords (Tensor, shape (n_obs, d)) – Spatial coordinates.
Returns:: Kernel operator for the given coordinates.
Return type:: SpatialKernelOp
Raises:: NotImplementedError – If not implemented by the subclass.

class splisosm.kernel_gpr.SklearnKernelGPR(constant_value=1.0, constant_value_bounds=(0.001, 1000.0), length_scale=1.0, length_scale_bounds='fixed', n_inducing=5000)#

Bases: KernelGPR

sklearn-based GPR residualizer.

Uses GaussianProcessRegressor with a ConstantKernel x RBF + WhiteKernel to learn spatial hyperparameters and residualize targets.

When both signal bounds are "fixed" (signal_bounds_fixed), the base kernel matrix is the same for every target sharing the same coordinates. Calling precompute_shared_kernel() once before a loop over many targets reduces per-target cost from O(n^3) GP fitting to O(n^2) spectral operations (see _kernel_residuals_from_eigdecomp()).

Parameters:

constant_value (float) – Initial signal amplitude.
constant_value_bounds (tuple or "fixed") – Search bounds for the signal amplitude. "fixed" disables optimization of this parameter.
length_scale (float) – Initial RBF length scale.
length_scale_bounds (tuple or "fixed") – Search bounds for the length scale.
n_inducing (int or None) – Maximum number of observations used for hyperparameter fitting. When n_obs <= n_inducing (or n_inducing is None), all observations are used for an exact GP fit. When n_obs > n_inducing, a randomly sampled subset-of-data of n_inducing points is used for hyperparameter search, then the fitted kernel is applied to predict on all observations. This is not an inducing-point (FITC/VFE) approximation — the subset is only used to determine hyperparameters. Setting n_inducing=None disables the subset shortcut and always uses the full dataset; a warning is issued when n_obs > 10_000.

Notes

This backend always materialises a dense n x n kernel matrix (up to n_inducing x n_inducing when the subset path is taken). For very large n, consider GPyTorchKernelGPR with FITC inducing points or FFTKernelGPR for regular raster grids.

fit_residuals(coords, Y)#

Fit GP residuals for a single target matrix.

Chooses the fast spectral path when signal_bounds_fixed and the shared eigendecomposition has been precomputed; otherwise falls back to a full sklearn GP fit when n_obs <= n_inducing, or an inducing-point approximation when n_obs > n_inducing.

NaN rows in Y are excluded from fitting and reinserted as NaN in the output.

Parameters:

coords (Tensor, shape (n_obs, d)) – Normalized spatial coordinates.
Y (Tensor, shape (n_obs, m)) – Target values; may contain NaN rows.

Returns:

Spatial residuals. NaN rows from Y are preserved as NaN.

Return type:

Tensor, shape (n_obs, m)

fit_residuals_batch(coords, Y_list)#

Fit residuals for multiple targets, amortizing kernel setup.

If signal_bounds_fixed and no eigendecomposition has been precomputed yet, this method precomputes it before the loop.

Parameters:

coords (Tensor, shape (n_obs, d)) – Normalized spatial coordinates.
Y_list (list of Tensor) – List of target matrices, each shape (n_obs, m_i).

Returns:

List of residual matrices, each shape (n_obs, m_i).

Return type:

list of Tensor

classmethod from_config(config)#

Construct from a configuration dictionary.

Parameters:: config (dict) – Configuration dict with optional keys: constant_value, constant_value_bounds, length_scale, length_scale_bounds, n_inducing.
Returns:: New instance constructed from the configuration.
Return type:: SklearnKernelGPR

get_kernel_op(coords)#

Return a DenseKernelOp built from the current (fixed) params.

Parameters:: coords (Tensor, shape (n_obs, d)) – Normalized spatial coordinates.
Returns:: Kernel operator for the given coordinates.
Return type:: DenseKernelOp

precompute_shared_kernel(coords)#

Precompute and cache the eigendecomposition of the fixed-param kernel.

Call this once before a loop over many targets that share the same coordinates and have signal_bounds_fixed == True. Subsequent calls to fit_residuals() will use the cached decomposition and run in O(n^2) rather than O(n^3).

Has no effect (and logs a warning) when signal_bounds_fixed is False.

Parameters:: coords (Tensor, shape (n_obs, d)) – Normalized spatial coordinates, shape (n_obs, d).
Return type:: None

n_inducing = 5000#

property signal_bounds_fixed: bool#

True when both constant_value and length_scale bounds are fixed.

Return type:: bool

splisosm.kernel_gpr.fit_kernel_gpr(X, Y, normalize_x=True, normalize_y=True, return_residuals=True, constant_value=1.0, constant_value_bounds=(0.001, 1000.0), length_scale=1.0, length_scale_bounds=(0.01, 100.0))#

Fit a Gaussian process regression and optionally return residuals.

Deprecated since version 1.0.4: Use SklearnKernelGPR directly. This wrapper is kept for backward compatibility and will be removed in a future release.

Parameters:

X (Tensor, shape (n_obs, d)) – Input spatial coordinates.
Y (Tensor, shape (n_obs, m)) – Target values.
normalize_x (bool) – Whether to z-score normalize X before fitting.
normalize_y (bool) – Whether to z-score normalize Y before fitting.
return_residuals (bool) – If True, return residuals; if False, return (Kxy, epsilon).
constant_value (float) – Initial signal amplitude.
constant_value_bounds (tuple or "fixed") – Search bounds for the signal amplitude.
length_scale (float) – Initial RBF length scale.
length_scale_bounds (tuple or "fixed") – Search bounds for the length scale.

Returns:

If return_residuals is False: (Kxy, epsilon) kernel matrix and noise level. If return_residuals is True: residual tensor of shape (n_obs, m).

Return type:

tuple[Tensor, float] or Tensor

splisosm.kernel_gpr.linear_hsic_test(X, Y, centering=True)#

The linear HSIC test (multivariate RV coefficient).

Equivalent to a multivariate extension of Pearson correlation.

Supports sparse inputs for memory and speed efficiency:

Sparse X (scipy sparse matrix, shape (n, p)): the cross-product Y_c.T @ X_c is computed via a sparse matrix multiply (X.T.dot(Y_c)). Because Y is mean-centred, the X centering correction reduces to zero, so only the original sparse X is needed. X_c.T @ X_c is obtained as X.T @ X - n * mean_X ⊗ mean_X, keeping the first term sparse.
Sparse Y (scipy sparse or torch sparse COO): densified once upfront before any computation.

Parameters:

X (Tensor or spmatrix, shape (n_samples, n_x))
Y (Tensor or spmatrix, shape (n_samples, n_y))
centering (bool) – Whether to mean-centre X and Y before computing the statistic.

Returns:

hsic (float) – HSIC test statistic (scaled by 1 / (n - 1)**2).
pvalue (float) – P-value from the asymptotic chi-squared mixture distribution.

Return type:

tuple[float, float]

splisosm.kernel_gpr.make_kernel_gpr(backend='sklearn', **kwargs)#

Construct a KernelGPR from a backend name.

Parameters:

backend ({"sklearn", "gpytorch", "fft"}) – Backend to use. "fft" accepts the same kwargs as "sklearn" (constant_value, constant_value_bounds, length_scale, length_scale_bounds) plus epsilon_bounds and workers. n_inducing is supported by both "sklearn" and "gpytorch" with the same name (see _DEFAULT_GPR_CONFIGS).
**kwargs – Passed to the backend constructor.

Returns:

GPR residualizer instance for the specified backend.

Return type:

KernelGPR

Raises:

ValueError – If backend is not one of the supported options.