splisosm.kernel_gpr
===================

.. py:module:: splisosm.kernel_gpr

.. autoapi-nested-parse::

   Kernel Gaussian Process Regression backends for SPLISOSM spatial residualization.

   This module provides an abstract interface and concrete implementations for
   kernel-based conditional independence testing (via spatial residualization)
   used by the SPLISOSM differential-usage tests.

   GPR-residualizer hierarchy
   ---------------------------
   ``KernelGPR`` (abstract):

       - ``SklearnKernelGPR``  - sklearn backend; dense; suitable for n <= ~10,000.
       - ``GPyTorchKernelGPR`` - GPyTorch backend (optional dep); lazy tensors.
       - ``FFTKernelGPR``      - FFT-based kernel; suitable for regular grids.


Classes
-------

.. autoapisummary::

   splisosm.kernel_gpr.FFTKernelGPR
   splisosm.kernel_gpr.GPyTorchKernelGPR
   splisosm.kernel_gpr.KernelGPR
   splisosm.kernel_gpr.SklearnKernelGPR


Functions
---------

.. autoapisummary::

   splisosm.kernel_gpr.fit_kernel_gpr
   splisosm.kernel_gpr.linear_hsic_test
   splisosm.kernel_gpr.make_kernel_gpr


Module Contents
---------------

.. py:class:: FFTKernelGPR(constant_value = 1.0, constant_value_bounds = (0.001, 1000.0), length_scale = 1.0, length_scale_bounds = 'fixed', epsilon_bounds = (1e-05, 10.0), workers = None)

   Bases: :py:obj:`KernelGPR`


   FFT-based GPR residualizer for regular (ny, nx) grids using an RBF kernel.

   Accepts the same hyperparameter interface as :class:`SklearnKernelGPR`.
   Kernel eigenvalues are computed via :class:`FFTKernelOp`; all linear-algebra
   is O(N log N) with no n x n matrix formed.

   .. rubric:: Notes

   **Hyperparameter optimization strategy**

   The spectral log-marginal-likelihood (LML) for Y (n = nx * ny observations, m channels)
   is::

       -2 LML = m * sum_k log(lambda_k + eps)
                + (1/n) * sum_k ||Y_hat_k||^2 / (lambda_k + eps)

   where ``lambda_k = sigma^2 * s_k(l)`` are the kernel eigenvalues with
   ``sigma`` being the constant value (signal variance) and ``s_k(l)``
   the unit-spectrum eigenvalues at length scale ``l``, and
   ``Y_hat = FFT2(Y_cube)`` (unnormalized 2-D DFT).

   Four optimization cases based on which bounds (constant_value_bounds, length_scale_bounds)
   are ``"fixed"``:

   1. **Both fixed** → 1-D bounded scalar search over ``log(eps)`` only.
      Fastest; unit spectrum cached across repeated calls.
   2. **cv free, ls fixed** → 2-D L-BFGS-B over ``(log_sigma2, log_eps)``.
      Unit spectrum cached.
   3. **cv fixed, ls free** → 2-D L-BFGS-B over ``(log_l, log_eps)``.
      A fresh ``FFTKernelOp`` is built at each optimizer step.
   4. **Both free** → 3-D L-BFGS-B over ``(log_sigma2, log_l, log_eps)``.
      A fresh ``FFTKernelOp`` is built at each optimizer step.

   :param constant_value: Initial signal variance sigma^2.
   :type constant_value: float
   :param constant_value_bounds: Bounds for sigma^2.  ``"fixed"`` pins sigma^2.
   :type constant_value_bounds: tuple or "fixed"
   :param length_scale: Initial RBF length scale.
   :type length_scale: float
   :param length_scale_bounds: Bounds for the RBF length scale.  ``"fixed"`` pins it; a 2-tuple
                               ``(lo, hi)`` of positive floats enables length-scale optimization.
   :type length_scale_bounds: tuple or "fixed"
   :param epsilon_bounds: Search bounds for the white-noise / regularization level.
   :type epsilon_bounds: tuple[float, float]
   :param workers: Number of ``scipy.fft`` workers.
   :type workers: int or None


   .. py:method:: fit_residuals(coords, Y)

      Fit GP and return spatial residuals.

      Auto-detects the regular grid from ``coords``, rasterizes ``Y``,
      optimizes hyperparameters via spectral LML, then de-rasterizes.

      :param coords: Spatial coordinates forming a regular grid.
      :type coords: torch.Tensor, shape (n_obs, 2)
      :param Y: Target values (may contain NaN rows).
      :type Y: torch.Tensor, shape (n_obs, m)

      :returns: Spatial residuals. NaN rows are preserved.
      :rtype: torch.Tensor, shape (n_obs, m)


   .. py:method:: fit_residuals_batch(coords, Y_list)

      Residualize a list of targets, amortizing grid detection and unit spectrum.

      When :attr:`signal_bounds_fixed` is True the unit-spectrum
      eigenvalues are shared; only eps is re-optimised per target (1-D
      search).  Otherwise each target gets full 2-D optimisation.

      :param coords:
      :type coords: torch.Tensor, shape (n_obs, 2)
      :param Y_list:
      :type Y_list: list of torch.Tensor, each (n_obs, m_i)

      :rtype: list of torch.Tensor, each (n_obs, m_i)


   .. py:method:: fit_residuals_cube(data_cube, spacing = (1.0, 1.0), epsilon = None)

      Residualize a raster cube in O(N log N).

      Kept for backward compatibility with :class:`SplisosmFFT`.

      :param data_cube:
      :type data_cube: np.ndarray, shape (ny, nx) or (ny, nx, m)
      :param spacing: Physical grid spacing.
      :type spacing: (dy, dx)
      :param epsilon: If given, skip optimization and use this value directly.
      :type epsilon: float or None

      :returns: * **residuals** (*np.ndarray, same shape as data_cube*)
                * **epsilon** (*float*) -- Regularization level used.


   .. py:method:: get_kernel_op(coords = None)

      Return the cached unit-spectrum FFTKernelOp.

      :raises RuntimeError: If called before any residualization.


   .. py:property:: n
      :type: Optional[int]


      Total grid cells (ny * nx).


   .. py:property:: nx
      :type: Optional[int]


      Grid width (set after first residualization).


   .. py:property:: ny
      :type: Optional[int]


      Grid height (set after first residualization).


   .. py:property:: signal_bounds_fixed
      :type: bool


      True when BOTH ``constant_value_bounds`` and ``length_scale_bounds`` are ``"fixed"``.


.. py:class:: GPyTorchKernelGPR(constant_value = 1.0, constant_value_bounds = (0.001, 1000.0), length_scale = 1.0, length_scale_bounds = 'fixed', n_inducing = None, n_iter = 100, lr = 0.01, device = 'cpu')

   Bases: :py:obj:`KernelGPR`


   GPyTorch-based GP residualizer.

   Uses GPyTorch's lazy-tensor infrastructure so that all linear-system
   solves during training use Lanczos-based conjugate gradients rather
   than dense Cholesky.  Hyperparameters are optimized with Adam on the
   marginal log-likelihood.

   For most datasets (n <= ~10,000) the exact-GP path is used.  Passing
   ``n_inducing > 0`` enables the FITC sparse GP approximation via
   InducingPointKernel.

   :param constant_value: Initial signal amplitude (outputscale).
   :type constant_value: float
   :param constant_value_bounds: Search bounds for the signal amplitude.  ``"fixed"`` disables
                                 optimization of this parameter.
   :type constant_value_bounds: tuple or "fixed"
   :param length_scale: Initial RBF length scale.
   :type length_scale: float
   :param length_scale_bounds: Search bounds for the length scale.
   :type length_scale_bounds: tuple or "fixed"
   :param n_inducing: Number of inducing points for FITC sparse GP approximation. When set,
                      memory scales as O(n x n_inducing) rather than O(n²). Set to None to
                      use exact GP.
   :type n_inducing: int or None
   :param n_iter: Adam optimizer iterations for hyperparameter learning.
   :type n_iter: int
   :param lr: Adam learning rate.
   :type lr: float
   :param device: PyTorch device string, e.g. ``"cpu"`` or ``"cuda"``.
   :type device: str

   .. rubric:: Notes

   Requires ``gpytorch`` (optional dependency)::

       pip install gpytorch

   Hyperparameter search uses the mean of all target columns as the
   training signal, then applies the fitted kernel to every column.
   This matches the sklearn backend behavior and amortizes O(n^3) GP
   fitting across outputs.

   The noise (epsilon) search is handled by GPyTorch's
   ``GaussianLikelihood``, which is always optimized regardless of the
   signal bounds setting.


   .. py:method:: fit_residuals(coords, Y)

      Fit GP residuals using GPyTorch's exact-GP backend.

      Hyperparameters are optimized on the mean of all target columns.
      The fitted kernel and noise level are then applied to every column
      of ``Y`` to compute residuals.

      NaN rows are excluded from fitting and reinserted as NaN in output.

      :param coords: Normalized spatial coordinates.
      :type coords: torch.Tensor, shape (n_obs, d)
      :param Y: Target values; may contain NaN rows.
      :type Y: torch.Tensor, shape (n_obs, m)

      :returns: Spatial residuals ``epsilon * (K + epsilon * I)**(-1) @ Y``.
                NaN rows from Y are preserved as NaN.
      :rtype: torch.Tensor, shape (n_obs, m)


   .. py:attribute:: device
      :value: 'cpu'


   .. py:attribute:: lr
      :value: 0.01


   .. py:attribute:: n_inducing
      :value: None


   .. py:attribute:: n_iter
      :value: 100


   .. py:property:: signal_bounds_fixed
      :type: bool


      True when both constant_value and length_scale bounds are ``"fixed"``.


.. py:class:: KernelGPR

   Bases: :py:obj:`abc.ABC`


   Abstract GPR residualizer: learns a spatial kernel and returns residuals.

   Subclasses determine how hyperparameters are optimized (MLE, variational,
   fixed) and whether the kernel matrix is materialized (dense) or kept
   implicit.

   All subclasses expose the same ``fit_residuals`` interface so that the
   DU-test pipeline in ``SplisosmNP`` and ``SplisosmFFT`` is backend-agnostic.


   .. py:method:: fit_residuals(coords, Y)
      :abstractmethod:


      Fit GP to (coords, Y) and return spatial residuals.

      :param coords: Spatial coordinates (should be pre-normalized by the caller).
      :type coords: torch.Tensor, shape (n_obs, d)
      :param Y: Target values (may contain NaN rows, which are handled internally).
      :type Y: torch.Tensor, shape (n_obs, m)

      :returns: **residuals** -- Residuals Y - GP_smooth(Y | coords). NaN rows in Y are preserved as NaN.
      :rtype: torch.Tensor, shape (n_obs, m)


   .. py:method:: fit_residuals_batch(coords, Y_list)

      Fit residuals for many targets sharing the same coordinates.

      The default implementation calls :meth:`fit_residuals` sequentially.
      Subclasses may override this to amortize work across targets
      (e.g. ``SklearnKernelGPR`` precomputes a shared eigendecomposition
      when signal bounds are fixed).

      :param coords: Spatial coordinates.
      :type coords: torch.Tensor, shape (n_obs, d)
      :param Y_list: List of target matrices, each shape (n_obs, m_i).
      :type Y_list: list of torch.Tensor

      :returns: List of residual matrices, each shape (n_obs, m_i).
      :rtype: list of torch.Tensor


   .. py:method:: get_kernel_op(coords)
      :abstractmethod:


      Build the kernel operator for given coordinates.

      Returns a :class:`SpatialKernelOp` whose hyperparameters have been set
      (but not yet optimized for a specific target Y).  Useful when the
      caller needs the operator for its own purposes (e.g. computing
      eigenvalues for the HSIC null distribution).

      Not all backends support this method.

      :param coords: Spatial coordinates.
      :type coords: torch.Tensor, shape (n_obs, d)

      :returns: Kernel operator for the given coordinates.
      :rtype: SpatialKernelOp

      :raises NotImplementedError: If not implemented by the subclass.


.. py:class:: SklearnKernelGPR(constant_value = 1.0, constant_value_bounds = (0.001, 1000.0), length_scale = 1.0, length_scale_bounds = 'fixed', n_inducing = 5000)

   Bases: :py:obj:`KernelGPR`


   sklearn-based GPR residualizer.

   Uses ``GaussianProcessRegressor`` with a ``ConstantKernel x RBF +
   WhiteKernel`` to learn spatial hyperparameters and residualize targets.

   When both signal bounds are ``"fixed"`` (:attr:`signal_bounds_fixed`),
   the base kernel matrix is the same for every target sharing the same
   coordinates.  Calling :meth:`precompute_shared_kernel` once before a
   loop over many targets reduces per-target cost from O(n^3) GP fitting
   to O(n^2) spectral operations (see :func:`_kernel_residuals_from_eigdecomp`).

   :param constant_value: Initial signal amplitude.
   :type constant_value: float
   :param constant_value_bounds: Search bounds for the signal amplitude.  ``"fixed"`` disables
                                 optimization of this parameter.
   :type constant_value_bounds: tuple or ``"fixed"``
   :param length_scale: Initial RBF length scale.
   :type length_scale: float
   :param length_scale_bounds: Search bounds for the length scale.
   :type length_scale_bounds: tuple or ``"fixed"``
   :param n_inducing: Maximum number of observations used for hyperparameter fitting.
                      When ``n_obs <= n_inducing`` (or ``n_inducing`` is ``None``), all
                      observations are used for an exact GP fit.  When ``n_obs > n_inducing``,
                      a randomly sampled **subset-of-data** of ``n_inducing`` points is used
                      for hyperparameter search, then the fitted kernel is applied to predict
                      on all observations.  This is **not** an inducing-point (FITC/VFE)
                      approximation — the subset is only used to determine hyperparameters.
                      Setting ``n_inducing=None`` disables the subset shortcut and always
                      uses the full dataset; a warning is issued when ``n_obs > 10_000``.
   :type n_inducing: int or None

   .. rubric:: Notes

   This backend always materialises a dense n x n kernel matrix (up to
   ``n_inducing`` x ``n_inducing`` when the subset path is taken).  For
   very large n, consider ``GPyTorchKernelGPR`` with FITC inducing points
   or ``FFTKernelGPR`` for regular raster grids.


   .. py:method:: fit_residuals(coords, Y)

      Fit GP residuals for a single target matrix.

      Chooses the fast spectral path when :attr:`signal_bounds_fixed` and
      the shared eigendecomposition has been precomputed; otherwise falls
      back to a full sklearn GP fit when n_obs <= n_inducing, or an
      inducing-point approximation when n_obs > n_inducing.

      NaN rows in ``Y`` are excluded from fitting and reinserted as NaN
      in the output.

      :param coords: Normalized spatial coordinates.
      :type coords: torch.Tensor, shape (n_obs, d)
      :param Y: Target values; may contain NaN rows.
      :type Y: torch.Tensor, shape (n_obs, m)

      :returns: Spatial residuals. NaN rows from Y are preserved as NaN.
      :rtype: torch.Tensor, shape (n_obs, m)


   .. py:method:: fit_residuals_batch(coords, Y_list)

      Fit residuals for multiple targets, amortizing kernel setup.

      If :attr:`signal_bounds_fixed` and no eigendecomposition has been
      precomputed yet, this method precomputes it before the loop.

      :param coords: Normalized spatial coordinates.
      :type coords: torch.Tensor, shape (n_obs, d)
      :param Y_list: List of target matrices, each shape (n_obs, m_i).
      :type Y_list: list of torch.Tensor

      :returns: List of residual matrices, each shape (n_obs, m_i).
      :rtype: list of torch.Tensor


   .. py:method:: from_config(config)
      :classmethod:


      Construct from a configuration dictionary.

      :param config: Configuration dict with optional keys: ``constant_value``,
                     ``constant_value_bounds``, ``length_scale``, ``length_scale_bounds``,
                     ``n_inducing``.
      :type config: dict

      :returns: New instance constructed from the configuration.
      :rtype: SklearnKernelGPR


   .. py:method:: get_kernel_op(coords)

      Return a ``DenseKernelOp`` built from the current (fixed) params.

      :param coords: Normalized spatial coordinates.
      :type coords: torch.Tensor, shape (n_obs, d)

      :returns: Kernel operator for the given coordinates.
      :rtype: DenseKernelOp


   .. py:method:: precompute_shared_kernel(coords)

      Precompute and cache the eigendecomposition of the fixed-param kernel.

      Call this once before a loop over many targets that share the same
      coordinates and have ``signal_bounds_fixed == True``.  Subsequent calls
      to :meth:`fit_residuals` will use the cached decomposition and run in
      O(n^2) rather than O(n^3).

      Has no effect (and logs a warning) when ``signal_bounds_fixed`` is
      False.

      :param coords: Normalized spatial coordinates, shape (n_obs, d).
      :type coords: torch.Tensor, shape (n_obs, d)

      :rtype: None


   .. py:attribute:: n_inducing
      :value: 5000


   .. py:property:: signal_bounds_fixed
      :type: bool


      True when both constant_value and length_scale bounds are fixed.


.. py:function:: fit_kernel_gpr(X, Y, normalize_x = True, normalize_y = True, return_residuals = True, constant_value = 1.0, constant_value_bounds = (0.001, 1000.0), length_scale = 1.0, length_scale_bounds = (0.01, 100.0))

   Fit a Gaussian process regression and optionally return residuals.

   .. deprecated:: 1.0.4
       Use :class:`SklearnKernelGPR` directly.  This wrapper is kept for
       backward compatibility and will be removed in a future release.

   :param X: Input spatial coordinates.
   :type X: torch.Tensor, shape (n_obs, d)
   :param Y: Target values.
   :type Y: torch.Tensor, shape (n_obs, m)
   :param normalize_x: Whether to z-score normalize X before fitting.
   :type normalize_x: bool
   :param normalize_y: Whether to z-score normalize Y before fitting.
   :type normalize_y: bool
   :param return_residuals: If True, return residuals; if False, return ``(Kxy, epsilon)``.
   :type return_residuals: bool
   :param constant_value: Initial signal amplitude.
   :type constant_value: float
   :param constant_value_bounds: Search bounds for the signal amplitude.
   :type constant_value_bounds: tuple or ``"fixed"``
   :param length_scale: Initial RBF length scale.
   :type length_scale: float
   :param length_scale_bounds: Search bounds for the length scale.
   :type length_scale_bounds: tuple or ``"fixed"``

   :returns: If ``return_residuals`` is False: ``(Kxy, epsilon)`` kernel matrix and
             noise level.  If ``return_residuals`` is True: residual tensor of shape
             (n_obs, m).
   :rtype: tuple[torch.Tensor, float] or torch.Tensor


.. py:function:: linear_hsic_test(X, Y, centering = True)

   The linear HSIC test (multivariate RV coefficient).

   Equivalent to a multivariate extension of Pearson correlation.

   Supports sparse inputs for memory and speed efficiency:

   * **Sparse X** (scipy sparse matrix, shape ``(n, p)``): the cross-product
     ``Y_c.T @ X_c`` is computed via a sparse matrix multiply
     (``X.T.dot(Y_c)``).  Because ``Y`` is mean-centred, the ``X`` centering
     correction reduces to zero, so only the original sparse ``X`` is needed.
     ``X_c.T @ X_c`` is obtained as ``X.T @ X  -  n * mean_X ⊗ mean_X``,
     keeping the first term sparse.
   * **Sparse Y** (scipy sparse or torch sparse COO): densified once upfront
     before any computation.

   :param X:
   :type X: torch.Tensor or scipy.sparse.spmatrix, shape (n_samples, n_x)
   :param Y:
   :type Y: torch.Tensor or scipy.sparse.spmatrix, shape (n_samples, n_y)
   :param centering: Whether to mean-centre X and Y before computing the statistic.
   :type centering: bool

   :returns: * **hsic** (*float*) -- HSIC test statistic (scaled by 1 / (n - 1)**2).
             * **pvalue** (*float*) -- P-value from the asymptotic chi-squared mixture distribution.


.. py:function:: make_kernel_gpr(backend = 'sklearn', **kwargs)

   Construct a KernelGPR from a backend name.

   :param backend: Backend to use.  ``"fft"`` accepts the same kwargs as ``"sklearn"``
                   (``constant_value``, ``constant_value_bounds``, ``length_scale``,
                   ``length_scale_bounds``) plus ``epsilon_bounds`` and ``workers``.
                   ``n_inducing`` is supported by both ``"sklearn"`` and ``"gpytorch"``
                   with the same name (see :data:`_DEFAULT_GPR_CONFIGS`).
   :type backend: {"sklearn", "gpytorch", "fft"}
   :param \*\*kwargs: Passed to the backend constructor.

   :returns: GPR residualizer instance for the specified backend.
   :rtype: KernelGPR

   :raises ValueError: If backend is not one of the supported options.