splisosm.hyptest_fft
====================

.. py:module:: splisosm.hyptest_fft

.. autoapi-nested-parse::

   FFT-accelerated non-parametric hypothesis tests for SPLISOSM.


Classes
-------

.. autoapisummary::

   splisosm.hyptest_fft.FFTKernel
   splisosm.hyptest_fft.SplisosmFFT


Module Contents
---------------

.. py:class:: FFTKernel(shape, spacing = (1.0, 1.0), rho = 0.99, neighbor_degree = 1, workers = None)

   FFT-based spatial kernel on a periodic 2D raster grid.

   This implementation currently supports only a conditional autoregressive
   (CAR) spatial kernel, defined by a neighborhood graph on the periodic grid.

   :param shape: Grid shape ``(ny, nx)``.
   :param spacing: Physical spacing ``(dy, dx)`` between neighboring raster cells.
   :param rho: Spatial autocorrelation coefficient of the CAR kernel.
   :param neighbor_degree: Neighbor ring degree for graph construction.
                           ``1`` uses nearest neighbors in the periodic metric.
   :param workers: Number of workers used by ``scipy.fft.fft2``.
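
   A minimal sketch of how such a spectrum can be computed, assuming the CAR
   kernel takes the form ``K = (I - rho * W)^(-1)`` with ``W`` the
   4-nearest-neighbor adjacency of the periodic grid divided by its constant
   degree of 4 (roughly ``neighbor_degree=1``). This normalization and the helper
   ``car_spectrum`` are illustrative assumptions, not necessarily what
   :class:`FFTKernel` does internally. Because ``W`` is circulant, its
   eigenvalues are the 2D FFT of its stencil:

   .. code-block:: python

      import numpy as np


      def car_spectrum(ny, nx, rho=0.99):
          """Eigenvalues of an assumed CAR kernel on a periodic (ny, nx) grid."""
          # Assumption: W = (4-neighbor torus adjacency) / 4.
          # First column of the circulant W, laid out as an (ny, nx) stencil.
          stencil = np.zeros((ny, nx))
          for dy, dx in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
              stencil[dy % ny, dx % nx] += 0.25
          # The stencil is symmetric, so its 2D FFT is real: the eigenvalues of W.
          lam_w = np.fft.fft2(stencil).real
          # Eigenvalues of (I - rho * W)^(-1); positive because |lam_w| <= 1 and rho < 1.
          return 1.0 / (1.0 - rho * lam_w)


      spectrum = car_spectrum(64, 64, rho=0.9)  # one eigenvalue per 2D frequency

   Under this assumed form, the constant (zero-frequency) mode carries the
   largest eigenvalue, ``1 / (1 - rho)``, so ``rho`` close to 1 gives a kernel
   that strongly emphasizes smooth, low-frequency patterns.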


   .. py:method:: apply_residual_op(x, epsilon)

      Apply the kernel regression residual operator ``R = epsilon * (K + epsilon * I)**(-1)``.

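      Because ``K`` is diagonalized by the 2D discrete Fourier transform on the
      periodic grid, this is computed in ``O(N log N)`` as::

          R @ v = IFFT2( epsilon / (lambda + epsilon) * FFT2(v) )

      where ``lambda`` denotes the kernel eigenvalues on the frequency grid.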

      :param x: Input with shape ``(ny, nx)`` or ``(ny, nx, m)``.
      :param epsilon: Regularization / noise level.

      :returns: Residuals of the same shape as ``x``.
      :rtype: np.ndarray
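
      The FFT formula above can be sketched in NumPy as follows. The helper
      ``residual_op`` is hypothetical (it does not mirror the
      :meth:`apply_residual_op` signature) and assumes ``spectrum`` is an
      ``(ny, nx)`` array of kernel eigenvalues in ``numpy.fft`` frequency order:

      .. code-block:: python

         import numpy as np


         def residual_op(x, spectrum, epsilon):
             """Apply R = epsilon * (K + epsilon * I)^(-1) to grid data via FFT."""
             filt = epsilon / (spectrum + epsilon)  # eigenvalues of R
             if x.ndim == 3:
                 filt = filt[:, :, None]  # broadcast over the trailing m channels
             xf = np.fft.fft2(x, axes=(0, 1))
             # K is real and symmetric, so the result is real up to rounding error.
             return np.fft.ifft2(filt * xf, axes=(0, 1)).real

      Each Fourier mode is shrunk by ``epsilon / (lambda + epsilon)``: modes where
      the kernel is large (smooth, low-frequency components) are mostly removed,
      while modes with ``lambda`` much smaller than ``epsilon`` pass through almost
      unchanged.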


   .. py:method:: eigenvalues(k = None)

      Return kernel eigenvalues.

      :param k: Number of leading eigenvalues to return. If ``None``, return all.

      :returns: Eigenvalues in descending order.
      :rtype: np.ndarray


   .. py:method:: power_spectral_density_1d(bins = 50)

      Compute the 1D power spectral density (radial profile).

      :param bins: Number of bins for the 1D radial frequency.

      :returns: * **freq_bins** (*np.ndarray*) -- The center frequencies of the valid bins.
                * **psd_1d** (*np.ndarray*) -- The average power (eigenvalue) in each frequency bin.
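
      A radial profile of this kind can be sketched by binning the eigenvalues by
      radial frequency ``sqrt(fy**2 + fx**2)`` and averaging within each bin. Using
      ``numpy.fft.fftfreq`` with the grid ``spacing``, linearly spaced bins from
      zero to the maximum frequency, and dropping empty bins are assumptions of
      this sketch; ``radial_psd`` is a hypothetical helper:

      .. code-block:: python

         import numpy as np


         def radial_psd(spectrum, spacing=(1.0, 1.0), bins=50):
             """Radially averaged power of a (ny, nx) kernel spectrum."""
             ny, nx = spectrum.shape
             # Assumption: frequencies follow numpy.fft.fftfreq with the grid spacing.
             fy = np.fft.fftfreq(ny, d=spacing[0])
             fx = np.fft.fftfreq(nx, d=spacing[1])
             radial = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2).ravel()
             edges = np.linspace(0.0, radial.max(), bins + 1)
             # Sum of eigenvalues and number of frequencies falling into each radial bin.
             power_sum, _ = np.histogram(radial, bins=edges, weights=spectrum.ravel())
             counts, _ = np.histogram(radial, bins=edges)
             valid = counts > 0  # drop bins that contain no frequency
             centers = 0.5 * (edges[:-1] + edges[1:])
             return centers[valid], power_sum[valid] / counts[valid]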


   .. py:method:: square_trace()

      Return ``trace(K^2)``.


   .. py:method:: trace()

      Return ``trace(K)``.
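
      Since ``K`` is symmetric, ``trace(K)`` equals the sum of its eigenvalues and
      ``trace(K^2)`` the sum of their squares, so neither requires the dense
      ``N x N`` matrix. A minimal sketch, assuming ``spectrum`` holds the
      ``(ny, nx)`` eigenvalues; ``kernel_summaries`` is a hypothetical helper that
      also mirrors the descending order documented for :meth:`eigenvalues`:

      .. code-block:: python

         import numpy as np


         def kernel_summaries(spectrum, k=None):
             """Return (leading eigenvalues, trace(K), trace(K @ K)) from the spectrum."""
             lam = np.sort(spectrum.ravel())[::-1]  # descending order
             top = lam if k is None else lam[:k]
             return top, float(lam.sum()), float(np.sum(lam ** 2))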


   .. py:method:: xtKx(x)

      Compute ``x^T K x`` in ``O(N log N)`` via FFT.

      :param x: Input with shape ``(ny, nx)`` or ``(ny, nx, m)``.

      :returns: Scalar for 2D input, or shape ``(m,)`` for 3D input.
      :rtype: float or np.ndarray
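
      By Parseval's theorem with NumPy's unnormalized FFT, a real ``x`` and a
      symmetric periodic kernel satisfy
      ``x^T K x = sum(lambda * |FFT2(x)|**2) / N``. A minimal sketch under the same
      assumption about ``spectrum``; ``quad_form`` is a hypothetical name:

      .. code-block:: python

         import numpy as np


         def quad_form(x, spectrum):
             """x^T K x for x of shape (ny, nx), or one value per channel for (ny, nx, m)."""
             n = spectrum.size
             power = np.abs(np.fft.fft2(x, axes=(0, 1))) ** 2  # |FFT2(x)|^2 per mode
             if x.ndim == 2:
                 return float(np.sum(spectrum * power) / n)
             # Weight each mode by its eigenvalue and sum over the grid, per channel.
             return np.einsum("ij,ijm->m", spectrum, power) / n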


   .. py:attribute:: n_grid


   .. py:attribute:: neighbor_degree
      :value: 1


   .. py:attribute:: rho


   .. py:attribute:: spectrum


   .. py:attribute:: workers
      :value: None


.. py:class:: SplisosmFFT(rho = 0.99, neighbor_degree = 1, spacing = (1.0, 1.0), workers = None)

   FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.

   The class follows the non-parametric SPLISOSM workflow but consumes a
   SpatialData table directly and rasterizes per-gene isoform counts on demand.

   .. rubric:: Examples

   >>> from splisosm import SplisosmFFT
   >>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
   >>> model.setup_data(
   ...     sdata=sdata,
   ...     bins="ID_square_016um",
   ...     table_name="square_016um",
   ...     col_key="array_col",
   ...     row_key="array_row",
   ...     layer="counts",
   ...     group_iso_by="gene_ids",
   ...     gene_names="gene_name",
   ...     min_counts=10,
   ...     min_bin_pct=0.0,
   ... )
   >>> model.test_spatial_variability(method="hsic-ir")
   >>> sv_results = model.get_formatted_test_results(test_type="sv")

   :param rho: Spatial autocorrelation coefficient for CAR kernel.
   :param neighbor_degree: Neighbor ring degree for CAR graph construction.
   :param spacing: Raster spacing ``(dy, dx)``.
   :param workers: Number of FFT workers.


   .. py:method:: extract_feature_summary(level = 'gene', print_progress = True)

      Compute summary statistics for the features retained after filtering.

      Gene-level statistics are aggregated across all isoforms that passed
      the filters applied in :meth:`setup_data`.  Isoform-level statistics
      are computed per isoform and joined to the corresponding rows of
      ``adata.var``.

      Results are cached: repeated calls with the same ``level`` return the
      cached :class:`pandas.DataFrame` without recomputation.

      :param level: Summary granularity.
                    ``'gene'``: one row per gene.
                    ``'isoform'``: one row per isoform that passed filtering.
      :param print_progress: Whether to show a progress bar.

      :returns: For ``level='gene'``, the index is the gene display name and the
                columns are:

                - ``'n_isos'``: int. Number of isoforms retained after filtering.
                - ``'perplexity'``: float. Effective number of isoforms based on
                  the marginal isoform usage entropy.
                - ``'pct_bin_on'``: float. Fraction of bins with non-zero total
                  gene counts.
                - ``'count_avg'``: float. Mean per-spot total count for the gene.
                - ``'count_std'``: float. Std of per-spot total count for the gene.

                For ``level='isoform'``, the index is the isoform name (matching
                ``adata.var_names``) and the columns are the original ``adata.var``
                columns plus:

                - ``'pct_bin_on'``: float. Fraction of bins with count > 0.
                - ``'count_total'``: float. Total counts across all bins.
                - ``'count_avg'``: float. Mean count per bin.
                - ``'count_std'``: float. Std of count per bin.
                - ``'ratio_total'``: float. Fraction of total gene counts
                  attributable to this isoform.
                - ``'ratio_avg'``: float. Mean per-bin isoform usage ratio
                  (computed over bins with non-zero gene coverage).
                - ``'ratio_std'``: float. Std of per-bin isoform usage ratio
                  (computed over bins with non-zero gene coverage).
      :rtype: pandas.DataFrame

      :raises RuntimeError: If :meth:`setup_data` has not been called.
      :raises ValueError: If ``level`` is not ``'gene'`` or ``'isoform'``.
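
      The perplexity and ratio columns can be illustrated for a single gene with
      the sketch below. It assumes perplexity is ``exp`` of the Shannon entropy (in
      nats) of the marginal isoform proportions, and that standard deviations use
      ``ddof=0``; both choices, and the helper names, are assumptions rather than
      the documented implementation:

      .. code-block:: python

         import numpy as np


         def gene_perplexity(iso_counts):
             """Effective number of isoforms for one gene; iso_counts is (n_bins, n_isos)."""
             totals = iso_counts.sum(axis=0).astype(float)
             p = totals / totals.sum()  # marginal isoform usage
             p = p[p > 0]  # treat 0 * log(0) as 0
             return float(np.exp(-np.sum(p * np.log(p))))


         def iso_ratio_stats(iso_counts):
             """Mean and std of per-bin isoform ratios over bins with gene coverage."""
             gene_tot = iso_counts.sum(axis=1)
             covered = gene_tot > 0
             ratios = iso_counts[covered] / gene_tot[covered, None]
             return ratios.mean(axis=0), ratios.std(axis=0)

      For two isoforms with equal total counts the perplexity is 2, and for a gene
      dominated by a single isoform it approaches 1.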


   .. py:method:: get_formatted_test_results(test_type)

      Get formatted test results as a pandas DataFrame.

      :param test_type: Test type: ``"sv"`` for spatial variability or ``"du"`` for
                        differential usage.

      :returns: Formatted result table.
      :rtype: pandas.DataFrame


   .. py:method:: setup_data(sdata, bins, table_name, col_key, row_key, layer = 'counts', group_iso_by = 'gene_symbol', gene_names = None, min_counts = 10, min_bin_pct = 0.0, filter_single_iso_genes = True, design_mtx = None, covariate_names = None)

      Set up SpatialData-backed isoform data for FFT-based testing.

      The arguments ``bins``, ``table_name``, ``col_key``, and ``row_key`` are
      passed to :func:`spatialdata.rasterize_bins` to rasterize isoform counts.

      :param sdata: SpatialData-like object with ``tables`` mapping.
      :param bins: Name of the SpatialData bin geometry for rasterization.
      :param table_name: Key of the table in ``sdata.tables``.
      :param col_key: Column index key in ``adata.obs`` for rasterization.
      :param row_key: Row index key in ``adata.obs`` for rasterization.
      :param layer: AnnData layer that stores the isoform count matrix.
      :param group_iso_by: Column in ``adata.var`` used to group isoforms by gene. The
                           unique values of this column define the gene-level groups.
      :param gene_names: Optional column name in ``adata.var`` whose values are used as
                         display gene names in results. If ``None``, the values of
                         ``group_iso_by`` are used directly.
      :param min_counts: Minimum total count (summed across all spots) required for an
                         isoform to be retained. Isoforms below this threshold are
                         excluded before gene grouping. Genes with fewer than two
                         remaining isoforms are then also excluded, unless
                         ``filter_single_iso_genes=False``.
      :param min_bin_pct: Minimum fraction (or percentage) of bins in which an isoform must be
                          expressed (count greater than zero) to be retained. Values in ``[0, 1]``
                          are interpreted as fractions of bins, and values in ``(1, 100]`` are
                          interpreted as percentages.
      :param filter_single_iso_genes: If ``True`` (default), genes with fewer than two isoforms passing
                                      QC filters are removed, since they cannot contribute to within-gene
                                      ratio tests.  Set to ``False`` to keep single-isoform genes, e.g. when
                                      testing **gene-level expression variability** with
                                      ``test_spatial_variability(method="hsic-gc")``.
      :type filter_single_iso_genes: bool, optional
      :param design_mtx: Design matrix specification.  Three input modes:

                         1. **Table name** (``str`` matching a key in ``sdata.tables``): Use the
                            existing AnnData table's ``X`` as the design matrix.  Must have the
                            same number of observations as the isoform table.
                         2. **Obs column names** (``str`` or ``list[str]`` not matching a table):
                            Extract the named columns from the isoform table's ``adata.obs``.
                            Categorical columns are one-hot encoded automatically.
                         3. **Pre-computed matrix** (ndarray, sparse, or DataFrame of shape
                            ``(n_obs, n_factors)``): Used as-is.

                         In cases 2 and 3, the design matrix will be stored as a new AnnData table inside
                         ``sdata``. The matrix is also rasterized via :func:`spatialdata.rasterize_bins`
                         when :meth:`test_differential_usage` is called.
      :type design_mtx: str, list[str], np.ndarray, scipy.sparse matrix, pd.DataFrame, or None
      :param covariate_names: Factor names. If provided, these override any names inferred
                              from ``design_mtx``. If ``None``, names are inferred from
                              ``design_mtx`` column names when possible; otherwise they are
                              auto-generated as ``["factor_0", ...]``.
      :type covariate_names: list[str] or None, optional

      :raises ValueError: If required table/layer/metadata is missing.
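
      The combined effect of ``min_counts``, ``min_bin_pct``, and
      ``filter_single_iso_genes`` can be sketched on a dense ``(n_bins, n_isos)``
      count matrix. The hypothetical ``filter_isoforms`` helper assumes inclusive
      thresholds (``>=``) and that the bin fraction is taken over the observed rows;
      the actual method may differ in these details:

      .. code-block:: python

         import numpy as np


         def filter_isoforms(counts, genes, min_counts=10, min_bin_pct=0.0,
                             filter_single_iso_genes=True):
             """Boolean mask of retained isoforms; genes gives each isoform's gene label."""
             counts = np.asarray(counts)
             # [0, 1] is read as a fraction of bins, (1, 100] as a percentage.
             frac = min_bin_pct if min_bin_pct <= 1 else min_bin_pct / 100.0
             keep = (counts.sum(axis=0) >= min_counts) & ((counts > 0).mean(axis=0) >= frac)
             if filter_single_iso_genes:
                 # Count retained isoforms per gene and drop genes with fewer than two.
                 _, gene_idx = np.unique(np.asarray(genes), return_inverse=True)
                 n_kept = np.bincount(gene_idx, weights=keep.astype(float))[gene_idx]
                 keep = keep & (n_kept >= 2)
             return keep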

      .. seealso::

         :func:`splisosm.hyptest_np.SplisosmNP.setup_data`
             AnnData-based setup for data with general geometry.
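
      For ``design_mtx`` mode 2 (obs column names), one-hot encoding of
      categorical columns can be sketched with :func:`pandas.get_dummies`. The
      hypothetical helper below keeps every category level (no dropped reference
      level); whether :meth:`setup_data` does the same is an assumption:

      .. code-block:: python

         import pandas as pd


         def obs_design_matrix(obs, columns):
             """One-hot encode categorical obs columns; numeric columns pass through."""
             if isinstance(columns, str):
                 columns = [columns]
             # get_dummies encodes object/category columns and leaves numeric ones as-is.
             design = pd.get_dummies(obs[columns], dtype=float)
             return design.to_numpy(), list(design.columns)


         obs = pd.DataFrame({
             "region": pd.Categorical(["tumor", "stroma", "tumor"]),
             "depth": [0.1, 0.5, 0.9],
         })
         mtx, names = obs_design_matrix(obs, ["region", "depth"])

      Here ``names`` contains ``"depth"``, ``"region_stroma"``, and
      ``"region_tumor"``, which would serve as the inferred covariate names.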


   .. py:method:: test_differential_usage(method = 'hsic-gp', ratio_transformation = 'none', gpr_configs = None, residualize = 'cov_only', n_jobs = -1, return_results = False, print_progress = True)

      Test isoform usage for association with spatial covariates.

      Before calling this method, a design matrix must be provided through the
      ``design_mtx`` argument of :meth:`setup_data`.
      Each column of the design matrix corresponds to a covariate to test for differential
      association with the isoform usage ratios of each gene.
      Test statistics and p-values are computed per (gene, covariate) pair separately.

      Four test strategies are supported, all operating on rasterized grid data
      to avoid densifying the full isoform or covariate matrix in memory:

      - ``"hsic-gp"`` *(default)*: spatially residualize covariates (and
        optionally isoform ratios) with ``FFTKernelGPR``, then compute linear
        HSIC.  Controlled by ``residualize``.
      - ``"hsic"``: linear HSIC between raw centered isoform ratios and raw
        centered covariates, with no spatial residualization.
      - ``"t-fisher"``: per-isoform two-sample t-tests (**binary covariates
        only**) combined by Fisher's method (chi-squared, df = 2 × n_isoforms).
      - ``"t-tippett"``: per-isoform two-sample t-tests (**binary covariates
        only**) combined by Tippett's corrected minimum p-value.

      Regardless of method, covariates are processed in chunks of at most 100
      at a time and isoform data is loaded on-the-fly per gene so that neither
      the full covariate grid nor the full isoform matrix is held in memory
      simultaneously.
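
      The linear HSIC statistic used by ``"hsic"`` (and, on residualized inputs, by
      ``"hsic-gp"``) can be sketched for one gene and one chunk of covariates. The
      ``(n - 1)**2`` scaling is one common convention and an assumption here;
      p-value computation, chunking, and GPR residualization are omitted, and
      ``linear_hsic`` is a hypothetical helper:

      .. code-block:: python

         import numpy as np


         def linear_hsic(y, z):
             """Linear-kernel HSIC and RV coefficient between y (n, p) and z (n, q)."""
             yc = y - y.mean(axis=0)  # centered isoform ratios
             zc = z - z.mean(axis=0)  # centered covariates
             cross = yc.T @ zc
             n = y.shape[0]
             # ||Yc^T Zc||_F^2 equals trace(H Y Y^T H Z Z^T) for the centering matrix H.
             hsic = np.sum(cross ** 2) / (n - 1) ** 2
             # RV coefficient: HSIC normalized to [0, 1].
             rv = np.sum(cross ** 2) / np.sqrt(np.sum((yc.T @ yc) ** 2) * np.sum((zc.T @ zc) ** 2))
             return hsic, rv

      Working with the ``p x q`` cross-product rather than the ``n x n`` Gram
      matrices keeps the cost linear in the number of bins.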

      :param method: Method for association testing:

                     * ``"hsic"``: Unconditional HSIC test (multivariate RV coefficient).
                       For continuous factors, equivalent to the multivariate Pearson correlation
                       test.  For binary factors, equivalent to the two-sample Hotelling's T^2 test.
                     * ``"hsic-gp"``: Conditional HSIC test.  Spatial effects are removed via
                       Gaussian process regression before computing the HSIC statistic.

                     Alternatively, one of the t-test variants (binary factors only):

                     * ``"t-fisher"``, ``"t-tippett"``: two-sample t-test per isoform
                       (binary covariates only; exactly two distinct non-NaN values are required);
                       p-values are combined gene-wise via Fisher's chi-squared method or
                       Tippett's corrected minimum method.
      :type method: str, optional
      :param ratio_transformation: Compositional transformation for isoform ratios.
                                   One of ``'none'``, ``'clr'``, ``'ilr'``, ``'alr'``, ``'radial'``
                                   :cite:`park2022kernel`.  See :func:`splisosm.utils.counts_to_ratios`.
      :type ratio_transformation: str, optional
      :param gpr_configs: Nested configuration dict for the GPR objects, with optional keys
                          ``'covariate'`` and/or ``'isoform'``.  Each sub-dict is forwarded to
                          :func:`splisosm.kernel_gpr.make_kernel_gpr`.  Unspecified keys use the
                          defaults from :data:`splisosm.kernel_gpr._DEFAULT_GPR_CONFIGS`::

                              {
                                  "covariate": {
                                      "constant_value": 1.0,
                                      "constant_value_bounds": (1e-3, 1e3),
                                      "length_scale": 1.0,
                                      "length_scale_bounds": "fixed",
                                  },
                                  "isoform": {
                                      "constant_value": 1.0,
                                      "constant_value_bounds": (1e-3, 1e3),
                                      "length_scale": 1.0,
                                      "length_scale_bounds": "fixed",
                                  },
                              }
      :type gpr_configs: dict, optional
      :param residualize: Controls which signals are spatially residualized when
                          ``method="hsic-gp"``:

                          * ``"cov_only"`` (default): residualize covariates only; test
                            HSIC(Z_res, Y_raw).  Fastest; calibration matches ``"both"``
                            when covariate GPR captures most spatial confounding.
                          * ``"both"``: residualize both covariates and isoform ratios.
      :type residualize: {"cov_only", "both"}, optional
      :param n_jobs: Number of parallel jobs. ``-1`` uses all available CPUs.
      :type n_jobs: int, optional
      :param print_progress: Whether to show the progress bar. Defaults to True.
      :type print_progress: bool, optional
      :param return_results: Whether to return the test statistics and p-values.
                             If False, the results are stored in ``self.du_test_results``.
      :type return_results: bool, optional

      :returns: **results** -- If ``return_results`` is True, returns dict with test statistics and
                p-values. Otherwise, returns None and stores results in
                ``self.du_test_results``.
      :rtype: dict or None

      :raises RuntimeError: If :meth:`setup_data` has not been called or no ``design_mtx`` was provided to it.
      :raises ValueError: If ``method``, ``residualize``, or ``ratio_transformation`` is invalid.

      .. seealso::

         :func:`splisosm.hyptest_np.SplisosmNP.test_differential_usage`
             Non-FFT version of this function for comparison.
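
      The ``"t-fisher"`` and ``"t-tippett"`` strategies described above can be
      sketched for one gene and one binary covariate with SciPy. This sketch
      assumes Student's t-test (equal variances) and SciPy's Tippett combination,
      ``1 - (1 - min p)^k``, as the "corrected minimum"; the helper
      ``t_combined_pvalue`` is hypothetical:

      .. code-block:: python

         import numpy as np
         from scipy import stats


         def t_combined_pvalue(ratios, covariate, method="fisher"):
             """Gene-level p-value from per-isoform two-sample t-tests.

             ratios: (n_bins, n_isos) isoform ratios; covariate: (n_bins,) with NaN ignored.
             """
             covariate = np.asarray(covariate, dtype=float)
             observed = ~np.isnan(covariate)
             levels = np.unique(covariate[observed])
             if levels.size != 2:
                 raise ValueError("t-tests need exactly two distinct non-NaN covariate values")
             group_a = ratios[observed & (covariate == levels[0])]
             group_b = ratios[observed & (covariate == levels[1])]
             # One t-test per isoform (column); equal variances assumed.
             _, pvals = stats.ttest_ind(group_a, group_b, axis=0)
             # Fisher: -2 * sum(log p) ~ chi2(2k); Tippett: 1 - (1 - min p)^k.
             _, combined = stats.combine_pvalues(pvals, method=method)
             return float(combined)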


   .. py:method:: test_spatial_variability(method = 'hsic-ir', ratio_transformation = 'none', n_jobs = -1, return_results = False, print_progress = True)

      Test for spatial variability using FFT-accelerated HSIC.

      :param method: One of ``"hsic-ir"`` (isoform ratios), ``"hsic-ic"`` (isoform counts),
                     or ``"hsic-gc"`` (gene counts).
      :param ratio_transformation: Ratio transform used when ``method="hsic-ir"``.
      :param n_jobs: Number of joblib workers. ``-1`` uses all available CPUs.
      :param return_results: If True, return the result dictionary.
      :param print_progress: Whether to show a progress bar.

      :returns: Result dictionary when ``return_results=True``; otherwise ``None``.
      :rtype: dict or None

      .. seealso::

         :func:`splisosm.hyptest_np.SplisosmNP.test_spatial_variability`
             Non-FFT version of this function for comparison.
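
      The HSIC statistic against the spatial kernel can be sketched as
      ``trace(Yc^T K Yc)`` with column-centered features ``Yc``, evaluated through
      the FFT. Centering zeroes the constant Fourier mode, so the zero-frequency
      eigenvalue does not contribute. This sketch assumes a fully observed grid
      without padding and omits the normalization constant and the null
      distribution used for p-values; ``sv_statistic`` is a hypothetical helper:

      .. code-block:: python

         import numpy as np


         def sv_statistic(y_grid, spectrum):
             """Unnormalized HSIC trace(Yc^T K Yc) for y_grid of shape (ny, nx, m)."""
             ny, nx, _ = y_grid.shape
             yc = y_grid - y_grid.mean(axis=(0, 1), keepdims=True)  # H @ Y
             power = np.abs(np.fft.fft2(yc, axes=(0, 1))) ** 2
             # Sum over features of yc_j^T K yc_j via Parseval's theorem.
             return float(np.einsum("ij,ijm->", spectrum, power) / (ny * nx))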


   .. py:attribute:: covariate_names
      :type:  list[str]
      :value: []


   .. py:attribute:: design_mtx
      :type:  Optional[Any]
      :value: None


   .. py:attribute:: du_test_results
      :type:  dict

      Dictionary storing the differential usage test results, populated by :meth:`test_differential_usage`.
      It contains the following keys:

      - ``'method'``: str, the method used for the test.
      - ``'statistic'``: numpy.ndarray of shape (n_genes, n_covariates), the test statistic for each gene and covariate.
      - ``'pvalue'``: numpy.ndarray of shape (n_genes, n_covariates), the p-value for each gene and covariate.
      - ``'pvalue_adj'``: numpy.ndarray of shape (n_genes, n_covariates), the Benjamini-Hochberg (BH) adjusted p-value for each gene and covariate. Each column (covariate) is adjusted separately.
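
      The column-wise BH adjustment can be sketched with a hand-rolled
      Benjamini-Hochberg step-up procedure applied to each covariate column. The
      helper ``bh_adjust`` is illustrative and does not handle NaN p-values; recent
      SciPy releases also provide ``scipy.stats.false_discovery_control`` for the
      same purpose:

      .. code-block:: python

         import numpy as np


         def bh_adjust(pvals):
             """Benjamini-Hochberg adjusted p-values for a 1D array."""
             p = np.asarray(pvals, dtype=float)
             n = p.size
             order = np.argsort(p)
             scaled = p[order] * n / np.arange(1, n + 1)
             # Enforce monotonicity from the largest p-value downwards, then cap at 1.
             adj_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
             adj = np.empty_like(adj_sorted)
             adj[order] = adj_sorted  # restore the original order
             return adj


         pvalue = np.array([[0.01, 0.5], [0.04, 0.02], [0.03, 0.9], [0.2, 0.01]])
         pvalue_adj = np.apply_along_axis(bh_adjust, 0, pvalue)  # one column per covariate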


   .. py:attribute:: gene_names
      :type:  list[str]

      List of gene names corresponding to the genes in the model after filtering.


   .. py:attribute:: kernel
      :type:  FFTKernel | None

      FFTKernel instance used for spatial kernel computations.


   .. py:attribute:: n_factors
      :type:  int
      :value: 0


   .. py:attribute:: n_genes
      :type:  int

      Number of genes after filtering.


   .. py:attribute:: n_grid
      :type:  int

      Number of raster grid bins, including padding (``n_grid = ny * nx``).


   .. py:attribute:: n_isos
      :type:  list[int]

      Number of isoforms per gene after filtering.


   .. py:attribute:: n_spots
      :type:  int

      Number of observed spots (bins).


   .. py:attribute:: sdata
      :type:  Any | None

      SpatialData object containing the input data.


   .. py:attribute:: sv_test_results
      :type:  dict

      Dictionary storing the spatial variability test results, populated by :meth:`test_spatial_variability`.
      It contains the following keys:

      - ``'method'``: str, the method used for the test.
      - ``'statistic'``: numpy.ndarray of shape (n_genes,), the test statistic for each gene.
      - ``'pvalue'``: numpy.ndarray of shape (n_genes,), the p-value for each gene.
      - ``'pvalue_adj'``: numpy.ndarray of shape (n_genes,), the Benjamini-Hochberg (BH) adjusted p-value for each gene.