splisosm.hyptest_fft
====================

.. py:module:: splisosm.hyptest_fft

.. autoapi-nested-parse::

   FFT-accelerated non-parametric hypothesis tests for SPLISOSM.


Classes
-------

.. autoapisummary::

   splisosm.hyptest_fft.SplisosmFFT


Module Contents
---------------

.. py:class:: SplisosmFFT(rho = 0.99, neighbor_degree = 1, spacing = (1.0, 1.0), workers = None)

   FFT-accelerated SPLISOSM model for rasterized spatial isoform testing.

   The class follows the non-parametric SPLISOSM workflow but consumes a
   SpatialData table directly and rasterizes per-gene isoform counts on demand.

   .. rubric:: Examples

   Spatial variability test:

   >>> from splisosm import SplisosmFFT
   >>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
   >>> model.setup_data(
   ...     sdata=sdata,
   ...     bins="ID_square_016um",
   ...     table_name="square_016um",
   ...     col_key="array_col",
   ...     row_key="array_row",
   ...     layer="counts",
   ...     group_iso_by="gene_ids",
   ...     gene_names="gene_name",
   ...     min_counts=10,
   ...     min_bin_pct=0.0,
   ... )
   >>> model.test_spatial_variability(method="hsic-ir")
   >>> sv_results = model.get_formatted_test_results(test_type="sv")

   Differential usage test:

   >>> model = SplisosmFFT(rho=0.9, neighbor_degree=1)
   >>> model.setup_data(
   ...     sdata=sdata,
   ...     bins="ID_square_016um",
   ...     table_name="square_016um_svp",
   ...     design_mtx="square_016um_rbp_sve",
   ...     col_key="array_col",
   ...     row_key="array_row",
   ...     layer="counts",
   ...     group_iso_by="gene_ids",
   ...     gene_names="gene_name",
   ...     min_counts=10,
   ...     min_bin_pct=0.0,
   ... )
   >>> model.test_differential_usage(method="hsic-gp", residualize="cov_only")
   >>> du_results = model.get_formatted_test_results("du")

   :param rho: Spatial autocorrelation coefficient for CAR kernel.
   :param neighbor_degree: Neighbor ring degree for CAR graph construction.
   :param spacing: Raster spacing ``(dy, dx)``.
   :param workers: Number of FFT workers.


   .. py:method:: extract_feature_summary(level = 'gene', print_progress = True)

      Compute filtered feature-level summary statistics.

      Gene-level statistics are aggregated across all isoforms that passed
      the filters applied in :meth:`setup_data`.  Isoform-level statistics
      are computed per isoform and augmented onto the corresponding rows of
      ``adata.var``.

      Results are cached: repeated calls with the same ``level`` return the
      cached :class:`pandas.DataFrame` without recomputation.

      :param level: Summary granularity.
                    ``'gene'``: one row per gene.
                    ``'isoform'``: one row per isoform that passed filtering.
      :param print_progress: Whether to show a progress bar.

      :returns: For ``level='gene'``, the index is the gene display name and the
                columns are:

                - ``'n_isos'``: int. Number of isoforms retained after filtering.
                - ``'perplexity'``: float. Effective number of isoforms based on
                  the marginal isoform usage entropy.
                - ``'pct_bin_on'``: float. Fraction of bins with non-zero total
                  gene counts.
                - ``'count_avg'``: float. Mean per-spot total count for the gene.
                - ``'count_std'``: float. Std of per-spot total count for the gene.

                For ``level='isoform'``, the index is the isoform name (matching
                ``adata.var_names``) and the columns are the original ``adata.var``
                columns plus:

                - ``'pct_bin_on'``: float. Fraction of bins with count > 0.
                - ``'count_total'``: float. Total counts across all bins.
                - ``'count_avg'``: float. Mean count per bin.
                - ``'count_std'``: float. Std of count per bin.
                - ``'ratio_total'``: float. Fraction of total gene counts
                  attributable to this isoform.
                - ``'ratio_avg'``: float. Mean per-bin isoform usage ratio
                  (computed over bins with non-zero gene coverage).
                - ``'ratio_std'``: float. Std of per-bin isoform usage ratio
                  (computed over bins with non-zero gene coverage).
      :rtype: pandas.DataFrame

      :raises RuntimeError: If :meth:`setup_data` has not been called.
      :raises ValueError: If ``level`` is not ``'gene'`` or ``'isoform'``.
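
      A minimal sketch of how a perplexity of this kind can be computed, assuming
      it is the exponential of the Shannon entropy (natural log) of the marginal
      isoform usage. The helper name ``isoform_perplexity`` is hypothetical and is
      not part of the SPLISOSM API:

      .. code-block:: python

         import numpy as np

         def isoform_perplexity(counts):
             """Effective number of isoforms from an (n_bins, n_isos) count matrix."""
             totals = np.asarray(counts, dtype=float).sum(axis=0)
             usage = totals / totals.sum()          # marginal isoform usage
             usage = usage[usage > 0]               # treat 0 * log(0) as 0
             entropy = -np.sum(usage * np.log(usage))
             return float(np.exp(entropy))

         # Two equally used isoforms give a perplexity of 2; a single
         # dominant isoform pushes the value toward 1.
         balanced = isoform_perplexity([[5, 5], [3, 3]])
         skewed = isoform_perplexity([[99, 1], [99, 1]])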


   .. py:method:: get_formatted_test_results(test_type, with_gene_summary = False)

      Get formatted test results as a pandas DataFrame.

      :param test_type: Test type: ``"sv"`` for spatial variability or ``"du"`` for
                        differential usage.
      :type test_type: {"sv", "du"}
      :param with_gene_summary: If ``True``, append gene-level summary statistics from
                                :meth:`extract_feature_summary`.
      :type with_gene_summary: bool, optional

      :returns: Formatted result table.
      :rtype: pandas.DataFrame


   .. py:method:: setup_data(sdata, bins, table_name, col_key, row_key, layer = 'counts', group_iso_by = 'gene_symbol', gene_names = None, min_counts = 10, min_bin_pct = 0.0, filter_single_iso_genes = True, design_mtx = None, covariate_names = None)

      Set up SpatialData-backed isoform data for FFT-based testing.

      The ``bins``, ``table_name``, ``col_key``, and ``row_key`` arguments are
      passed to :func:`spatialdata.rasterize_bins` to rasterize isoform counts.

      :param sdata: SpatialData-like object with ``tables`` mapping.
      :type sdata: spatialdata.SpatialData
      :param bins: Name of the SpatialData bin geometry for rasterization.
      :type bins: str
      :param table_name: Key of the table in ``sdata.tables``.
      :type table_name: str
      :param col_key: Column index key in ``adata.obs`` for rasterization.
      :type col_key: str
      :param row_key: Row index key in ``adata.obs`` for rasterization.
      :type row_key: str
      :param layer: AnnData layer that stores isoform count matrix.
      :type layer: str, optional
      :param group_iso_by: Column in ``adata.var`` used to group isoforms by gene. The
                           unique values of this column define the gene-level groups.
      :type group_iso_by: str, optional
      :param gene_names: Optional column name in ``adata.var`` whose values are used as
                         display gene names in results. If ``None``, the values of
                         ``group_iso_by`` are used directly.
      :type gene_names: str or None, optional
      :param min_counts: Minimum total count (summed across all spots) required for an
                         isoform to be retained. Isoforms below this threshold are
                         excluded before gene grouping. When ``filter_single_iso_genes``
                         is ``True``, genes left with fewer than two isoforms after
                         filtering are also excluded.
      :type min_counts: int, optional
      :param min_bin_pct: Minimum percentage of bins in which an isoform must be expressed
                          (count greater than zero) to be retained. Values in ``[0, 1]`` are
                          interpreted as fractions of bins, and values in ``(1, 100]`` are
                          interpreted as percentages.
      :type min_bin_pct: float, optional
      :param filter_single_iso_genes: If ``True`` (default), genes with fewer than two isoforms passing
                                      QC filters are removed because they cannot contribute to within-gene
                                      ratio tests.  Set to ``False`` to keep single-isoform genes, e.g. when
                                      testing **gene-level expression variability** with
                                      ``test_spatial_variability(method="hsic-gc")``.
      :type filter_single_iso_genes: bool, optional
      :param design_mtx: Design matrix specification.  Three input modes:

                         1. **Table name** (``str`` matching a key in ``sdata.tables``): Use the
                            existing AnnData table's ``X`` as the design matrix.  Must have the
                            same number of observations as the isoform table.
                         2. **Obs column names** (``str`` or ``list[str]`` not matching a table):
                            Extract the named columns from the isoform table's ``adata.obs``.
                            Categorical columns are one-hot encoded automatically.
                         3. **Pre-computed matrix** (ndarray, sparse, or DataFrame of shape
                            ``(n_obs, n_factors)``): Used as-is.

                         In cases 2 and 3, the design matrix will be stored as a new AnnData table inside
                         ``sdata``. The matrix is also rasterized via :func:`spatialdata.rasterize_bins`
                         when :meth:`test_differential_usage` is called.
      :type design_mtx: str, list[str], np.ndarray, scipy.sparse matrix, pd.DataFrame, or None
      :param covariate_names: Covariate (factor) names, which override any inferred names.
                              If ``None``, names are inferred from ``design_mtx`` column names
                              when possible and otherwise auto-generated as ``["factor_0", ...]``.
      :type covariate_names: list[str] or None, optional

      :raises ValueError: If required table/layer/metadata is missing.
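
      The one-hot encoding described for obs-column input can be pictured with
      :func:`pandas.get_dummies`. This is only an illustration on a made-up
      ``obs`` table; the encoding that :meth:`setup_data` applies internally may
      differ in column naming and dtype:

      .. code-block:: python

         import pandas as pd

         # Hypothetical per-bin metadata with one categorical and one numeric column.
         obs = pd.DataFrame(
             {
                 "region": pd.Categorical(["cortex", "hippocampus", "cortex", "thalamus"]),
                 "depth": [0.1, 0.4, 0.2, 0.9],
             }
         )

         # The categorical column expands into indicator columns; the numeric
         # column passes through unchanged.
         design = pd.get_dummies(obs, columns=["region"], dtype=float)
         covariate_names = list(design.columns)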

      .. seealso::

         :func:`splisosm.hyptest_np.SplisosmNP.setup_data`
             AnnData-based setup for data with general geometry.
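
      The filtering rules above can be sketched on a dense toy matrix. This is a
      simplified illustration, not the library's implementation: the helper
      ``filter_isoforms`` is hypothetical, and it assumes that both thresholds are
      inclusive and are applied per isoform before genes are grouped:

      .. code-block:: python

         import numpy as np

         def filter_isoforms(counts, gene_ids, min_counts=10, min_bin_pct=0.0,
                             filter_single_iso_genes=True):
             """Return a boolean mask over isoforms (columns of ``counts``)."""
             counts = np.asarray(counts)
             gene_ids = np.asarray(gene_ids)
             # Values above 1 are read as percentages, values in [0, 1] as fractions.
             min_frac = min_bin_pct / 100.0 if min_bin_pct > 1 else min_bin_pct
             keep = counts.sum(axis=0) >= min_counts
             keep &= (counts > 0).mean(axis=0) >= min_frac
             if filter_single_iso_genes:
                 # Drop genes left with fewer than two retained isoforms.
                 genes, n_kept = np.unique(gene_ids[keep], return_counts=True)
                 keep &= np.isin(gene_ids, genes[n_kept >= 2])
             return keep

         counts = np.array([[8, 0, 4, 9],
                            [7, 1, 6, 0],
                            [9, 0, 5, 8]])
         gene_ids = ["g1", "g1", "g2", "g2"]
         mask = filter_isoforms(counts, gene_ids, min_counts=10)

      In this toy input the second isoform of ``g1`` falls below ``min_counts``,
      which leaves ``g1`` with a single isoform, so the whole gene is dropped
      unless ``filter_single_iso_genes=False``.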


   .. py:method:: test_differential_usage(method = 'hsic-gp', ratio_transformation = 'none', gpr_configs = None, residualize = 'cov_only', n_jobs = -1, return_results = False, print_progress = True)

      Test for differential isoform usage against spatial covariate expression.

      Before calling this method, a design matrix must be supplied through the
      ``design_mtx`` argument of :meth:`setup_data`.
      Each column of the design matrix corresponds to a covariate to test for differential
      association with the isoform usage ratios of each gene.
      Test statistics and p-values are computed per (gene, covariate) pair separately.

      Four test strategies are supported, all operating on rasterized grid data
      to avoid densifying the full isoform or covariate matrix in memory:

      - ``"hsic-gp"`` *(default)*: spatially residualize covariates (and
        optionally isoform ratios) with ``FFTKernelGPR``, then compute linear
        HSIC.  Controlled by ``residualize``.
      - ``"hsic"``: linear HSIC between raw centered isoform ratios and raw
        centered covariates, with no spatial residualization.
      - ``"t-fisher"``: per-isoform two-sample t-tests (**binary covariates
        only**) combined by Fisher's method (chi-squared, df = 2 × n_isoforms).
      - ``"t-tippett"``: per-isoform two-sample t-tests (**binary covariates
        only**) combined by Tippett's corrected minimum p-value.

      Regardless of method, covariates are processed in chunks of at most 100
      at a time and isoform data is loaded on-the-fly per gene so that neither
      the full covariate grid nor the full isoform matrix is held in memory
      simultaneously.
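
      The two p-value combination rules used by ``"t-fisher"`` and
      ``"t-tippett"`` can be written in a few lines. A minimal sketch, assuming
      independent per-isoform p-values and taking "Tippett's corrected minimum"
      to mean ``1 - (1 - p_min) ** k`` for ``k`` isoforms; the function names are
      illustrative and not part of SPLISOSM:

      .. code-block:: python

         import numpy as np
         from scipy import stats

         def combine_fisher(pvals):
             """Fisher's method: -2 * sum(log p) ~ chi-squared with 2k df."""
             pvals = np.asarray(pvals, dtype=float)
             stat = -2.0 * np.log(pvals).sum()
             return float(stats.chi2.sf(stat, df=2 * pvals.size))

         def combine_tippett(pvals):
             """Tippett's method: minimum p-value corrected for k tests."""
             pvals = np.asarray(pvals, dtype=float)
             return float(1.0 - (1.0 - pvals.min()) ** pvals.size)

         # Toy data: a two-sample t-test per isoform for a binary covariate.
         rng = np.random.default_rng(0)
         group = np.repeat([0, 1], 50)
         ratios = rng.normal(size=(100, 3))
         ratios[group == 1, 0] += 1.0               # shift the first isoform only
         pvals = stats.ttest_ind(ratios[group == 0], ratios[group == 1]).pvalue
         p_fisher = combine_fisher(pvals)
         p_tippett = combine_tippett(pvals)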

      :param method: Method for association testing:

                     * ``"hsic"``: Unconditional HSIC test (multivariate RV coefficient).
                       For continuous factors, equivalent to the multivariate Pearson correlation
                       test.  For binary factors, equivalent to the two-sample Hotelling :math:`T^2` test.
                     * ``"hsic-gp"``: Conditional HSIC test.  Spatial effects are removed via
                       Gaussian process regression before computing the HSIC statistic.

                     Or one of the T-tests (binary factors only):

                     * ``"t-fisher"``, ``"t-tippett"``: two-sample t-test per isoform
                       (binary covariates only; exactly two distinct non-NaN values are required).
                       The p-values are combined gene-wise via Fisher's chi-squared or
                       Tippett's corrected minimum method.
      :type method: str, optional
      :param ratio_transformation: Compositional transformation for isoform ratios.
                                   One of ``'none'``, ``'clr'``, ``'ilr'``, ``'alr'``, ``'radial'``
                                   :cite:`park2022kernel`.  See :func:`splisosm.utils.counts_to_ratios`.
      :type ratio_transformation: str, optional
      :param gpr_configs: Nested configuration dict for the GPR objects, with optional keys
                          ``'covariate'`` and/or ``'isoform'``.  Each sub-dict is forwarded to
                          :func:`splisosm.kernel_gpr.make_kernel_gpr`.  Unspecified keys use the
                          defaults from :data:`splisosm.kernel_gpr._DEFAULT_GPR_CONFIGS`::

                              {
                                  "covariate": {
                                      "constant_value": 1.0,
                                      "constant_value_bounds": (1e-3, 1e3),
                                      "length_scale": 1.0,
                                      "length_scale_bounds": "fixed",
                                  },
                                  "isoform": {
                                      "constant_value": 1.0,
                                      "constant_value_bounds": (1e-3, 1e3),
                                      "length_scale": 1.0,
                                      "length_scale_bounds": "fixed",
                                  },
                              }
      :type gpr_configs: dict, optional
      :param residualize: Controls which signals are spatially residualized when
                          ``method="hsic-gp"``:

                          * ``"cov_only"`` (default): residualize covariates only; test
                            HSIC(Z_res, Y_raw).  Fastest; calibration matches ``"both"``
                            when covariate GPR captures most spatial confounding.
                          * ``"both"``: residualize both covariates and isoform ratios.
      :type residualize: {"cov_only", "both"}, optional
      :param n_jobs: Number of parallel jobs. ``-1`` uses all available CPUs.
      :type n_jobs: int, optional
      :param print_progress: Whether to show the progress bar. Defaults to ``True``.
      :type print_progress: bool, optional
      :param return_results: Whether to return the test statistics and p-values.
                             If ``False``, the results are stored in ``self._du_test_results``.
      :type return_results: bool, optional

      :returns: **results** -- If ``return_results`` is ``True``, a dict with test statistics and
                p-values. Otherwise ``None``; the results are stored in
                ``self._du_test_results``.
      :rtype: dict or None

      :raises RuntimeError: If :meth:`setup_data` has not been called or no ``design_mtx`` was provided.
      :raises ValueError: If ``method``, ``residualize``, or ``ratio_transformation`` is invalid.

      .. seealso::

         :func:`splisosm.hyptest_np.SplisosmNP.test_differential_usage`
             Non-FFT version of this function for comparison.
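
      For intuition, the linear HSIC statistic behind ``"hsic"`` can be computed
      without forming any ``n x n`` kernel matrix. The sketch below assumes the
      common biased estimator ``tr(K H L H) / (n - 1) ** 2`` with linear kernels;
      the normalization and the null distribution used by SPLISOSM are not shown
      and may differ. ``linear_hsic`` is a hypothetical helper:

      .. code-block:: python

         import numpy as np

         def linear_hsic(ratios, covariate):
             """Linear HSIC between (n, p) ratios and an (n,) or (n, q) covariate."""
             x = np.asarray(ratios, dtype=float)
             z = np.asarray(covariate, dtype=float).reshape(len(x), -1)
             x = x - x.mean(axis=0)                 # center both inputs
             z = z - z.mean(axis=0)
             cross = x.T @ z                        # (p, q) unscaled cross-covariance
             return float((cross ** 2).sum() / (len(x) - 1) ** 2)

         rng = np.random.default_rng(1)
         n = 200
         covariate = rng.normal(size=n)
         ratios = rng.dirichlet([2.0, 2.0, 2.0], size=n)
         stat = linear_hsic(ratios, covariate)

         # Same value from the explicit kernel form tr(K H L H) / (n - 1)^2.
         h = np.eye(n) - 1.0 / n                    # centering matrix
         k_x = ratios @ ratios.T
         k_z = np.outer(covariate, covariate)
         stat_dense = float(np.trace(k_x @ h @ k_z @ h) / (n - 1) ** 2)

      The cross-covariance form costs ``O(n * p * q)`` and avoids the ``n x n``
      matrices that the explicit kernel form requires.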


   .. py:method:: test_spatial_variability(method = 'hsic-ir', ratio_transformation = 'none', n_jobs = -1, return_results = False, print_progress = True)

      Test for spatial variability using FFT-accelerated HSIC.

      :param method: One of ``"hsic-ir"`` (isoform ratios), ``"hsic-ic"`` (isoform counts),
                     or ``"hsic-gc"`` (gene counts).
      :type method: {"hsic-ir", "hsic-ic", "hsic-gc"}, optional
      :param ratio_transformation: Ratio transform used when ``method="hsic-ir"``.
      :type ratio_transformation: {"none", "clr", "ilr", "alr", "radial"}, optional
      :param n_jobs: Number of joblib workers. ``-1`` uses all available CPUs.
      :type n_jobs: int, optional
      :param return_results: If True, return result dictionary.
      :type return_results: bool, optional
      :param print_progress: Whether to show a progress bar.
      :type print_progress: bool, optional

      :returns: Result dictionary when ``return_results=True``; otherwise ``None``.
      :rtype: dict or None

      .. seealso::

         :func:`splisosm.hyptest_np.SplisosmNP.test_spatial_variability`
             Non-FFT version of this function for comparison.
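
      The speed-up rests on the fact that a stationary kernel on a regular grid
      is diagonalized by the 2-D FFT. The sketch below illustrates this for a
      CAR-style kernel ``inv(I - rho * W)`` on a small grid, assuming periodic
      boundaries and a row-normalized 4-neighbor graph ``W``. These are
      simplifying assumptions made for illustration; the actual
      :class:`~splisosm.kernel.FFTKernel` may handle boundaries, neighbor degree,
      and normalization differently:

      .. code-block:: python

         import numpy as np

         ny, nx, rho = 4, 5, 0.9

         # Stencil of the row-normalized 4-neighbor graph on a periodic grid.
         stencil = np.zeros((ny, nx))
         stencil[1, 0] = stencil[-1, 0] = stencil[0, 1] = stencil[0, -1] = 0.25

         # Eigenvalues of W are the 2-D FFT of the stencil (real, as it is symmetric).
         w_eig = np.fft.fft2(stencil).real
         k_eig = 1.0 / (1.0 - rho * w_eig)

         def apply_kernel(grid):
             """Compute K @ x for x stored as an (ny, nx) grid via the FFT."""
             return np.fft.ifft2(np.fft.fft2(grid) * k_eig).real

         rng = np.random.default_rng(0)
         x = rng.normal(size=(ny, nx))
         quad_fft = float((x * apply_kernel(x)).sum())       # x^T K x

         # Dense reference: build W explicitly and invert.
         w = np.zeros((ny * nx, ny * nx))
         for i in range(ny):
             for j in range(nx):
                 for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                     w[i * nx + j, ((i + di) % ny) * nx + (j + dj) % nx] = 0.25
         k_dense = np.linalg.inv(np.eye(ny * nx) - rho * w)
         quad_dense = float(x.ravel() @ k_dense @ x.ravel())

      The FFT path never materializes the ``(ny * nx, ny * nx)`` kernel matrix,
      which is what makes quadratic forms of this kind tractable on large rasters.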


   .. py:attribute:: covariate_names
      :type:  list[str]

      Covariate display names (length :attr:`n_factors`).


   .. py:attribute:: design_mtx
      :type:  Optional[Any]

      Design matrix stored as an AnnData table inside :attr:`sdata`.
      ``None`` if no covariates were provided.


   .. py:attribute:: gene_names
      :type:  list[str]

      Gene display names (length :attr:`n_genes`).


   .. py:attribute:: n_factors
      :type:  int

      Number of covariates for differential usage testing.


   .. py:attribute:: n_genes
      :type:  int

      Number of genes after filtering.


   .. py:attribute:: n_grid
      :type:  int

      Total raster grid cells (``ny * nx``, including zero-padded positions).


   .. py:attribute:: n_isos_per_gene
      :type:  list[int]

      Number of isoforms per gene (list of length :attr:`n_genes`).


   .. py:attribute:: n_spots
      :type:  int

      Number of observed spots (bins with non-zero data).


   .. py:attribute:: sdata
      :type:  Any | None

      Source ``SpatialData`` object; ``None`` before :meth:`setup_data`.


   .. py:attribute:: sp_kernel
      :type:  splisosm.kernel.FFTKernel | None

      :class:`~splisosm.kernel.FFTKernel` for FFT-accelerated spatial operations.