splisosm.io
===========

.. py:module:: splisosm.io

.. autoapi-nested-parse::

   I/O loaders for SpatialData-based workflows.

   This module provides high-level wrappers for constructing SpatialData objects
   from platform-specific outputs.



Functions
---------

.. autoapisummary::

   splisosm.io.load_visium_probe
   splisosm.io.load_visium_sp_meta
   splisosm.io.load_visiumhd_probe
   splisosm.io.load_xenium_codeword


Module Contents
---------------

.. py:function:: load_visium_probe(path, *, counts_file = 'raw_probe_bc_matrix.h5', library_id = None, load_spatial = True, counts_layer_name = 'counts', filtered_counts_file = True, return_type = 'anndata')

   Load standard Visium Space Ranger probe-based outputs.

   Reads the probe-level count matrix (``raw_probe_bc_matrix.h5`` by default)
   from a Space Ranger ``outs`` directory and optionally attaches spatial
   metadata (coordinates, images, scale factors) from the ``spatial/`` subfolder.

   This is the standard-resolution Visium counterpart of
   :func:`load_visiumhd_probe` (which handles Visium HD multi-bin outputs).

   :param path: Path to the Space Ranger ``outs`` directory, e.g.
                ``<run_id>/outs``.  Must contain the HDF5 count matrix and,
                when ``load_spatial=True``, a ``spatial/`` subfolder.
   :param counts_file: Name of the HDF5 count matrix file inside ``path``.
                       Typical choices:

                       * ``"raw_probe_bc_matrix.h5"`` — all barcodes, probe-level features
                         (default; preserves per-probe information).
                       * ``"raw_feature_bc_matrix.h5"`` — all barcodes, gene-level features.
                       * ``"filtered_feature_bc_matrix.h5"`` — tissue barcodes only,
                         gene-level features.
   :param library_id: Library identifier stored in ``adata.uns["spatial"]`` (AnnData mode)
                      or used to name SpatialData elements (SpatialData mode).
                      Defaults to the parent directory name of *path*.
   :param load_spatial: Whether to load spatial metadata (tissue positions, images,
                        scale factors).  Only used when ``return_type="anndata"``.
   :param counts_layer_name: Layer name for the raw count matrix.  The counts are stored in
                             ``adata.layers[counts_layer_name]``.
   :param filtered_counts_file: If ``True`` (default), keep only in-tissue barcodes that appear in
                                ``filtered_feature_bc_matrix.h5``.  If ``False``, keep all barcodes
                                from ``counts_file`` (including background spots).
   :param return_type: Output format.

                       * ``"anndata"`` (default) — return an :class:`~anndata.AnnData`
                         with spatial metadata in ``.obsm["spatial"]`` and ``.uns["spatial"]``.
                       * ``"spatialdata"`` — return a :class:`~spatialdata.SpatialData` object
                         built by ``spatialdata_io.visium()``, with probe-level ``var``
                         metadata restored and a counts layer added.  Suitable for use with
                         :class:`~splisosm.SplisosmFFT`.

   :returns: When ``return_type="anndata"``:

             * ``.X`` / ``.layers[counts_layer_name]`` — sparse count matrix
             * ``.var`` — feature (probe or gene) metadata
             * ``.obs`` — barcode metadata with ``in_tissue``, ``array_row``,
               ``array_col`` (when ``load_spatial=True``)
             * ``.obsm["spatial"]`` — ``(n_spots, 2)`` pixel coordinates
             * ``.uns["spatial"]`` — images and scale factors

             When ``return_type="spatialdata"``:

             * ``sdata.tables["table"]`` — AnnData with probe-level counts in
               ``.layers[counts_layer_name]`` and full probe metadata in ``.var``
             * ``sdata.shapes[library_id]`` — spot geometries
             * ``sdata.images`` — tissue images at multiple resolutions
   :rtype: anndata.AnnData or spatialdata.SpatialData

   :raises FileNotFoundError: If the counts file or spatial directory is missing.

   .. rubric:: Examples

   Load as AnnData (for :class:`~splisosm.SplisosmNP`):

   >>> from splisosm.io import load_visium_probe
   >>> adata = load_visium_probe("sample/outs")

   Load as SpatialData (for :class:`~splisosm.SplisosmFFT`):

   >>> sdata = load_visium_probe("sample/outs", return_type="spatialdata")
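
Before calling ``load_visium_probe``, it can help to confirm that the ``outs`` directory has the layout the parameters describe. The following is a minimal sketch and not part of ``splisosm``: ``check_visium_outs`` is a hypothetical helper, and the assumption that ``filtered_feature_bc_matrix.h5`` sits directly inside ``path`` is inferred from the parameter descriptions rather than from the implementation.

```python
from pathlib import Path


def check_visium_outs(path, counts_file="raw_probe_bc_matrix.h5",
                      load_spatial=True, filtered_counts_file=True):
    """Hypothetical pre-flight check: list the expected entries that are missing."""
    outs = Path(path)
    required = [outs / counts_file]
    if filtered_counts_file:
        # Assumed location of the matrix used to select in-tissue barcodes.
        required.append(outs / "filtered_feature_bc_matrix.h5")
    if load_spatial:
        required.append(outs / "spatial")
    return [p.name for p in required if not p.exists()]
```

An empty list means the documented inputs are present; it does not guarantee that the files are readable.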


.. py:function:: load_visium_sp_meta(adata, path_to_spatial, library_id = None)

   Load Visium spatial metadata from a Space Ranger ``spatial`` folder into an AnnData object.

   :param adata: Annotated data matrix to store the spatial metadata.
   :param path_to_spatial: Path to the ``spatial`` folder generated by Space Ranger.
   :param library_id: Library ID of the spatial data.

   :returns: **anndata** -- AnnData with spatial metadata.
   :rtype: anndata.AnnData
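
The ``spatial`` folder written by Space Ranger typically holds a tissue-positions table and ``scalefactors_json.json``. The sketch below parses both from text using only the standard library, to illustrate the kind of data that ends up in ``.obs``, ``.obsm["spatial"]`` and ``.uns["spatial"]``. ``parse_spatial_text`` is a hypothetical name, the header row follows the Space Ranger 2.0+ ``tissue_positions.csv`` layout, and the (x, y) = (column pixel, row pixel) ordering is an assumption borrowed from common AnnData usage; none of this is taken from the ``load_visium_sp_meta`` implementation.

```python
import csv
import io
import json


def parse_spatial_text(positions_csv, scalefactors_json):
    """Hypothetical helper: parse Space Ranger spatial metadata from text."""
    barcodes, in_tissue, coords = [], [], []
    for row in csv.DictReader(io.StringIO(positions_csv)):
        barcodes.append(row["barcode"])
        in_tissue.append(int(row["in_tissue"]))
        # Assumed convention: x is the column pixel, y is the row pixel.
        coords.append(
            (float(row["pxl_col_in_fullres"]), float(row["pxl_row_in_fullres"]))
        )
    return {
        "barcodes": barcodes,
        "in_tissue": in_tissue,
        "coords": coords,
        "scalefactors": json.loads(scalefactors_json),
    }


# Toy inputs standing in for the two files in the spatial folder.
positions = (
    "barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres\n"
    "AAAC-1,1,0,0,100,200\n"
    "AAAG-1,0,0,2,100,260\n"
)
scalefactors = '{"tissue_hires_scalef": 0.5, "spot_diameter_fullres": 90.0}'
meta = parse_spatial_text(positions, scalefactors)
```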


.. py:function:: load_visiumhd_probe(path, dataset_id = None, bin_sizes = None, bins_as_squares = True, fullres_image_file = None, load_all_images = False, var_names_make_unique = True, filtered_counts_file = True, counts_layer_name = 'counts', path_to_feature_2um_h5 = None)

   Load Visium HD outputs as SpatialData with probe-level binned tables.

   This wrapper uses ``binned_outputs/square_002um/raw_probe_bc_matrix.h5``
   (or a custom ``path_to_feature_2um_h5``) as the source feature count matrix.
   It aggregates probe/peak/isoform counts to coarser bins or cells (``square_008um``,
   ``square_016um`` and, when available, ``cell_id``) using the spatial mapping in
   ``barcode_mappings.parquet`` (requires Space Ranger v4.0 or later).

   :param path: Path to the Visium HD Space Ranger ``outs`` directory.
   :param dataset_id: Optional dataset ID passed to the SpatialData reader.
   :param bin_sizes: Bin resolutions to include. Each entry can be ``int`` (for example ``8``)
                     or Visium HD bin string (for example ``"square_008um"``). If ``None``,
                     all available ``square_*um`` bins under ``binned_outputs`` are used.
   :param bins_as_squares: Whether bins are represented as squares when loading shapes.
   :param fullres_image_file: Path to the full-resolution image.
   :param load_all_images: Whether to load all optional images via the ``spatialdata_io`` reader.
   :param var_names_make_unique: Whether to call ``var_names_make_unique()`` on probe table variables.
   :param filtered_counts_file: Whether to keep only in-tissue 2um barcodes prior to aggregation.
                                If ``True``, barcodes are taken from the source bin table loaded by
                                ``visium_hd`` (``square_002um``). If unavailable, the function falls
                                back to ``binned_outputs/square_002um/filtered_feature_bc_matrix.h5``.
   :param counts_layer_name: Layer name used to store aggregated probe counts in each output table.
   :param path_to_feature_2um_h5: Optional path to the raw 2um probe/peak/isoform count matrix (H5 or H5AD).
                                  If not provided, the loader looks for ``binned_outputs/square_002um/raw_probe_bc_matrix.h5``.

   :returns: A SpatialData object with probe-level tables for requested bins and,
             if available, cell-level segmentation.
   :rtype: spatialdata.SpatialData

   :raises ImportError: If required optional dependencies are not installed.
   :raises ValueError: If required files or requested bins are missing.
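
The aggregation step can be pictured as a sparse matrix product: an indicator matrix that maps each 2um barcode to its coarser bin (or cell) is multiplied with the 2um count matrix. The sketch below illustrates that idea with NumPy and SciPy on toy data. ``bin_key`` and ``aggregate_counts`` are hypothetical helpers, the label list stands in for one column of ``barcode_mappings.parquet``, and the actual implementation in ``splisosm`` may differ.

```python
import numpy as np
from scipy import sparse


def bin_key(size):
    """Hypothetical helper: turn 8 or "square_008um" into a Visium HD bin name."""
    if isinstance(size, str):
        return size
    return f"square_{int(size):03d}um"


def aggregate_counts(counts_2um, target_labels):
    """Sum rows of a (n_2um x n_features) sparse matrix per target label.

    Entries of ``target_labels`` that are None (unmapped barcodes) are dropped.
    """
    labels = np.asarray(target_labels, dtype=object)
    keep = np.array([lab is not None for lab in labels])
    targets, target_idx = np.unique(labels[keep].astype(str), return_inverse=True)
    source_idx = np.flatnonzero(keep)
    # Indicator matrix: one row per target bin, one column per 2um barcode.
    indicator = sparse.csr_matrix(
        (np.ones(source_idx.size, dtype=counts_2um.dtype), (target_idx, source_idx)),
        shape=(targets.size, counts_2um.shape[0]),
    )
    return targets, indicator @ counts_2um


# Four 2um barcodes, two probes; the last barcode has no 8um mapping.
counts = sparse.csr_matrix(np.array([[1, 0], [2, 1], [0, 3], [4, 0]]))
labels = ["s_008um_00000_00000-1", "s_008um_00000_00000-1",
          "s_008um_00000_00001-1", None]
targets, agg = aggregate_counts(counts, labels)
```

Keeping the mapping as a sparse indicator means the same routine works for any target column, whether a coarser bin or a cell identifier.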


.. py:function:: load_xenium_codeword(path, spatial_resolutions = (8.0, 16.0), quality_threshold = 20.0, n_jobs = -1, chunk_batch_size = 64, counts_layer_name = 'counts', build_cell_codeword_table = True, create_square_shapes = True, cells_boundaries = True, nucleus_boundaries = True, cells_as_circles = False, cells_labels = True, nucleus_labels = True, transcripts = True, morphology_mip = True, morphology_focus = True, aligned_images = True, cells_table = True, gex_only = True, show_progress = True)

   Load Xenium outputs and append multi-resolution codeword bin tables.

   This wrapper reads Xenium Ranger ``outs`` with ``spatialdata-io`` and then
   quantifies codewords into square spatial bins at one or more user-defined
   resolutions using transcript-level chunk data (``grids/0/*``). Counting is
   implemented with vectorized sparse aggregation over ``(spot, codeword)``
   pairs to reduce Python overhead while avoiding dependence on optional
   precomputed density matrices. For each resolution, a table named
   ``square_XXXum`` is added to ``sdata.tables``; optional square geometries
   with a ``_bins`` suffix are added to ``sdata.shapes`` so the tables can be
   used directly with :func:`spatialdata.rasterize_bins`.

   ``transcripts.zarr.zip`` is expected to contain the ``density/codeword`` group
   for codeword indexing (Xenium Ranger v3.1+ required).
   If ``build_cell_codeword_table=True`` and the ``transcripts.parquet`` file is available,
   a cell-by-codeword AnnData table named ``table_codeword`` is also built and added to ``sdata.tables``.

   :param path: Path to Xenium Ranger output directory, or its parent containing
                ``outs/``.
   :param spatial_resolutions: Spatial bin sizes in microns.  Pass ``None`` or an empty sequence to
                               skip bin table creation entirely (cell-segmentation-only mode).
   :param quality_threshold: Minimum transcript quality score to retain.
   :param n_jobs: Parallel worker count for chunk processing. Use ``-1`` for all cores.
   :param chunk_batch_size: Number of transcript chunks submitted per processing batch.
   :param counts_layer_name: Layer name used to store codeword counts in each output table.
   :param build_cell_codeword_table: Whether to build a cell-by-codeword table from the transcripts parquet file.
   :param create_square_shapes: Whether to create square bin shapes for each table key.
   :param cells_boundaries: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param nucleus_boundaries: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param cells_as_circles: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param cells_labels: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param nucleus_labels: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param transcripts: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param morphology_mip: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param morphology_focus: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param aligned_images: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param cells_table: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param gex_only: Passed to ``spatialdata_io.readers.xenium.xenium``.
   :param show_progress: Whether to display progress bars while binning codewords.

   :returns: SpatialData object augmented with bin-by-codeword count tables at each
             requested resolution and, when requested, a cell-by-codeword table
             named ``table_codeword``.
   :rtype: spatialdata.SpatialData

   :raises ImportError: If required optional dependencies are not installed.
   :raises ValueError: If the path, directory layout, or arguments are invalid.
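
The idea of counting ``(spot, codeword)`` pairs with a vectorized sparse aggregation can be illustrated on a handful of toy transcripts. This is a sketch under stated assumptions, not the ``load_xenium_codeword`` implementation: ``bin_codewords`` is a hypothetical function, coordinates are assumed to be non-negative microns, and the quality filter is assumed to be inclusive (``qv >= quality_threshold``).

```python
import numpy as np
from scipy import sparse


def bin_codewords(x, y, codeword, qv, n_codewords,
                  bin_size=8.0, quality_threshold=20.0):
    """Hypothetical sketch: count codewords per square spatial bin."""
    x, y, codeword, qv = (np.asarray(a) for a in (x, y, codeword, qv))
    keep = qv >= quality_threshold  # drop low-quality transcripts
    col = np.floor(x[keep] / bin_size).astype(np.int64)
    row = np.floor(y[keep] / bin_size).astype(np.int64)
    n_cols = int(col.max()) + 1 if col.size else 0
    n_rows = int(row.max()) + 1 if row.size else 0
    spot = row * n_cols + col  # flatten (row, col) into a single bin index
    data = np.ones(spot.size, dtype=np.int32)
    # Duplicate (spot, codeword) pairs are summed on conversion to CSR.
    counts = sparse.coo_matrix(
        (data, (spot, codeword[keep])), shape=(n_rows * n_cols, n_codewords)
    ).tocsr()
    return counts, (n_rows, n_cols)


# Four transcripts; the last one falls below the quality threshold.
counts, grid = bin_codewords(
    x=[1.0, 2.0, 9.0, 1.0],
    y=[1.0, 3.0, 1.0, 1.0],
    codeword=[0, 0, 1, 1],
    qv=[30.0, 25.0, 40.0, 10.0],
    n_codewords=2,
)
```

Building the matrix from coordinate triplets avoids a Python loop over transcripts, which is the main cost when chunks contain millions of rows.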


