SpatialData Loaders

These functions load common spatial transcriptomics outputs into the data structures expected by SPLISOSM.

splisosm.io.load_visium_sp_meta

Helper function to load Visium spatial metadata.

splisosm.io.load_visium_probe

Load standard Visium Space Ranger probe-based outputs.

splisosm.io.load_visiumhd_probe

Load Visium HD outputs as SpatialData with probe-level binned tables.

splisosm.io.load_xenium_codeword

Load Xenium outputs and append multi-resolution codeword bin tables.

splisosm.io.load_visium_sp_meta(adata, path_to_spatial, library_id=None)

Helper function to load Visium spatial metadata.

Parameters:
  • adata (AnnData) – Annotated data matrix to store the spatial metadata.

  • path_to_spatial (str | Path) – Path to the spatial folder generated by Space Ranger.

  • library_id (str | None) – Library ID of the spatial data.

Returns:

anndata – AnnData with spatial metadata.

Return type:

AnnData
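
To make the expected input concrete, the sketch below builds a tiny stand-in for a Space Ranger spatial/ folder and parses the two files that typically carry spot coordinates and scale factors. The file names (tissue_positions.csv, scalefactors_json.json) and column names follow Space Ranger 2.0+ conventions. The helper read_spatial_folder is hypothetical: it only illustrates the kind of metadata involved and is not a description of how load_visium_sp_meta is implemented.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_spatial_folder(path_to_spatial):
    """Hypothetical helper: parse spot positions and scale factors."""
    spatial = Path(path_to_spatial)
    with open(spatial / "scalefactors_json.json") as fh:
        scalefactors = json.load(fh)
    positions = {}
    with open(spatial / "tissue_positions.csv", newline="") as fh:
        for row in csv.DictReader(fh):
            positions[row["barcode"]] = {
                "in_tissue": int(row["in_tissue"]),
                "array_row": int(row["array_row"]),
                "array_col": int(row["array_col"]),
                # (x, y) order: column pixel first, then row pixel
                "pixel_xy": (
                    float(row["pxl_col_in_fullres"]),
                    float(row["pxl_row_in_fullres"]),
                ),
            }
    return positions, scalefactors


# Build a two-spot toy folder so the sketch runs without real data.
tmp = Path(tempfile.mkdtemp()) / "spatial"
tmp.mkdir()
(tmp / "scalefactors_json.json").write_text(
    json.dumps({"tissue_hires_scalef": 0.1, "spot_diameter_fullres": 90.0})
)
(tmp / "tissue_positions.csv").write_text(
    "barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres\n"
    "AAAC-1,1,0,0,100,200\n"
    "AAAG-1,0,0,2,100,400\n"
)
positions, scalefactors = read_spatial_folder(tmp)
```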

splisosm.io.load_visium_probe(path, *, counts_file='raw_probe_bc_matrix.h5', library_id=None, load_spatial=True, counts_layer_name='counts', filtered_counts_file=True, return_type='anndata')

Load standard Visium Space Ranger probe-based outputs.

Reads the probe-level count matrix (raw_probe_bc_matrix.h5 by default) from a Space Ranger outs directory and optionally attaches spatial metadata (coordinates, images, scale factors) from the spatial/ subfolder.

This is the standard-resolution Visium counterpart of load_visiumhd_probe() (which handles Visium HD multi-bin outputs).

Parameters:
  • path (str | Path) – Path to the Space Ranger outs directory, e.g. <run_id>/outs. Must contain the HDF5 count matrix and, when load_spatial=True, a spatial/ subfolder.

  • counts_file (str) –

    Name of the HDF5 count matrix file inside path. Typical choices:

    • "raw_probe_bc_matrix.h5" — all barcodes, probe-level features (default; preserves per-probe information).

    • "raw_feature_bc_matrix.h5" — all barcodes, gene-level features.

    • "filtered_feature_bc_matrix.h5" — tissue barcodes only, gene-level features.

  • library_id (str | None) – Library identifier stored in adata.uns["spatial"] (AnnData mode) or used to name SpatialData elements (SpatialData mode). Defaults to the parent directory name of path.

  • load_spatial (bool) – Whether to load spatial metadata (tissue positions, images, scale factors). Only used when return_type="anndata".

  • counts_layer_name (str) – Layer name for the raw count matrix. The counts are stored in adata.layers[counts_layer_name].

  • filtered_counts_file (bool) – If True (default), keep only in-tissue barcodes that appear in filtered_feature_bc_matrix.h5. If False, keep all barcodes from counts_file (including background spots).

  • return_type (str) –

    Output format.

    • "anndata" (default) — return an AnnData with spatial metadata in .obsm["spatial"] and .uns["spatial"].

    • "spatialdata" — return a SpatialData object built by spatialdata_io.visium(), with probe-level var metadata restored and a counts layer added. Suitable for use with SplisosmFFT.
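
The effect of filtered_counts_file=True can be pictured as a barcode subset: rows of the raw probe matrix are kept only if their barcode also appears in the filtered matrix. The following is a minimal sketch on plain lists. The helper keep_in_tissue and the toy barcodes are hypothetical, and preserving the raw matrix's row order is an assumption.

```python
def keep_in_tissue(raw_barcodes, raw_rows, filtered_barcodes):
    """Hypothetical helper: subset raw rows to barcodes in the filtered matrix."""
    keep = set(filtered_barcodes)
    kept = [(bc, row) for bc, row in zip(raw_barcodes, raw_rows) if bc in keep]
    return [bc for bc, _ in kept], [row for _, row in kept]


raw_barcodes = ["AAAC-1", "AAAG-1", "AAAT-1"]
raw_rows = [[3, 0, 1], [0, 0, 0], [2, 5, 0]]  # probe counts per barcode
filtered_barcodes = ["AAAT-1", "AAAC-1"]      # in-tissue barcodes only

barcodes, rows = keep_in_tissue(raw_barcodes, raw_rows, filtered_barcodes)
```

With filtered_counts_file=False this step would be skipped and background spots would remain in the table.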

Returns:

When return_type="anndata":

  • .X / .layers[counts_layer_name] — sparse count matrix

  • .var — feature (probe or gene) metadata

  • .obs — barcode metadata with in_tissue, array_row, array_col (when load_spatial=True)

  • .obsm["spatial"] — (n_spots, 2) pixel coordinates

  • .uns["spatial"] — images and scale factors

When return_type="spatialdata":

  • sdata.tables["table"] — AnnData with probe-level counts in .layers[counts_layer_name] and full probe metadata in .var

  • sdata.shapes[dataset_id] — spot geometries

  • sdata.images — tissue images at multiple resolutions

Return type:

AnnData or SpatialData

Raises:

FileNotFoundError – If the counts file or spatial directory is missing.
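
Because a missing file surfaces as FileNotFoundError, it can be convenient to verify the directory layout before loading. The sketch below mirrors the documented requirements (a counts file inside path and, when load_spatial=True, a spatial/ subfolder). The function check_outs_layout is a hypothetical pre-flight helper and not part of splisosm.io.

```python
import tempfile
from pathlib import Path


def check_outs_layout(path, counts_file="raw_probe_bc_matrix.h5", load_spatial=True):
    """Hypothetical pre-flight check mirroring the documented requirements."""
    outs = Path(path)
    counts_path = outs / counts_file
    if not counts_path.is_file():
        raise FileNotFoundError(f"Counts file not found: {counts_path}")
    if load_spatial and not (outs / "spatial").is_dir():
        raise FileNotFoundError(f"Spatial directory not found: {outs / 'spatial'}")
    return counts_path


# Toy directory with a counts file but no spatial/ subfolder.
outs = Path(tempfile.mkdtemp())
(outs / "raw_probe_bc_matrix.h5").touch()

try:
    check_outs_layout(outs)
    status = "ok"
except FileNotFoundError as err:
    status = f"missing: {err}"
```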

Examples

Load as AnnData (for SplisosmNP):

>>> from splisosm.io import load_visium_probe
>>> adata = load_visium_probe("sample/outs")

Load as SpatialData (for SplisosmFFT):

>>> sdata = load_visium_probe("sample/outs", return_type="spatialdata")

splisosm.io.load_visiumhd_probe(path, dataset_id=None, bin_sizes=None, bins_as_squares=True, fullres_image_file=None, load_all_images=False, var_names_make_unique=True, filtered_counts_file=True, counts_layer_name='counts', path_to_feature_2um_h5=None)

Load Visium HD outputs as SpatialData with probe-level binned tables.

This wrapper uses binned_outputs/square_002um/raw_probe_bc_matrix.h5 (or a custom path_to_feature_2um_h5) as the source feature count matrix. It aggregates probe/peak/isoform counts from the 2 µm bins to coarser bins or cells (square_008um, square_016um and, when available, cell_id) according to the spatial mapping in barcode_mappings.parquet (Space Ranger v4.0+ required).
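
The aggregation step can be sketched as a grouped sum: each 2 µm barcode is looked up in a mapping to its parent bin (or cell), and per-probe counts are added up per parent. The example below uses plain dictionaries in place of sparse matrices and the parquet mapping. The function aggregate_counts, the toy barcodes, and the choice to skip unmapped barcodes are all assumptions made for illustration, not the library's implementation.

```python
from collections import defaultdict


def aggregate_counts(counts_2um, mapping):
    """Sum per-probe counts of 2 um bins into their parent bin.

    counts_2um: {barcode_2um: {probe: count}}
    mapping:    {barcode_2um: parent_bin_id}
    """
    out = defaultdict(lambda: defaultdict(int))
    for barcode, probe_counts in counts_2um.items():
        parent = mapping.get(barcode)
        if parent is None:  # assumption: unmapped 2 um bins are skipped
            continue
        for probe, n in probe_counts.items():
            out[parent][probe] += n
    return {parent: dict(probes) for parent, probes in out.items()}


counts_2um = {
    "s_002um_00000_00000-1": {"probeA": 2, "probeB": 1},
    "s_002um_00000_00001-1": {"probeA": 1},
    "s_002um_00004_00000-1": {"probeB": 4},
    "s_002um_00009_00009-1": {"probeA": 7},  # has no entry in the mapping
}
to_8um = {
    "s_002um_00000_00000-1": "s_008um_00000_00000-1",
    "s_002um_00000_00001-1": "s_008um_00000_00000-1",
    "s_002um_00004_00000-1": "s_008um_00001_00000-1",
}
counts_8um = aggregate_counts(counts_2um, to_8um)
```

Swapping the mapping for one keyed to cell identifiers would give cell-level tables by the same logic.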

Parameters:
  • path (str | Path) – Path to Space Ranger outs directory for Visium HD.

  • dataset_id (str | None) – Optional dataset ID passed to the SpatialData reader.

  • bin_sizes (list[int | str] | None) – Bin resolutions to include. Each entry can be int (for example 8) or Visium HD bin string (for example "square_008um"). If None, all available square_*um bins under binned_outputs are used.

  • bins_as_squares (bool) – Whether bins are represented as squares when loading shapes.

  • fullres_image_file (str | Path | None) – Path to the full-resolution image.

  • load_all_images (bool) – Whether to load all optional images via spatialdata_io reader.

  • var_names_make_unique (bool) – Whether to call var_names_make_unique() on probe table variables.

  • filtered_counts_file (bool) – Whether to keep only in-tissue 2um barcodes prior to aggregation. If True, barcodes are taken from the source bin table loaded by visium_hd (square_002um). If that table is unavailable, the function falls back to binned_outputs/square_002um/filtered_feature_bc_matrix.h5.

  • counts_layer_name (str) – Layer name used to store aggregated probe counts in each output table.

  • path_to_feature_2um_h5 (str | Path | None) – Optional path to the raw 2um probe/peak/isoform count matrix (H5 or H5AD). If not provided, the function looks for binned_outputs/square_002um/raw_probe_bc_matrix.h5.
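
Since bin_sizes accepts both integers and Visium HD bin strings, the two forms have to be mapped onto a common name. A minimal sketch of such a normalization follows. The helper normalize_bin_sizes is hypothetical, and the three-digit zero padding is inferred from the square_008um and square_016um names used in this section.

```python
import re


def normalize_bin_sizes(bin_sizes):
    """Hypothetical helper: map ints and bin strings to 'square_XXXum' names."""
    names = []
    for entry in bin_sizes:
        if isinstance(entry, int):
            name = f"square_{entry:03d}um"
        elif isinstance(entry, str) and re.fullmatch(r"square_\d{3}um", entry):
            name = entry
        else:
            raise ValueError(f"Unrecognized bin size: {entry!r}")
        if name not in names:  # drop duplicates, keep first-seen order
            names.append(name)
    return names


bins = normalize_bin_sizes([8, "square_016um", 8])
```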

Returns:

A SpatialData object with probe-level tables for requested bins and, if available, cell-level segmentation.

Return type:

SpatialData

Raises:
  • ImportError – If required optional dependencies are not installed.

  • ValueError – If required files or requested bins are missing.

splisosm.io.load_xenium_codeword(path, spatial_resolutions=(8.0, 16.0), quality_threshold=20.0, n_jobs=-1, chunk_batch_size=64, counts_layer_name='counts', build_cell_codeword_table=True, create_square_shapes=True, cells_boundaries=True, nucleus_boundaries=True, cells_as_circles=False, cells_labels=True, nucleus_labels=True, transcripts=True, morphology_mip=True, morphology_focus=True, aligned_images=True, cells_table=True, gex_only=True, show_progress=True)

Load Xenium outputs and append multi-resolution codeword bin tables.

This wrapper reads Xenium Ranger outs with spatialdata-io and then quantifies codewords into square spatial bins at one or more user-defined resolutions using transcript-level chunk data (grids/0/*). Counting is implemented with vectorized sparse aggregation over (spot, codeword) pairs to reduce Python overhead while avoiding dependence on optional precomputed density matrices. For each resolution, a table named square_XXXum is added to sdata.tables; optional square geometries with a _bins suffix are added to sdata.shapes so the tables can be used directly with spatialdata.rasterize_bins().

transcripts.zarr.zip is expected to contain the density/codeword group for codeword indexing (Xenium Ranger v3.1+ required). If build_cell_codeword_table=True and the transcripts.parquet file is available, a cell-by-codeword AnnData table named table_codeword is also built and added to sdata.tables.
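
The counting logic described above amounts to assigning each retained transcript to a square bin and tallying (bin, codeword) pairs. The sketch below shows this in plain Python with a Counter. It is a non-vectorized stand-in for the sparse aggregation the loader is described as using. The function bin_codewords, the tuple layout of the toy transcripts, and the zero-padded table key format are assumptions for illustration.

```python
from collections import Counter
from math import floor


def bin_codewords(transcripts, resolution_um, quality_threshold=20.0):
    """Count (bin, codeword) pairs for transcripts that pass the quality cut.

    transcripts: iterable of (x_um, y_um, codeword_index, quality) tuples.
    Returns a Counter keyed by ((bin_col, bin_row), codeword_index).
    """
    counts = Counter()
    for x, y, codeword, qv in transcripts:
        if qv < quality_threshold:  # keep only scores at or above the minimum
            continue
        bin_id = (floor(x / resolution_um), floor(y / resolution_um))
        counts[(bin_id, codeword)] += 1
    return counts


transcripts = [
    (1.0, 2.0, 5, 40.0),
    (7.9, 0.1, 5, 35.0),
    (8.0, 0.1, 5, 30.0),
    (3.0, 3.0, 9, 10.0),    # below the quality threshold, dropped
    (20.0, 17.5, 9, 25.0),
]
# One table per resolution; the key format is assumed, not confirmed.
tables = {
    f"square_{int(res):03d}um": bin_codewords(transcripts, res)
    for res in (8.0, 16.0)
}
```

Each (bin, codeword) key corresponds to one nonzero entry of a bin-by-codeword sparse matrix, which is why the same tally can be computed in bulk over arrays of pair indices.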

Parameters:
  • path (str | Path) – Path to Xenium Ranger output directory, or its parent containing outs/.

  • spatial_resolutions (Sequence[float] | None) – Spatial bin sizes in microns. Pass None or an empty sequence to skip bin table creation entirely (cell-segmentation-only mode).

  • quality_threshold (float) – Minimum transcript quality score to retain.

  • n_jobs (int) – Parallel worker count for chunk processing. Use -1 for all cores.

  • chunk_batch_size (int) – Number of transcript chunks submitted per processing batch.

  • counts_layer_name (str) – Layer name used to store codeword counts in each output table.

  • build_cell_codeword_table (bool) – Whether to build a cell-by-codeword table from the transcripts parquet file.

  • create_square_shapes (bool) – Whether to create square bin shapes for each table key.

  • cells_boundaries (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • nucleus_boundaries (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • cells_as_circles (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • cells_labels (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • nucleus_labels (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • transcripts (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • morphology_mip (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • morphology_focus (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • aligned_images (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • cells_table (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • gex_only (bool) – Passed to spatialdata_io.readers.xenium.xenium.

  • show_progress (bool) – Whether to display progress bars while binning codewords.
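
Two of the arguments above accept more than one form: path may point at the output directory or at a parent containing outs/, and spatial_resolutions may be None or empty to skip binning. The sketch below shows one way such inputs could be normalized. Both helpers (resolve_xenium_outs and plan_resolutions) are hypothetical and are not taken from the library's source.

```python
import tempfile
from pathlib import Path


def resolve_xenium_outs(path):
    """Hypothetical helper: accept the output directory or its parent."""
    root = Path(path)
    if not root.is_dir():
        raise ValueError(f"Not a directory: {root}")
    nested = root / "outs"
    return nested if nested.is_dir() else root


def plan_resolutions(spatial_resolutions):
    """Hypothetical helper: None or an empty sequence means no bin tables."""
    if not spatial_resolutions:
        return []
    return [float(res) for res in spatial_resolutions]


# Toy layout: <run>/outs/
run_dir = Path(tempfile.mkdtemp())
(run_dir / "outs").mkdir()

from_parent = resolve_xenium_outs(run_dir)
from_outs = resolve_xenium_outs(run_dir / "outs")
```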

Returns:

SpatialData object augmented with bin-by-codeword count tables at each requested resolution and, when requested, a cell-by-codeword table named table_codeword.

Return type:

SpatialData

Raises:
  • ImportError – If required optional dependencies are not installed.

  • ValueError – If path/layout/arguments are invalid.