splisosm.dataset
================

.. py:module:: splisosm.dataset

.. autoapi-nested-parse::

   Dataset helpers for batched GLM and GLMM training.



Classes
-------

.. autoapisummary::

   splisosm.dataset.IsoDataset


Module Contents
---------------

.. py:class:: IsoDataset(data, gene_names = None, group_gene_by_n_iso = False)

   Dataset for batched training of GLM and GLMM models.

   `IsoDataset.get_dataloader` returns a DataLoader that yields batches of genes for training.

   If `group_gene_by_n_iso` is True, genes with the same number of isoforms are grouped together
   and stored as a 3D tensor of shape (n_genes, n_spots, n_isos).
   Otherwise, genes are stored as a list of per-gene tensors of shape (n_spots, n_isos).
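
   The grouped layout described above can be illustrated with plain PyTorch. This is a minimal sketch of
   the idea only, not the implementation used by ``IsoDataset``; the names ``buckets`` and ``grouped`` are
   hypothetical and chosen for illustration.

   .. code-block:: python

      from collections import defaultdict

      import torch

      # Hypothetical input: 5 genes with 3 isoforms and 5 genes with 4 isoforms,
      # each measured over 100 spots.
      data = [torch.randn(100, 3) for _ in range(5)] + [torch.randn(100, 4) for _ in range(5)]

      # Bucket the per-gene tensors by isoform count (size of the last dimension).
      buckets = defaultdict(list)
      for counts in data:
          buckets[counts.shape[1]].append(counts)

      # Stack each bucket into one tensor of shape (n_genes, n_spots, n_isos).
      grouped = {n_isos: torch.stack(genes) for n_isos, genes in buckets.items()}

      for n_isos, tensor in sorted(grouped.items()):
          print(n_isos, tuple(tensor.shape))
      # 3 (5, 100, 3)
      # 4 (5, 100, 4)

   Stacking is only possible within a bucket because every tensor in it shares the same trailing dimension;
   genes with different isoform counts cannot be combined into a single dense tensor without padding.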

   .. rubric:: Example

   >>> from splisosm.dataset import IsoDataset
   >>> import torch
   >>> # Simulate data for 10 genes with different numbers of isoforms
   >>> data_3_iso = [torch.randn(100, 3) for _ in range(5)]  # 5 genes with 3 isoforms
   >>> data_4_iso = [torch.randn(100, 4) for _ in range(5)]  # 5 genes with 4 isoforms
   >>> data = data_3_iso + data_4_iso
   >>> gene_names = [f"gene_{i}" for i in range(10)]
   >>> dataset = IsoDataset(data, gene_names, group_gene_by_n_iso=True)
   >>> # Get dataloader for batched training
   >>> dataloader = dataset.get_dataloader(batch_size=2)
   >>> batch = next(iter(dataloader))

   :param data: List of tensors with shape (n_spots, n_isos).
   :param gene_names: List of gene names. If None, auto-generated.
   :param group_gene_by_n_iso: Whether to group genes by the number of isoforms.


   .. py:method:: get_dataloader(batch_size = 1)

      Return a DataLoader that yields batches of genes.

      :param batch_size: Maximum number of genes in a batch.

      :returns: DataLoader iterator.
      :rtype: Iterator[Any]
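
      To see why ``batch_size`` is a maximum rather than an exact size, the sketch below batches a stack of
      same-shaped genes with a standard ``torch.utils.data.DataLoader``. It assumes nothing about what
      ``get_dataloader`` yields internally; ``grouped`` and ``batch_sizes`` are hypothetical names.

      .. code-block:: python

         import torch
         from torch.utils.data import DataLoader, TensorDataset

         # Hypothetical grouped data: 5 genes, 100 spots, 3 isoforms each.
         grouped = torch.randn(5, 100, 3)

         # Iterate over genes (the first dimension) in batches of at most 2.
         loader = DataLoader(TensorDataset(grouped), batch_size=2, shuffle=False)

         # TensorDataset yields one-element tuples, so batch[0] is the stacked tensor.
         batch_sizes = [batch[0].shape[0] for batch in loader]
         print(batch_sizes)  # [2, 2, 1]

      With 5 genes and ``batch_size=2``, the final batch holds the single remaining gene.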



   .. py:attribute:: data
      :type:  list[torch.Tensor]

      Input list of per-gene isoform count tensors.


   .. py:attribute:: dataset
      :type:  list[torch.utils.data.Dataset]

      If `group_by_n_iso` is True, a list of ``GroupedIsoDataset`` where isoform counts are stored as 3D tensors.
      Otherwise, a list of ``UngroupedIsoDataset`` where isoform counts are stored as a list of 2D tensors.


   .. py:attribute:: datasets
      :value: None



   .. py:attribute:: gene_name
      :type:  list[str]

      List of gene names.


   .. py:attribute:: gene_names


   .. py:attribute:: group_by_n_iso
      :type:  bool

      Whether to group genes by the number of isoforms.


   .. py:attribute:: group_gene_by_n_iso
      :value: False



   .. py:attribute:: n_genes
      :type:  int

      Number of genes.


   .. py:attribute:: n_isos_per_gene
      :type:  list[int]

      List of numbers of isoforms per gene.


   .. py:attribute:: n_spots
      :type:  int

      Number of spots.


