splisosm.dataset#
Dataset helpers for batched GLM and GLMM training.
Classes#
IsoDataset | Dataset for batched training of GLM and GLMM models.
Module Contents#
- class splisosm.dataset.IsoDataset(data, gene_names=None, group_gene_by_n_iso=False)#
Dataset for batched training of GLM and GLMM models.
IsoDataset.get_dataloader returns a DataLoader that yields batches of genes for training.
If group_gene_by_n_iso is True, genes with the same number of isoforms are grouped together and stored as a 3D tensor of shape (n_genes, n_spots, n_isos). Otherwise, genes are stored as a list of per-gene tensors of shape (n_spots, n_isos).
Example
>>> from splisosm.dataset import IsoDataset
>>> import torch
>>> # Simulate data for 10 genes with different number of isoforms
>>> data_3_iso = [torch.randn(100, 3) for _ in range(5)]  # 5 genes with 3 isoforms
>>> data_4_iso = [torch.randn(100, 4) for _ in range(5)]  # 5 genes with 4 isoforms
>>> data = data_3_iso + data_4_iso
>>> gene_names = [f"gene_{i}" for i in range(10)]
>>> dataset = IsoDataset(data, gene_names, group_gene_by_n_iso=True)
>>> # Get dataloader for batched training
>>> dataloader = dataset.get_dataloader(batch_size=2)
>>> batch = next(iter(dataloader))
- Parameters:
data (list[Tensor]) – List of tensors with shape (n_spots, n_isos).
gene_names (Optional[list[str]]) – List of gene names. If None, auto-generated.
group_gene_by_n_iso (bool) – Whether to group genes by the number of isoforms.
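The grouping behind group_gene_by_n_iso can be pictured with a small standalone sketch. It works on tensor shapes only and does not call splisosm. The helper name `group_by_n_iso` is hypothetical and is not part of the library, so read it as an illustration of the idea rather than the actual implementation.

```python
from collections import defaultdict


def group_by_n_iso(shapes):
    """Map each isoform count to the indices of genes with that count.

    `shapes` is a list of (n_spots, n_isos) tuples, one per gene.
    Hypothetical helper for illustration; not a splisosm function.
    """
    groups = defaultdict(list)
    for gene_idx, (_n_spots, n_isos) in enumerate(shapes):
        groups[n_isos].append(gene_idx)
    return dict(groups)


# Same layout as the class example: 5 genes with 3 isoforms, 5 with 4.
n_spots = 100
shapes = [(n_spots, 3)] * 5 + [(n_spots, 4)] * 5

groups = group_by_n_iso(shapes)

# Stacking the genes of one group would give a 3D tensor of shape
# (n_genes, n_spots, n_isos).
stacked_shapes = {
    n_isos: (len(gene_indices), n_spots, n_isos)
    for n_isos, gene_indices in groups.items()
}
```

Under these assumptions, the ten genes fall into two groups, and each group could be held as a single 3D tensor instead of five separate 2D tensors.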
- get_dataloader(batch_size=1)#
Get dataloader for the dataset.
- Parameters:
batch_size (int) – Maximum number of genes in a batch.
- Returns:
DataLoader iterator.
- Return type:
Iterator[Any]
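Because batch_size is described as a maximum, the last batch can be smaller than the others. The sketch below shows one way such chunking could work on a plain list of gene names. The function `chunk_genes` is a hypothetical helper, not part of splisosm, and how the real DataLoader forms batches (for example, whether a batch can span groups with different isoform counts) is not stated in this page.

```python
def chunk_genes(gene_names, batch_size=1):
    """Yield consecutive chunks of at most `batch_size` gene names.

    Hypothetical helper for illustration; not a splisosm function.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(gene_names), batch_size):
        yield gene_names[start:start + batch_size]


# Five genes with batch_size=2: the final chunk holds the leftover gene.
names = [f"gene_{i}" for i in range(5)]
batches = list(chunk_genes(names, batch_size=2))
batch_sizes = [len(batch) for batch in batches]
```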
- dataset: list[Dataset]#
If group_by_n_iso is True, a list of GroupedIsoDataset where isoform counts are stored as 3D tensors. Otherwise, a list of UngroupedIsoDataset where isoform counts are stored as a list of 2D tensors.
- datasets = None#
- gene_name: list[str]#
List of gene names.
- gene_names#
- group_by_n_iso: bool#
Whether to group genes by the number of isoforms.
- group_gene_by_n_iso = False#
- n_genes: int#
Number of genes.
- n_isos_per_gene: list[int]#
List of numbers of isoforms per gene.
- n_spots: int#
Number of spots.