Base dataset¶
- class alodataset.base_dataset.BaseDataset(name, transform_fn=None, ignore_errors=False, print_errors=True, max_retry_on_error=3, retry_offset=20, sample=False, **kwargs)¶
Bases: Generic[torch.utils.data.dataset.T_co]
- download_sample()¶
Download a dataset sample, replacing the original dataset.
- Returns
- str
New directory: self.vb_folder + "samples"
- Raises
- Exception
The dataset must be one of DATASETS_DOWNLOAD_PATHS
- Return type
str
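A hedged sketch of fetching the sample: the subclass and dataset name below are hypothetical; `sample=True` and `download_sample()` come from the signatures on this page, and the call raises unless the name is one of DATASETS_DOWNLOAD_PATHS:
```python
from alodataset.base_dataset import BaseDataset

class MyDataset(BaseDataset):  # hypothetical subclass, for illustration only
    def getitem(self, idx):
        raise NotImplementedError  # real subclasses return Dict[str, Frame]

dataset = MyDataset(name="my_dataset", sample=True)
sample_dir = dataset.download_sample()  # e.g. <vb_folder>/samples
```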
- get(idx)¶
Get a specific element of the dataset. Usually one would call dataset[idx] directly instead of dataset.get(idx), but for now __getitem__ is only fully supported through the stream_loader and the train_loader.
- Parameters
- idx: int
Index of the element to get
- Return type
Dict[str, aloscene.frame.Frame]
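For illustration, assuming `dataset` is an instantiated BaseDataset subclass:
```python
frames = dataset.get(0)  # Dict[str, aloscene.frame.Frame]
for camera, frame in frames.items():
    print(camera, type(frame))
```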
- get_dataset_dir()¶
Look up the dataset_dir based on the given name. To work properly, an alodataset_config.json file must be saved at /home/USER/.aloception/alodataset_config.json.
- Return type
str
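A sketch of reading that config file; the path comes from this page, but the key layout is an assumption:
```python
import json
import os

config_path = os.path.expanduser("~/.aloception/alodataset_config.json")
with open(config_path) as f:
    config = json.load(f)
# Assumed layout: one entry per dataset name mapping to its directory,
# e.g. {"my_dataset": "/data/my_dataset"}.
```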
- getitem()¶
- getitem_ignore_errors(idx)¶
Try to get the item at index idx. If the data is invalid, retry at a shifted index, repeating until the maximum number of authorized retries is reached.
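A minimal sketch of this retry loop, using the constructor defaults (max_retry_on_error=3, retry_offset=20). It illustrates the described behavior, not the library's actual implementation, and assumes getitem takes the index:
```python
def getitem_ignore_errors_sketch(dataset, idx, max_retry_on_error=3, retry_offset=20):
    for _ in range(max_retry_on_error):
        try:
            return dataset.getitem(idx)  # assumed to accept the index
        except Exception:
            # Invalid data: shift the index and retry on another element.
            idx = (idx + retry_offset) % len(dataset)
    raise RuntimeError(f"No valid element found after {max_retry_on_error} retries")
```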
- prepare()¶
Prepare the dataset. Not all child classes need to implement this method. However, for some classes, preparing the dataset ahead of time can speed up later access or reduce the storage footprint of the whole dataset.
- set_dataset_dir(dataset_dir)¶
Set the dataset_dir in the config file. This method writes the path to /home/USER/.aloception/alodataset_config.json, replacing the current one (if any).
- Parameters
- dataset_dir: str
Path to the new directory
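For example (the path is illustrative):
```python
dataset.set_dataset_dir("/data/my_dataset")  # persisted to the config file
print(dataset.get_dataset_dir())             # presumably resolves to the new path
```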
- stream_loader(num_workers=2)¶
Get a stream loader from the dataset. Compared to the train_loader(), the stream_loader() does not add a batch dimension and does not shuffle the dataset.
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- num_workers : int
Number of workers, by default 2
- Returns
- torch.utils.data.DataLoader
A generator
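A sketch of iterating the stream, assuming `dataset` is an instantiated BaseDataset subclass:
```python
for frames in dataset.stream_loader(num_workers=2):
    # Each item is presumably the Dict[str, Frame] returned by get(),
    # with no batch dimension and in dataset order.
    pass
```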
- train_loader(batch_size=1, num_workers=2, sampler=torch.utils.data.RandomSampler)¶
Get training loader from the dataset
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- batch_size : int, optional
Batch size, by default 1
- num_workers : int, optional
Number of workers, by default 2
- sampler : torch.utils.data.Sampler, optional
Sampler class used to sample the dataset, by default torch.utils.data.RandomSampler
- Returns
- torch.utils.data.DataLoader
A generator
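A sketch of a training loop over batched samples, assuming `dataset` is an instantiated BaseDataset subclass:
```python
loader = dataset.train_loader(batch_size=4, num_workers=2)
for batch in loader:
    # Shuffled batches of 4 (RandomSampler is the default sampler).
    pass
```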
- property vb_folder¶
- class alodataset.base_dataset.Split(value)¶
Bases:
enum.Enum
An enumeration.
- TEST: str = 'test'¶
- TRAIN: str = 'train'¶
- Type: List[str] = ['train', 'val', 'test']¶
- VAL: str = 'val'¶
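Usage follows standard enum access:
```python
from alodataset.base_dataset import Split

split = Split.TRAIN
print(split.value)      # 'train'
print(Split.VAL.value)  # 'val'
```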
- alodataset.base_dataset.rename_data_to_none(data)¶
Temporarily remove data names until the next call to the names property. Necessary for PyTorch operations that don't support named tensors.
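The idea on a plain tensor (rename_data_to_none itself presumably operates on aloscene data structures, which is an assumption here):
```python
import torch

t = torch.zeros(3, 4, names=("C", "W"))  # a named tensor
t = t.rename(None)   # temporarily drop the names...
t = t.reshape(4, 3)  # ...so ops that reject named tensors work
```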
- alodataset.base_dataset.stream_loader(dataset, num_workers=2)¶
Get a stream loader from the dataset. Compared to the train_loader(), the stream_loader() does not add a batch dimension and does not shuffle the dataset.
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- num_workers : int
Number of workers, by default 2
- Returns
- torch.utils.data.DataLoader
A generator
- alodataset.base_dataset.train_loader(dataset, batch_size=1, num_workers=2, sampler=torch.utils.data.RandomSampler)¶
Get training loader from the dataset
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- batch_size : int, optional
Batch size, by default 1
- num_workers : int, optional
Number of workers, by default 2
- sampler : torch.utils.data.Sampler, optional
Sampler class used to sample the dataset, by default torch.utils.data.RandomSampler
- Returns
- torch.utils.data.DataLoader
A generator
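The free-function form, shown with a non-default sampler swapped in for illustration; per the signature above, the sampler is passed as a class:
```python
from torch.utils.data import SequentialSampler
from alodataset.base_dataset import train_loader

# `dataset` is any instantiated BaseDataset subclass.
loader = train_loader(dataset, batch_size=2, sampler=SequentialSampler)
```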