Base dataset¶
- class alodataset.base_dataset.BaseDataset(name, transform_fn=None, ignore_errors=False, print_errors=True, max_retry_on_error=3, retry_offset=20, sample=False, **kwargs)¶
Bases: Generic[torch.utils.data.dataset.T_co]
- download_sample()¶
Download a dataset sample, replacing the original dataset.
- Returns
- str
New directory: self.vb_folder + "samples"
- Raises
- Exception
The dataset must be one of DATASETS_DOWNLOAD_PATHS
- Return type
str
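A hedged sketch of fetching the sample: the subclass and dataset name below are hypothetical; `sample=True` and `download_sample()` come from the signatures on this page, and the call raises unless the name is one of DATASETS_DOWNLOAD_PATHS:
```python
from alodataset.base_dataset import BaseDataset

class MyDataset(BaseDataset):  # hypothetical subclass, for illustration only
    def getitem(self, idx):
        raise NotImplementedError  # real subclasses return Dict[str, Frame]

dataset = MyDataset(name="my_dataset", sample=True)
sample_dir = dataset.download_sample()  # e.g. <vb_folder>/samples
```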
- get(idx)¶
Get a specific element of the dataset. Usually one would call dataset[idx] directly instead of dataset.get(idx), but for now __getitem__ is only fully supported through the stream_loader and the train_loader.
- Parameters
- idx: int
Index of the element to get
- Return type
Dict[str, aloscene.frame.Frame]
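For illustration, assuming `dataset` is an instantiated BaseDataset subclass:
```python
frames = dataset.get(0)  # Dict[str, aloscene.frame.Frame]
for camera, frame in frames.items():
    print(camera, type(frame))
```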
- get_dataset_dir()¶
Look up the dataset_dir based on the given name. To work properly, an alodataset_config.json file must be saved at /home/USER/.aloception/alodataset_config.json.
- Return type
str
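A sketch of reading that config file; the path comes from this page, but the key layout is an assumption:
```python
import json
import os

config_path = os.path.expanduser("~/.aloception/alodataset_config.json")
with open(config_path) as f:
    config = json.load(f)
# Assumed layout: one entry per dataset name mapping to its directory,
# e.g. {"my_dataset": "/data/my_dataset"}.
```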
- getitem()¶
- getitem_ignore_errors(idx)¶
Try to get the item at index idx. If the data is invalid, retry at a shifted index, repeating until the maximum number of authorized retries is reached.
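A minimal sketch of this retry loop, using the constructor defaults (max_retry_on_error=3, retry_offset=20). It illustrates the described behavior, not the library's actual implementation, and assumes getitem takes the index:
```python
def getitem_ignore_errors_sketch(dataset, idx, max_retry_on_error=3, retry_offset=20):
    for _ in range(max_retry_on_error):
        try:
            return dataset.getitem(idx)  # assumed to accept the index
        except Exception:
            # Invalid data: shift the index and retry on another element.
            idx = (idx + retry_offset) % len(dataset)
    raise RuntimeError(f"No valid element found after {max_retry_on_error} retries")
```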
- prepare()¶
Prepare the dataset. Not all child classes need to implement this method. However, for some classes, preparing the dataset ahead of time can speed up later access or reduce the storage footprint of the whole dataset.
- set_dataset_dir(dataset_dir)¶
Set the dataset_dir in the config file. This method writes the path to /home/USER/.aloception/alodataset_config.json, replacing the current one (if any).
- Parameters
- dataset_dir: str
Path to the new directory
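For example (the path is illustrative):
```python
dataset.set_dataset_dir("/data/my_dataset")  # persisted to the config file
print(dataset.get_dataset_dir())             # presumably resolves to the new path
```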
- stream_loader(num_workers=2)¶
Get a stream loader from the dataset. Compared to the train_loader(), the stream_loader() does not add a batch dimension and does not shuffle the dataset.
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- num_workers : int
Number of workers, by default 2
- Returns
- torch.utils.data.DataLoader
A generator
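A sketch of iterating the stream, assuming `dataset` is an instantiated BaseDataset subclass:
```python
for frames in dataset.stream_loader(num_workers=2):
    # Each item is presumably the Dict[str, Frame] returned by get(),
    # with no batch dimension and in dataset order.
    pass
```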
- train_loader(batch_size=1, num_workers=2, sampler=torch.utils.data.RandomSampler)¶
Get training loader from the dataset
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- batch_size : int, optional
Batch size, by default 1
- num_workers : int, optional
Number of workers, by default 2
- sampler : torch.utils.data.Sampler, optional
Sampler class used to sample the dataset, by default torch.utils.data.RandomSampler
- Returns
- torch.utils.data.DataLoader
A generator
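A sketch of a training loop over batched samples, assuming `dataset` is an instantiated BaseDataset subclass:
```python
loader = dataset.train_loader(batch_size=4, num_workers=2)
for batch in loader:
    # Shuffled batches of 4 (RandomSampler is the default sampler).
    pass
```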
- property vb_folder¶
- class alodataset.base_dataset.Split(value)¶
Bases:
enum.Enum
An enumeration.
- TEST: str = 'test'¶
- TRAIN: str = 'train'¶
- Type: List[str] = ['train', 'val', 'test']¶
- VAL: str = 'val'¶
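Usage follows standard enum access:
```python
from alodataset.base_dataset import Split

split = Split.TRAIN
print(split.value)      # 'train'
print(Split.VAL.value)  # 'val'
```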
- alodataset.base_dataset.rename_data_to_none(data)¶
Temporarily remove data names until the next call to the names property. Necessary for PyTorch operations that don't support named tensors.
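The idea on a plain tensor (rename_data_to_none itself presumably operates on aloscene data structures, which is an assumption here):
```python
import torch

t = torch.zeros(3, 4, names=("C", "W"))  # a named tensor
t = t.rename(None)   # temporarily drop the names...
t = t.reshape(4, 3)  # ...so ops that reject named tensors work
```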
- alodataset.base_dataset.stream_loader(dataset, num_workers=2)¶
Get a stream loader from the dataset. Compared to the train_loader(), the stream_loader() does not add a batch dimension and does not shuffle the dataset.
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- num_workers : int
Number of workers, by default 2
- Returns
- torch.utils.data.DataLoader
A generator
- alodataset.base_dataset.train_loader(dataset, batch_size=1, num_workers=2, sampler=torch.utils.data.RandomSampler)¶
Get training loader from the dataset
- Parameters
- dataset : torch.utils.data.Dataset
Dataset to build the dataloader from
- batch_size : int, optional
Batch size, by default 1
- num_workers : int, optional
Number of workers, by default 2
- sampler : torch.utils.data.Sampler, optional
Sampler class used to sample the dataset, by default torch.utils.data.RandomSampler
- Returns
- torch.utils.data.DataLoader
A generator
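The free-function form, shown with a non-default sampler swapped in for illustration; per the signature above, the sampler is passed as a class:
```python
from torch.utils.data import SequentialSampler
from alodataset.base_dataset import train_loader

# `dataset` is any instantiated BaseDataset subclass.
loader = train_loader(dataset, batch_size=2, sampler=SequentialSampler)
```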