Base dataset

class alodataset.base_dataset.BaseDataset(name, transform_fn=None, ignore_errors=False, print_errors=True, max_retry_on_error=3, retry_offset=20, sample=False, **kwargs)

Bases: Generic[torch.utils.data.dataset.T_co]

download_sample()

Download a dataset sample, replacing the original dataset.

Returns
str

New directory: self.vb_folder + "samples"

Raises
Exception

The dataset must be one of DATASETS_DOWNLOAD_PATHS

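The directory naming used by download_sample() can be sketched in plain Python; `vb_folder` stands in for the dataset's base folder property, and the use of os.path.join (rather than plain string concatenation) is an assumption:

```python
import os

def sample_directory(vb_folder: str) -> str:
    # Sketch: download_sample() places the sample under a
    # "samples" folder inside the dataset's base folder.
    return os.path.join(vb_folder, "samples")
```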

get(idx)

Get a specific element in the dataset. Note that usually we could call dataset[idx] directly instead of dataset.get(idx), but right now __getitem__ is only fully supported through the stream_loader and the train_loader.

Parameters
idx: int

Index of the element to get

Return type

Dict[str, aloscene.frame.Frame]

get_dataset_dir()

Look for dataset_dir based on the given name. To work properly, an alodataset_config.json file must be saved at /home/USER/.aloception/alodataset_config.json

Return type

str
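The lookup performed by get_dataset_dir() can be sketched as a simple read of the config file; the exact JSON schema (a flat mapping from dataset name to directory) is an assumption, not confirmed by the docs above:

```python
import json

def get_dataset_dir(name: str, config_path: str) -> str:
    # Look up the directory registered for this dataset name in the
    # alodataset config file (flat name -> path schema assumed).
    with open(config_path) as f:
        config = json.load(f)
    return config[name]
```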

getitem()
getitem_ignore_errors(idx)

Try to get item at index idx. If data is invalid, retry at a shifted index. Repeat until the max limit of authorized tries is reached.
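The retry-at-a-shifted-index behaviour can be sketched in plain Python; `max_retry_on_error` and `retry_offset` mirror the constructor defaults above, while the `getitem` callback and the use of None to signal invalid data are hypothetical stand-ins:

```python
def getitem_ignore_errors(getitem, idx, dataset_len,
                          max_retry_on_error=3, retry_offset=20):
    # Try index idx; if the data is invalid (None stands in for
    # "invalid" here), retry at an index shifted by retry_offset,
    # up to max_retry_on_error attempts.
    for attempt in range(max_retry_on_error):
        data = getitem((idx + attempt * retry_offset) % dataset_len)
        if data is not None:
            return data
    raise RuntimeError(
        f"No valid item found after {max_retry_on_error} tries from index {idx}")
```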

prepare()

Prepare the dataset. Not all child classes need to implement this method. However, for some classes it can be worthwhile to prepare the dataset, either to make later access faster or to reduce the storage footprint of the whole dataset.

set_dataset_dir(dataset_dir)

Set the dataset_dir into the config file. This method will write the path into /home/USER/.aloception/alodataset_config.json, replacing the current one (if any).

Parameters
dataset_dir: str

Path to the new directory
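The write-and-replace behaviour of set_dataset_dir() can be sketched as follows; as above, the flat name-to-path JSON schema and the explicit `name`/`config_path` parameters are assumptions for illustration:

```python
import json
import os

def set_dataset_dir(name: str, dataset_dir: str, config_path: str) -> None:
    # Write dataset_dir into the config file, replacing any existing
    # entry for this dataset name (flat name -> path schema assumed).
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config[name] = dataset_dir
    with open(config_path, "w") as f:
        json.dump(config, f)
```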

stream_loader(num_workers=2)

Get a stream loader from the dataset. Compared to train_loader(), the stream_loader() does not add a batch dimension and does not shuffle the dataset.

Parameters
dataset : torch.utils.data.Dataset

Dataset to make the dataloader from

num_workers : int

Number of workers, by default 2

Returns
torch.utils.data.DataLoader

A generator
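The contract described above (one sample at a time, in order, no batch dimension, no shuffling) can be sketched with a plain generator; this is a stdlib-only illustration, not the library's actual torch.utils.data.DataLoader-based implementation:

```python
def stream_loader(dataset):
    # Sketch: yield samples one at a time, in dataset order, with no
    # batch dimension and no shuffling.
    for idx in range(len(dataset)):
        yield dataset[idx]
```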

train_loader(batch_size=1, num_workers=2, sampler=<class 'torch.utils.data.sampler.RandomSampler'>)

Get training loader from the dataset

Parameters
dataset : torch.utils.data.Dataset

Dataset to make the dataloader from

batch_size : int, optional

Batch size, by default 1

num_workers : int, optional

Number of workers, by default 2

sampler : torch.utils.data.Sampler, optional

Sampler class used to sample the dataset, by default torch.utils.data.RandomSampler

Returns
torch.utils.data.DataLoader

A generator
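By contrast with the stream loader, train_loader() shuffles (via the RandomSampler default) and adds a batch dimension. A stdlib-only sketch of that behaviour, with `shuffle` and `seed` as hypothetical stand-ins for the sampler argument:

```python
import random

def train_loader(dataset, batch_size=1, shuffle=True, seed=None):
    # Sketch: shuffle the indices (RandomSampler-style), then yield
    # lists of batch_size samples, i.e. add the batch dimension.
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]
```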

property vb_folder
class alodataset.base_dataset.Split(value)

Bases: enum.Enum

An enumeration.

TEST: str = 'test'
TRAIN: str = 'train'
Type: List[str] = ['train', 'val', 'test']
VAL: str = 'val'
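The Split members listed above can be reproduced as a standard Python Enum; note that the Type attribute, holding the list of split names, becomes an enum member alongside TRAIN/VAL/TEST:

```python
from enum import Enum

class Split(Enum):
    TRAIN = "train"
    VAL = "val"
    TEST = "test"
    Type = ["train", "val", "test"]
```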
alodataset.base_dataset.rename_data_to_none(data)

Temporarily remove data names until the next call to the names property. Necessary for PyTorch operations that don't support named tensors

alodataset.base_dataset.stream_loader(dataset, num_workers=2)

Get a stream loader from the dataset. Compared to train_loader(), the stream_loader() does not add a batch dimension and does not shuffle the dataset.

Parameters
dataset : torch.utils.data.Dataset

Dataset to make the dataloader from

num_workers : int

Number of workers, by default 2

Returns
torch.utils.data.DataLoader

A generator

alodataset.base_dataset.train_loader(dataset, batch_size=1, num_workers=2, sampler=<class 'torch.utils.data.sampler.RandomSampler'>)

Get training loader from the dataset

Parameters
dataset : torch.utils.data.Dataset

Dataset to make the dataloader from

batch_size : int, optional

Batch size, by default 1

num_workers : int, optional

Number of workers, by default 2

sampler : torch.utils.data.Sampler, optional

Sampler class used to sample the dataset, by default torch.utils.data.RandomSampler

Returns
torch.utils.data.DataLoader

A generator