Note

This page is reference documentation. It only explains the class signature, not how to use it. Please refer to the user guide for the big picture.

nidl.datasets.BaseImageDataset

class nidl.datasets.BaseImageDataset(root, patterns, channels, subject_in_patterns, split='train', targets=None, target_mapping=None, transforms=None, mask=None, withdraw_subjects=None)[source]

Bases: BaseDataset

Scalable neuroimaging dataset that uses files.

Parameters:
root: str

the location where the data are stored.

patterns: str or list of str

the relative locations of the images to be loaded.

channels: str or list of str

the names of the channels.

subject_in_patterns: int or list of int

the folder level where the subject identifiers can be retrieved.

split: str, default='train'

the split to be considered.

targets: str or list of str, default=None

the dataset will also return these tabular data (taken from 'participants.tsv').

target_mapping: dict, default=None

optionally, a dictionary specifying replacement values for existing target values. See the pandas DataFrame.replace documentation for more information.

transforms: callable, default=None

a function that can be called to augment the input images.

mask: str, default=None

optionally, mask the input data using this NumPy array.

withdraw_subjects: list of str, default=None

optionally, a list of subjects to remove from the dataset.
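As stated for target_mapping above, the mapping follows the semantics of pandas DataFrame.replace. A minimal illustration (the table contents and column names here are hypothetical, not part of nidl):

```python
import pandas as pd

# Hypothetical targets table, of the kind read from 'participants.tsv'.
df = pd.DataFrame({"participant_id": ["sub-01", "sub-02"],
                   "sex": ["M", "F"]})

# A target_mapping such as this one recodes the 'sex' column,
# following the semantics of pandas DataFrame.replace.
target_mapping = {"M": 0, "F": 1}
recoded = df.replace({"sex": target_mapping})
print(recoded["sex"].tolist())  # [0, 1]
```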

Raises:
FileNotFoundError

If the mandatory input files are not found.

KeyError

If the mandatory keys are not found.

UserWarning

If missing data are found.

Notes

A ‘participants.tsv’ file containing subject information (including the requested targets) is expected at the root. A ‘<split>.tsv’ file containing the subjects to include is expected at the root. The general idea is not to copy all your data into the root folder but rather to use a single symlink per project (if you are working with aggregated data). To enforce reproducibility, you can check that the content of each file is persistent using the get_checksum method.
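The expected root layout can be sketched with the standard library alone (a minimal illustration, not nidl code: the file names 'participants.tsv' and 'train.tsv' come from the notes above, while the image pattern, folder layout, and column names are hypothetical):

```python
import csv
import pathlib
import tempfile

# Build a minimal root folder matching the layout described above:
# a 'participants.tsv', a '<split>.tsv', and one image file per subject.
root = pathlib.Path(tempfile.mkdtemp())
with open(root / "participants.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["participant_id", "age"])
    writer.writerow(["sub-01", "34"])
    writer.writerow(["sub-02", "29"])
with open(root / "train.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["participant_id"])
    writer.writerow(["sub-01"])
for subject in ("sub-01", "sub-02"):
    (root / subject / "anat").mkdir(parents=True)
    (root / subject / "anat" / "t1.nii.gz").touch()

# 'patterns' are locations relative to the root; the subject identifier
# sits at a fixed folder level of each matched path (subject_in_patterns).
pattern = "*/anat/t1.nii.gz"  # hypothetical pattern
matches = sorted(root.glob(pattern))
subjects = [path.relative_to(root).parts[0] for path in matches]
print(subjects)  # ['sub-01', 'sub-02']
```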

__init__(root, patterns, channels, subject_in_patterns, split='train', targets=None, target_mapping=None, transforms=None, mask=None, withdraw_subjects=None)[source]
get_checksum(path)[source]

Compute the checksum of a file.
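The hash algorithm is not documented here; a file checksum of the kind get_checksum suggests can be sketched with hashlib (SHA-256 and the helper name are assumptions, not the actual nidl implementation):

```python
import hashlib
import pathlib
import tempfile

def file_checksum(path, chunk_size=8192):
    """Hash a file in chunks so large volumes never sit fully in memory.

    SHA-256 is an assumption: the algorithm actually used by
    BaseImageDataset.get_checksum is not documented here.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Two files with identical content yield the same checksum, which is how
# a persistent-content check for reproducibility can work.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "a.tsv").write_bytes(b"participant_id\tage\nsub-01\t34\n")
(tmp / "b.tsv").write_bytes(b"participant_id\tage\nsub-01\t34\n")
print(file_checksum(tmp / "a.tsv") == file_checksum(tmp / "b.tsv"))  # True
```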

get_data(idx)[source]

Retrieve the data at the given index.

sanitize_subject(subject)[source]