Note

This page is reference documentation. It only explains the class signature, not how to use it. Please refer to the user guide for the big picture.

nidl.datasets.BaseImageDataset

class nidl.datasets.BaseImageDataset(root, patterns, channels, subject_in_patterns, split='train', targets=None, target_mapping=None, transforms=None, mask=None, withdraw_subjects=None)[source]

Bases: BaseDataset

Scalable neuroimaging dataset that uses files.

Parameters:
root: str

the location where the data are stored.

patterns: str or list of str

the relative locations of the images to be loaded.

channels: str or list of str

the names of the channels.

subject_in_patterns: int or list of int

the folder level where the subject identifiers can be retrieved.

split: str, default='train'

the split to be considered.

targets: str or list of str, default=None

the dataset will also return these tabular data (taken from 'participants.tsv').

target_mapping: dict, default=None

optionally, a dictionary specifying replacement values for existing target values. See the pandas DataFrame.replace documentation for more information.

transforms: callable, default=None

a function that can be called to augment the input images.

mask: str, default=None

optionally, mask the input data using this NumPy array.

withdraw_subjects: list of str, default=None

optionally, a list of subjects to remove from the dataset.
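As stated for target_mapping above, the mapping follows the semantics of pandas DataFrame.replace. A minimal illustration (the table contents and column names here are hypothetical, not part of nidl):

```python
import pandas as pd

# Hypothetical targets table, of the kind read from 'participants.tsv'.
df = pd.DataFrame({"participant_id": ["sub-01", "sub-02"],
                   "sex": ["M", "F"]})

# A target_mapping such as this one recodes the 'sex' column,
# following the semantics of pandas DataFrame.replace.
target_mapping = {"M": 0, "F": 1}
recoded = df.replace({"sex": target_mapping})
print(recoded["sex"].tolist())  # [0, 1]
```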

Raises:
FileNotFoundError

If the mandatory input files are not found.

KeyError

If the mandatory keys are not found.

UserWarning

If missing data are found.

Notes

A ‘participants.tsv’ file containing subject information (including the requested targets) is expected at the root. A ‘<split>.tsv’ file containing the subjects to include is expected at the root. The general idea is not to copy all your data into the root folder but rather to use a single symlink per project (if you are working with aggregated data). To enforce reproducibility, you can check that the content of each file is persistent using the get_checksum method.
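The expected root layout can be sketched with the standard library alone (a minimal illustration, not nidl code: the file names 'participants.tsv' and 'train.tsv' come from the notes above, while the image pattern, folder layout, and column names are hypothetical):

```python
import csv
import pathlib
import tempfile

# Build a minimal root folder matching the layout described above:
# a 'participants.tsv', a '<split>.tsv', and one image file per subject.
root = pathlib.Path(tempfile.mkdtemp())
with open(root / "participants.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["participant_id", "age"])
    writer.writerow(["sub-01", "34"])
    writer.writerow(["sub-02", "29"])
with open(root / "train.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["participant_id"])
    writer.writerow(["sub-01"])
for subject in ("sub-01", "sub-02"):
    (root / subject / "anat").mkdir(parents=True)
    (root / subject / "anat" / "t1.nii.gz").touch()

# 'patterns' are locations relative to the root; the subject identifier
# sits at a fixed folder level of each matched path (subject_in_patterns).
pattern = "*/anat/t1.nii.gz"  # hypothetical pattern
matches = sorted(root.glob(pattern))
subjects = [path.relative_to(root).parts[0] for path in matches]
print(subjects)  # ['sub-01', 'sub-02']
```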

__init__(root, patterns, channels, subject_in_patterns, split='train', targets=None, target_mapping=None, transforms=None, mask=None, withdraw_subjects=None)[source]
get_checksum(path)[source]

Compute the checksum of a file.
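The hash algorithm is not documented here; a file checksum of the kind get_checksum suggests can be sketched with hashlib (SHA-256 and the helper name are assumptions, not the actual nidl implementation):

```python
import hashlib
import pathlib
import tempfile

def file_checksum(path, chunk_size=8192):
    """Hash a file in chunks so large volumes never sit fully in memory.

    SHA-256 is an assumption: the algorithm actually used by
    BaseImageDataset.get_checksum is not documented here.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Two files with identical content yield the same checksum, which is how
# a persistent-content check for reproducibility can work.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "a.tsv").write_bytes(b"participant_id\tage\nsub-01\t34\n")
(tmp / "b.tsv").write_bytes(b"participant_id\tage\nsub-01\t34\n")
print(file_checksum(tmp / "a.tsv") == file_checksum(tmp / "b.tsv"))  # True
```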

get_data(idx)[source]

Retrieve the data at the given index.

sanitize_subject(subject)[source]