
Deep learning for NeuroImaging in Python.

Note

This page is a reference documentation. It only explains the class signature, and not how to use it. Please refer to the gallery for the big picture.

class nidl.datasets.base.BaseImageDataset(root, patterns, channels, subject_in_patterns, split='train', targets=None, target_mapping=None, transforms=None, mask=None, withdraw_subjects=None)[source]

Bases: BaseDataset

Scalable, file-based neuroimaging dataset.

Parameters:

root : str

the location where the data are stored.

patterns : str or list of str

the relative locations of the images to be loaded.

channels : str or list of str, default=None

the names of the channels.

subject_in_patterns : int or list of int

the folder level where the subject identifiers can be retrieved.

split : str, default='train'

the split to be considered.

targets : str or list of str, default=None

the tabular data columns (read from ‘participants.tsv’) that the dataset will also return.

target_mapping : dict, default=None

optionally, a dictionary specifying replacement values for existing values. See the pandas DataFrame.replace documentation for more information.

transforms : callable, default=None

a function that can be called to augment the input images.

mask : str, default=None

optionally, the location of a NumPy array used to mask the input data.

withdraw_subjects : list of str, default=None

optionally, a list of subjects to remove from the dataset.

Raises:

FileNotFoundError

If the mandatory input files are not found.

KeyError

If the mandatory keys are not found.

UserWarning

If missing data are found.

Notes

A ‘participants.tsv’ file containing subject information (including the requested targets) is expected at the root. A ‘<split>.tsv’ file containing the subjects to include is expected at the root. The general idea is not to copy all your data into the root folder but rather to use a single symlink per project (if you are working with aggregated data). To enforce reproducibility, you can check whether the content of each file is persistent using the get_checksum method.
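
For illustration, here is a minimal, hedged sketch of how such a dataset might be instantiated. The root path, file pattern, channel name, subject folder level, and target column below are hypothetical and only illustrate the expected layout:

>>> from nidl.datasets.base import BaseImageDataset
>>> # Assumed (hypothetical) layout:
>>> #   /data/my_study/participants.tsv             subject info, incl. target columns
>>> #   /data/my_study/train.tsv                    subjects belonging to the 'train' split
>>> #   /data/my_study/sub-*/anat/sub-*_T1w.nii.gz
>>> dataset = BaseImageDataset(
...     root="/data/my_study",
...     patterns="sub-*/anat/sub-*_T1w.nii.gz",  # image locations relative to root
...     channels="T1w",                          # channel name
...     subject_in_patterns=0,                   # folder level holding the subject id
...     split="train",                           # expects 'train.tsv' at the root
...     targets="age",                           # column of 'participants.tsv'
... )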

get_checksum(path)[source]

Compute the checksum of the file at the given path.

get_data(idx)[source]

Return the data at the given index.
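
As a hedged sketch, continuing the hypothetical dataset and ‘/data/my_study’ layout from the example above, these two methods could be used as follows:

>>> # Record the checksum of the split file so its content can be verified later.
>>> checksum = dataset.get_checksum("/data/my_study/train.tsv")
>>> # Retrieve the sample stored at a given index.
>>> sample = dataset.get_data(0)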
