Deep learning for NeuroImaging in Python.

Note

This page is reference documentation. It only describes the class signature, not how to use it. Please refer to the gallery for the big picture.

class nidl.datasets.pandas_dataset.ImageDataFrameDataset(rootdir: str, df: pandas.DataFrame | pandas.Series | str, image_col: str = 'image_path', label_cols: str | list[str] | None = None, checksum_col: str | None = None, transform: Callable | None = None, target_transform: Callable | dict[str, Callable] | None = None, return_none_if_no_label: bool = True, image_loader: Callable = <function default_image_loader>, is_valid_label: Callable | dict[str, Callable] | None = None, read_csv_kwargs: dict | None = None)

Bases: Dataset

Dataset for loading images from a pandas DataFrame.

This dataset assumes that the DataFrame contains:

one column with file paths to image data;
zero or more additional columns containing target labels (optional);
one column containing the checksums of the image files (optional).

Images are loaded on-the-fly from disk when accessed. Labels (if provided) are extracted from the specified column(s) and returned alongside the image.
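The on-the-fly behaviour described above can be illustrated with a minimal sketch. `LazyPathDataset` and `fake_loader` are hypothetical names written for this illustration and are not part of nidl; the sketch only assumes that paths are joined to `rootdir` and passed to a loader when an item is accessed, and that unlabeled items follow the `return_none_if_no_label` convention documented below.

```python
import os


class LazyPathDataset:
    """Hypothetical stand-in: stores paths and loads a file only on access."""

    def __init__(self, rootdir, paths, labels=None, loader=None,
                 return_none_if_no_label=True):
        self.rootdir = rootdir
        self.imgs = list(paths)  # image paths, kept as plain strings
        self.targets = None if labels is None else list(labels)
        self.loader = loader
        self.return_none_if_no_label = return_none_if_no_label

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        # The file is read here, not in __init__.
        image = self.loader(os.path.join(self.rootdir, self.imgs[idx]))
        if self.targets is None:
            return (image, None) if self.return_none_if_no_label else image
        return image, self.targets[idx]


calls = []  # records every path the loader is asked to read


def fake_loader(path):
    # Stand-in for a real image loader, so the sketch needs no files on disk.
    calls.append(path)
    return f"<image at {path}>"


dataset = LazyPathDataset(
    "mypath", ["image1.jpg", "image2.jpg"],
    labels=["cat", "dog"], loader=fake_loader,
)
image, label = dataset[1]  # only image2.jpg is "loaded" at this point
```

Building the dataset touches no files; the loader runs once per item access, which keeps memory use independent of the dataset size.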

Parameters:

rootdir : str

The path where the dataset is stored.

df : pd.DataFrame or pd.Series or str

DataFrame containing image paths relative to the rootdir and optional labels:

if a DataFrame, it should contain at least one column with the image paths;

if a Series, it should contain the image paths;

if str, it should be the path to a CSV file.

image_col : str, default="image_path"

Name of the column in df containing image file paths.

label_cols : str, list of str, default=None

Name of the column(s) containing label(s):

if None (default), no labels are returned.

if string, it should be the name of a single column in df.

if list, it should contain the names of multiple columns in df.

checksum_col : str, default=None

Name of the column in df containing the checksums of the image files.
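As an illustration of how a checksum column can detect files that changed on disk, here is a minimal sketch. The hash algorithm expected by this class is not stated on this page; SHA-256 and the helper name `file_sha256` are assumptions made for the example only.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Hypothetical helper: hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image1.bin")
    with open(path, "wb") as f:
        f.write(b"fake image bytes")

    recorded = file_sha256(path)  # value that would be stored in the checksum column
    unchanged = file_sha256(path) == recorded  # same content, same digest

    with open(path, "ab") as f:
        f.write(b"!")  # simulate the file changing on disk
    changed = file_sha256(path) != recorded  # the digest no longer matches
```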

transform : Callable, default=None

Optional transform that takes in the loaded image and returns a transformed version.

target_transform : Callable, Dict[str, Callable], default=None

Optional transform applied to the label(s):

if callable: applied to all labels, e.g. lambda y: torch.tensor(y)

if dictionary: a different transform is applied per column. In that case, the keys must be column names listed in label_cols and the values must be callable.
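The two forms of target_transform can be sketched as follows. `apply_target_transform_sketch` is a hypothetical function, not the library's implementation; it assumes that a single callable is applied to each label in turn and that columns without a dictionary entry are returned unchanged.

```python
def apply_target_transform_sketch(labels, label_cols, target_transform):
    """Hypothetical sketch of callable vs. per-column dictionary transforms."""
    if target_transform is None:
        return list(labels)
    if callable(target_transform):
        # Assumption: one callable is applied to every label.
        return [target_transform(y) for y in labels]
    # Dictionary: look up the transform by column name, identity otherwise.
    return [
        target_transform.get(col, lambda y: y)(y)
        for col, y in zip(label_cols, labels)
    ]


label_cols = ["diagnosis", "age"]
row_labels = ["patient", 30]

per_column = apply_target_transform_sketch(
    row_labels, label_cols,
    {"diagnosis": lambda x: 1 if x == "patient" else 0},
)
as_text = apply_target_transform_sketch(row_labels, label_cols, str)
```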

return_none_if_no_label : bool, default=True

If True, returns (<img>, None) when getting an item and label_cols is empty or None (default). Otherwise, only <img> is returned.

image_loader : Callable, default=default_image_loader

Function to load the image from the file path. It takes a string (the file path) as input and returns the loaded image. By default, it accepts the following:

all image extensions supported by PIL (e.g., .jpg, .png, .bmp)

numpy arrays (e.g., .npy, .npz)

3D medical images (e.g., .nii, .nii.gz) using nibabel
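A custom image_loader only has to accept a file path and return the loaded image. The sketch below dispatches on the file extension in the spirit of the list above; it is not the source of default_image_loader, and `pick_loader`, `my_image_loader` and the three `load_with_*` helpers are hypothetical names. The third-party imports are deferred so that each dependency is needed only when a file of that type is actually opened.

```python
from pathlib import Path


def load_with_pil(path):
    from PIL import Image  # imported lazily; requires Pillow
    return Image.open(path)


def load_with_numpy(path):
    import numpy as np  # imported lazily; requires NumPy
    return np.load(path)


def load_with_nibabel(path):
    import nibabel as nib  # imported lazily; requires nibabel
    return nib.load(path)


def pick_loader(path):
    """Choose a loader from the file name (case-insensitive)."""
    name = Path(path).name.lower()
    if name.endswith((".nii", ".nii.gz")):
        return load_with_nibabel
    if name.endswith((".npy", ".npz")):
        return load_with_numpy
    return load_with_pil  # fall back to PIL for everything else


def my_image_loader(path):
    # Hypothetical custom loader with the documented signature: path in, image out.
    return pick_loader(path)(path)
```

Such a function could then be passed as `image_loader=my_image_loader` when constructing the dataset.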

is_valid_label : Callable, Dict[str, Callable], default=None

Function to check if a label is valid. If None (default), all labels are considered valid. This can be used to filter out samples with invalid labels from the dataset, e.g. NaN. If label_cols is a string, it takes a label as input and returns a boolean. If label_cols is a list, it takes a list of labels as input and returns a boolean.
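A validity check that rejects NaN labels might look like the sketch below. The names `is_not_nan` and `all_not_nan` are hypothetical, and the filtering loop only mimics the effect described above; it is not the library's own filtering code.

```python
import math


def is_not_nan(label):
    """Single-column case: takes one label, returns a boolean."""
    return not (isinstance(label, float) and math.isnan(label))


def all_not_nan(labels):
    """Multi-column case: takes the list of labels for one sample."""
    return all(is_not_nan(y) for y in labels)


rows = [("image1.jpg", 30.0), ("image2.jpg", float("nan"))]

# Samples whose label fails the check are dropped.
kept = [path for path, age in rows if is_not_nan(age)]
```

The first function matches the case where label_cols is a string, the second the case where it is a list.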

read_csv_kwargs : Optional[dict], default=None

Additional keyword arguments to pass to pd.read_csv if df is a string path. For instance, you can set the separator to '\t' when working with a TSV file.
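The effect of such keyword arguments can be checked directly with pandas. This sketch reads tab-separated text from memory instead of a file on disk; the table content is made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a TSV file.
tsv_text = (
    "image_path\tdiagnosis\tage\n"
    "mri1.nii\tpatient\t30\n"
    "mri2.nii\tcontrol\t25\n"
)

read_csv_kwargs = {"sep": "\t"}  # the dictionary that would be passed to the dataset
df = pd.read_csv(io.StringIO(tsv_text), **read_csv_kwargs)
```

With the default comma separator, each row would be parsed as a single column and image_col would not be found.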

Raises:

TypeError

If df is not a DataFrame, a Series, or a path to a CSV file, or if label_cols is not a string or a list of strings.

ValueError

If one of the specified columns is not found in df, if the targets have incorrect values, or if the checksums show that the data have changed on disk.

Examples

Dataset for supervised computer vision tasks:

>>> import pandas as pd
>>> from nidl.datasets.pandas_dataset import ImageDataFrameDataset
>>> df = pd.DataFrame({
...     'image_path': ['image1.jpg', 'image2.jpg'],
...     'label': ['cat', 'dog']
... })
>>> dataset = ImageDataFrameDataset(
...     rootdir='mypath/',
...     df=df,
...     image_col='image_path',
...     label_cols='label'
... )
>>> image, label = dataset[0]
>>> print(label)
cat
>>> print(type(image))
<class 'PIL.Image.Image'>

Dataset for unsupervised computer vision tasks:

>>> df = pd.DataFrame({
...     'image_path': ['image1.jpg', 'image2.jpg']
... })
>>> dataset = ImageDataFrameDataset(
...     rootdir='mypath/',
...     df=df,
...     image_col='image_path'
... )
>>> image, _ = dataset[0]
>>> print(type(image))
<class 'PIL.Image.Image'>

Dataset for 3D medical images:

>>> df = pd.DataFrame({
...     'image_path': ['mri1.nii', 'mri2.nii'],
...     'diagnosis': ['patient', 'control'],
...     'age': [30, 25]
... })
>>> target_transform = {"diagnosis": lambda x: 1 if x == 'patient' else 0}
>>> dataset = ImageDataFrameDataset(
...     rootdir='mypath/',
...     df=df,
...     image_col='image_path',
...     label_cols=['diagnosis', 'age'],
...     target_transform=target_transform
... )
>>> image_mri, (label_mri, age_mri) = dataset[0]
>>> print(label_mri, age_mri)
1 30
>>> print(type(image_mri))
<class 'nibabel.nifti1.Nifti1Image'>

Attributes

df	(pd.DataFrame) The DataFrame containing image paths and labels.
imgs	(list) List of image paths (before loading).
targets	(list) List of labels (before any transformations).

apply_target_transform(label): Apply the specified target transform to the label(s).

apply_transform(image): Apply the specified transform to the image.