Deep learning for NeuroImaging in Python.
Note
This page is reference documentation. It explains only the class signature, not how to use the class. Please refer to the gallery for the big picture.
- class nidl.datasets.pandas_dataset.ImageDataFrameDataset(rootdir: str, df: pandas.DataFrame | pandas.Series | str, image_col: str = 'image_path', label_cols: str | list[str] | None = None, checksum_col: str | None = None, transform: Callable | None = None, target_transform: Callable | dict[str, Callable] | None = None, return_none_if_no_label: bool = True, image_loader: Callable = default_image_loader, is_valid_label: Callable | dict[str, Callable] | None = None, read_csv_kwargs: dict | None = None)
Bases: Dataset
Dataset for loading images from a pandas DataFrame.
This dataset assumes that the DataFrame contains:
one column with file paths to image data;
zero or more additional columns containing target labels (optional);
one column containing the checksums of the image files (optional).
Images are loaded on-the-fly from disk when accessed. Labels (if provided) are extracted from the specified column(s) and returned alongside the image.
- Parameters:
rootdir : str
The path where the dataset is stored.
df : pd.DataFrame or pd.Series or str
DataFrame containing image paths relative to the rootdir and optional labels:
if a DataFrame, it should contain at least one column with the image paths;
if a Series, it should contain the image paths;
if str, it should be the path to a CSV file.
image_col : str, default="image_path"
Name of the column in df containing image file paths.
label_cols : str or list of str, default=None
Name of the column(s) containing label(s):
if None (default), no labels are returned.
if string, it should be the name of a single column in df.
if list, it should contain the names of multiple columns in df.
checksum_col : str, default=None
Name of the column in df containing the checksums of the image files.
transform : Callable, default=None
Optional transform that takes in the loaded image and returns a transformed version.
target_transform : Callable or dict[str, Callable], default=None
Optional transform applied to the label(s):
if callable: applied to all labels, e.g. lambda y: torch.tensor(y)
if dictionary: a different transform is applied per column. In that case, the keys must be column names listed in label_cols and the values must be callable.
return_none_if_no_label : bool, default=True
If True, returns (<img>, None) when getting an item and label_cols is empty or None (default). Otherwise, only <img> is returned.
image_loader : Callable, default=default_image_loader
Function to load the image from the file path. It takes a string (the file path) as input and returns the loaded image. By default, it accepts the following:
all image extensions supported by PIL (e.g., .jpg, .png, .bmp)
numpy arrays (e.g., .npy, .npz)
3D medical images (e.g., .nii, .nii.gz) using nibabel
is_valid_label : Callable or dict[str, Callable], default=None
Function to check if a label is valid. If None (default), all labels are considered valid. This can be used to filter out samples with invalid labels from the dataset, e.g. NaN. If label_cols is a string, it takes a label as input and returns a boolean. If label_cols is a list, it takes a list of labels as input and returns a boolean.
read_csv_kwargs : Optional[dict], default=None
Additional keyword arguments to pass to pd.read_csv if df is a string path. For instance, you can set the proper '\t' separator (sep='\t') when working with a TSV file.
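As a minimal sketch of the kind of callable is_valid_label accepts when label_cols is a list, the function below rejects any sample whose labels contain None or NaN. The name labels_are_valid is hypothetical and not part of nidl; only the calling convention (a list of labels in, a boolean out) is taken from the description above.

```python
import math


def labels_are_valid(labels):
    """Return False if any label in the list is missing (None or NaN).

    Hypothetical helper, written for illustration only.
    """
    for value in labels:
        if value is None:
            return False
        # NaN is a float that is not equal to itself; math.isnan detects it.
        if isinstance(value, float) and math.isnan(value):
            return False
    return True


print(labels_are_valid(["patient", 30.0]))          # True
print(labels_are_valid(["control", float("nan")]))  # False
print(labels_are_valid([None, 25]))                 # False
```

Such a function would presumably be passed as is_valid_label=labels_are_valid when constructing the dataset, so that samples with missing labels are filtered out.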
- Raises:
TypeError
If df is not a DataFrame, a Series, or a path to a CSV file, or if label_cols is not a string or a list of strings.
ValueError
If one of the specified columns is not found in df, if the targets have incorrect values, or if, based on the checksums, the data have changed on disk.
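The checksum-based ValueError can be pictured with the sketch below. This page does not state which hash algorithm nidl uses or how the check is implemented, so SHA-256 and the helper names file_checksum and check_unchanged are assumptions made purely for illustration.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory.

    The algorithm is an assumption; nidl may use a different one.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_unchanged(path, expected):
    """Raise ValueError if the file no longer matches the stored checksum."""
    if file_checksum(path) != expected:
        raise ValueError(f"Checksum mismatch for {path}")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image1.bin")
    with open(path, "wb") as f:
        f.write(b"fake image bytes")
    stored = file_checksum(path)   # value that would live in checksum_col
    check_unchanged(path, stored)  # file is intact: no exception

    with open(path, "ab") as f:    # simulate the file changing on disk
        f.write(b"!")
    try:
        check_unchanged(path, stored)
    except ValueError:
        change_detected = True
    else:
        change_detected = False

print(change_detected)
```

Hashing in fixed-size chunks keeps memory use constant, which matters for large 3D volumes.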
Examples
Dataset for supervised computer vision tasks:

>>> import pandas as pd
>>> from nidl.datasets.pandas_dataset import ImageDataFrameDataset
>>> df = pd.DataFrame({
...     'image_path': ['image1.jpg', 'image2.jpg'],
...     'label': ['cat', 'dog']
... })
>>> dataset = ImageDataFrameDataset(
...     rootdir='mypath/',
...     df=df,
...     image_col='image_path',
...     label_cols='label'
... )
>>> image, label = dataset[0]
>>> print(label)
cat
>>> print(type(image))
<class 'PIL.Image.Image'>

Dataset for unsupervised computer vision tasks:

>>> df = pd.DataFrame({
...     'image_path': ['image1.jpg', 'image2.jpg']
... })
>>> dataset = ImageDataFrameDataset(
...     rootdir='mypath/',
...     df=df,
...     image_col='image_path'
... )
>>> image, _ = dataset[0]
>>> print(type(image))
<class 'PIL.Image.Image'>

Dataset for 3D medical images:

>>> df = pd.DataFrame({
...     'image_path': ['mri1.nii', 'mri2.nii'],
...     'diagnosis': ['patient', 'control'],
...     'age': [30, 25]
... })
>>> target_transform = {"diagnosis": lambda x: 1 if x == 'patient' else 0}
>>> dataset = ImageDataFrameDataset(
...     rootdir='mypath/',
...     df=df,
...     image_col='image_path',
...     label_cols=['diagnosis', 'age'],
...     target_transform=target_transform
... )
>>> image_mri, (label_mri, age_mri) = dataset[0]
>>> print(label_mri, age_mri)
1 30
>>> print(type(image_mri))
<class 'nibabel.nifti1.Nifti1Image'>
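The per-column dictionary form of target_transform can be traced by hand with the stand-in below. The helper apply_per_column is hypothetical and is not nidl's implementation; it also assumes that columns without an entry in the dictionary are returned unchanged, which this page implies but does not state.

```python
def apply_per_column(labels, label_cols, target_transform):
    """Apply target_transform[col] to each label whose column has an entry.

    Hypothetical helper; labels without a matching key are passed through
    unchanged (an assumption, see the text above).
    """
    return [
        target_transform[col](value) if col in target_transform else value
        for col, value in zip(label_cols, labels)
    ]


label_cols = ["diagnosis", "age"]
target_transform = {"diagnosis": lambda x: 1 if x == "patient" else 0}

# Raw labels for two samples, listed in label_cols order.
print(apply_per_column(["patient", 30], label_cols, target_transform))  # [1, 30]
print(apply_per_column(["control", 25], label_cols, target_transform))  # [0, 25]
```

Keying the transforms by column name lets categorical columns be re-encoded while numeric columns such as age are left as they are.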
Attributes
df
(pd.DataFrame) The DataFrame containing image paths and labels.
imgs
(list) List of image paths (before loading).
targets
(list) List of labels (before any transformations).