Deep learning for NeuroImaging in Python.
Note
This page is reference documentation. It only explains the class signature, not how to use it. Please refer to the gallery for the big picture.
- class nidl.datasets.openbhb.OpenBHB(root: str, modality: str | tuple[str, ...] = 'vbm', target: str | list[str] | None = 'age', split: str = 'train', streaming: bool = True, max_workers: int = 1, transforms: Callable | None = None, target_transforms: Callable | None = None)
Bases: Dataset
OpenBHB dataset [R2].
The Open Big Healthy Brains (OpenBHB) dataset is a large multi-site brain MRI dataset consisting of 3227 training samples and 757 validation samples. It aggregates T1-weighted (T1w) MRI scans from 10 public datasets:
IXI
ABIDE I
ABIDE II
CoRR
GSP
Localizer
MPI-Leipzig
NAR
NPC
RBP
These scans were acquired across 93 centers worldwide (North America, Europe, and China). Only healthy controls aged between 6 and 88 years are included, with balanced representation of males and females.
All T1w MRI scans have been uniformly preprocessed using CAT12 (SPM), FreeSurfer, and Quasi-Raw (in-house minimal preprocessing). Both Voxel-Based Morphometry (VBM) and Surface-Based Morphometry (SBM) features are available.
Warning
The entire OpenBHB dataset takes ~350 GB of disk space. We recommend keeping streaming=True (the default) if you intend to use only a small portion of the dataset.
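To get a feel for why streaming matters, the sketch below estimates the uncompressed in-memory size of one sample from the per-modality shapes documented under the modality parameter. It assumes 4 bytes per value (float32). That dtype is an assumption, and on-disk file sizes may differ. The helper names `sample_nbytes` and `split_nbytes` are hypothetical and not part of nidl.

```python
from math import prod

# Per-sample shapes as documented for each OpenBHB modality.
MODALITY_SHAPES = {
    "vbm": (121, 145, 121),
    "quasiraw": (182, 218, 182),
    "vbm_roi": (1, 284),
    "fs_desikan_roi": (7, 68),
    "fs_destrieux_roi": (7, 148),
    "fs_xhemi": (8, 163842),
}


def sample_nbytes(modality: str, itemsize: int = 4) -> int:
    """Estimated bytes for one sample (itemsize=4 assumes float32 values)."""
    return prod(MODALITY_SHAPES[modality]) * itemsize


def split_nbytes(modality: str, n_samples: int, itemsize: int = 4) -> int:
    """Estimated bytes for n_samples samples of one modality."""
    return sample_nbytes(modality, itemsize) * n_samples
```

Under these assumptions a VBM image is about 8.5 MB and a quasi-raw image about 28.9 MB. A leading singleton channel axis, as in the examples, does not change these numbers.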
- Parameters:
root : str
Path to the root data directory where the dataset is stored.
modality : str or tuple of str
Which modality to load for each brain image. If a tuple (multimodal OpenBHB), a dictionary is returned from __getitem__ with modality names as keys and corresponding NumPy arrays as values.
Available modalities:
“vbm”: Whole-brain voxel-based morphometry 3D T1w image, shape (121, 145, 121)
“quasiraw”: Whole-brain T1w image with minimal preprocessing, shape (182, 218, 182)
“vbm_roi”: Gray matter (GM) and cerebrospinal fluid (CSF) volumes per region (Neuromorphometrics atlas, 142 regions), shape (1, 284)
“fs_desikan_roi”: FreeSurfer surface-based features computed on the Desikan atlas (34 regions by hemisphere), shape (7, 68)
“fs_destrieux_roi”: FreeSurfer surface-based features computed on the Destrieux atlas (74 regions by hemisphere), shape (7, 148)
“fs_xhemi”: FreeSurfer surface-based features (including curvature, sulcal depth, and cortical thickness) computed on the fsaverage7 mesh (163842 vertices per hemisphere), shape (8, 163842)
target : {‘age’, ‘sex’, ‘site’}, list of str, or None
Target(s) to return with each image. If a string, the target is returned as a float (for ‘age’), an int (for ‘site’) or a string (for ‘sex’). If a list of strings, multiple targets are returned as a dictionary: {<target>: <value>}. If None, no target is returned.
split : {‘train’, ‘val’, ‘internal_val’, ‘external_val’}
Dataset split to use. The ‘val’ split is the union of:
‘internal_val’: Images acquired with the same MRI scanner as training data (in-domain)
‘external_val’: Images acquired with different MRI scanners (out-of-domain)
streaming : bool, default=True
If True, data are downloaded lazily from Hugging Face on demand (when accessed via __getitem__). If False, the entire split is downloaded at initialization for the requested modality (or modalities).
max_workers : int, default=1
Number of concurrent threads used to download files from Hugging Face (1 thread = 1 file download). Warning: setting max_workers > 1 can trigger Hugging Face 429 errors (too many requests). We recommend keeping this value low.
transforms : callable or None, default=None
A function/transform that takes in a brain image and returns a transformed version. The input depends on the modality and can be a 3D image, a feature array, or a dict (multimodal case).
target_transforms : callable or None, default=None
A function/transform applied to the target(s).
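The transforms and target_transforms parameters accept any callables. Below is a minimal stdlib-only sketch of two such callables and of how a dataset might apply them to one sample. The names `compose`, `minmax_scale`, `age_to_decade` and `apply_transforms` are hypothetical, and applying the image transform before the target transform is an assumption, not documented nidl behavior.

```python
from typing import Any, Callable, List, Sequence


def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Chain callables left to right: compose(f, g)(x) == g(f(x))."""
    def _composed(x: Any) -> Any:
        for f in funcs:
            x = f(x)
        return x
    return _composed


def minmax_scale(values: Sequence[float]) -> List[float]:
    """Rescale a flat feature vector to [0, 1] (constant input maps to zeros)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def age_to_decade(age: float) -> int:
    """Example target transform: bin an age in years into its decade."""
    return int(age // 10)


def apply_transforms(image, target, transforms=None, target_transforms=None):
    """Hypothetical stand-in for the per-sample step of __getitem__."""
    if transforms is not None:
        image = transforms(image)
    if target_transforms is not None:
        target = target_transforms(target)
    return image, target
```

For example, `apply_transforms([2.0, 4.0, 6.0], 34.0, minmax_scale, age_to_decade)` returns `([0.0, 0.5, 1.0], 3)`. With a tuple of modalities, the image transform would receive a dict instead of a single array.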
Notes
The data are downloaded exclusively from the OpenBHB repository on Hugging Face, either on the fly (lazy download) or during initialization (immediate download), if they are not already present locally.
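The lazy versus immediate download behavior described in these Notes follows a common local-cache pattern. Below is a minimal sketch of that pattern, not nidl's implementation. The `LazyFiles` class and the `fetch(filename, dest_path)` callable that writes a remote file to a local path are hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Sequence


class LazyFiles:
    """Hypothetical sketch: fetch files on access (streaming) or all up front."""

    def __init__(
        self,
        root: str,
        filenames: Sequence[str],
        fetch: Callable[[str, Path], None],
        streaming: bool = True,
    ) -> None:
        self.root = Path(root)
        self.filenames: List[str] = list(filenames)
        self.fetch = fetch  # hypothetical downloader: fetch(name, dest_path)
        if not streaming:
            # Immediate download: make every file local at initialization.
            for name in self.filenames:
                self._ensure_local(name)

    def _ensure_local(self, name: str) -> Path:
        path = self.root / name
        if not path.exists():  # only download files that are not cached yet
            path.parent.mkdir(parents=True, exist_ok=True)
            self.fetch(name, path)
        return path

    def __len__(self) -> int:
        return len(self.filenames)

    def __getitem__(self, idx: int) -> Path:
        # Lazy download: the file is fetched the first time it is accessed.
        return self._ensure_local(self.filenames[idx])
```

With streaming, nothing is fetched at construction time and each file is downloaded at most once; without streaming, construction pays the full download cost up front.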
References
[R2] Dufumier, B., Grigis, A., Victor, J., Ambroise, C., Frouin, V. & Duchesnay, E. (2022). OpenBHB: a Large-Scale Multi-Site Brain MRI Data-set for Age Prediction and Debiasing. NeuroImage, 254, 119121. https://doi.org/10.1016/j.neuroimage.2022.119637
Examples
Load the VBM modality from the training split and get the age target:
>>> dataset = OpenBHB(
...     root='data/openbhb', modality='vbm', target='age',
...     split='train'
... )
>>> image, age = dataset[0]
>>> print(image.shape)
(1, 121, 145, 121)
>>> print(age)
34.0
Load multiple modalities and multiple targets:
>>> dataset = OpenBHB(
...     root='data/openbhb',
...     modality=('vbm', 'quasiraw'),
...     target=['age', 'sex', 'site'],
...     split='val'
... )
>>> data, targets = dataset[10]
>>> print(data['vbm'].shape)
(1, 121, 145, 121)
>>> print(data['quasiraw'].shape)
(1, 182, 218, 182)
>>> print(targets)
{'age': 19.0, 'sex': 'female', 'site': 0}
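The target handling in these examples (a single value for a string target, a dictionary for a list of targets, nothing for None) can be sketched as follows. The type conversions follow the target parameter description (float for ‘age’, int for ‘site’, str for ‘sex’). The `select_target` function and the `row` dict of raw strings, as if read from a TSV file, are hypothetical.

```python
from typing import Any, Callable, Dict, Mapping, Sequence, Union

# Documented return types per target name; unknown names are kept as str.
CASTS: Dict[str, Callable[[str], Any]] = {"age": float, "site": int, "sex": str}


def select_target(row: Mapping[str, str], target: Union[str, Sequence[str], None]):
    """Return one target value, a {name: value} dict, or None (no target)."""
    if target is None:
        return None  # the caller would then return the image alone
    if isinstance(target, str):
        return CASTS.get(target, str)(row[target])
    return {name: CASTS.get(name, str)(row[name]) for name in target}
```

For instance, with `row = {"age": "19.0", "sex": "female", "site": "0"}`, `select_target(row, ["age", "sex", "site"])` gives `{'age': 19.0, 'sex': 'female', 'site': 0}`, matching the shape of the second example.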
- download_dataset_split(split: str, modality: tuple[str, ...], samples: list[tuple[Any, Any]], incremental: bool = True, max_workers: int = 8)
Fetch a split of the dataset from Hugging Face if it is not already present locally.
- Parameters:
split : {‘train’, ‘val’, ‘internal_val’, ‘external_val’}
Split to download if not present.
modality : tuple of str
Modalities to download (“vbm”, “vbm_roi”, “quasiraw”, “fs_xhemi”, “fs_desikan_roi” or “fs_destrieux_roi”)
samples : list of tuple
List of samples (paths to the data, with their targets) in the current split, as generated by make_dataset.
incremental : bool, default=True
If True, only files missing from the local copy of the split are downloaded. Otherwise, all files in the split are downloaded and any existing local files are replaced.
max_workers : int, default=8
Number of concurrent threads to download files (1 thread = 1 file download).
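The incremental and max_workers options can be illustrated with a thread pool where each task downloads exactly one file. This is a stdlib sketch under stated assumptions, not the nidl implementation: `download_missing` and the `fetch(filename, dest_path)` callable are hypothetical, and the Hugging Face transfer itself is out of scope.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def download_missing(
    root: str,
    filenames: Iterable[str],
    fetch: Callable[[str, Path], None],
    incremental: bool = True,
    max_workers: int = 1,
) -> List[str]:
    """Download files into root; return the names that were fetched."""
    root_path = Path(root)
    names = list(filenames)
    if incremental:
        # Skip files that already exist locally.
        names = [n for n in names if not (root_path / n).exists()]

    def _one(name: str) -> str:
        dest = root_path / name
        dest.parent.mkdir(parents=True, exist_ok=True)
        fetch(name, dest)  # one task downloads one file
        return name

    # At most max_workers downloads run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_one, names))
```

A second incremental call only fetches the files that the first call did not create, which is why keeping incremental=True is cheap to re-run after an interrupted download.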
- download_file(filename: str) → str
Download a single file from the OpenBHB repository on Hugging Face.
- get_cat12_template()
Get the CAT12 gray matter tissue probability map as a NIfTI image.
This method retrieves the CAT12 gray matter (GM) tissue probability map (TPM) registered to MNI152 space.
- Returns:
nii : nibabel.Nifti1Image, shape (121, 145, 121)
A 3D NIfTI image containing the CAT12 gray matter TPM.
See also
nibabel.load
Function used to load the NIfTI image.
Notes
The template file is expected at: <root>/resource/cat12vbm_space-MNI152_desc-gm_TPM.nii.gz. If the file is not available locally, it is downloaded from Hugging Face.
- get_fs_labels(atlas: str = 'destrieux', symmetric: bool = False)
Get the names of the atlas regions on which the “fs_destrieux_roi” (for the “destrieux” atlas) or “fs_desikan_roi” (for the “desikan” atlas) features were computed in OpenBHB.
The names are extracted from the resource file.
For the Destrieux (resp. Desikan) atlas, the first 74 (resp. 34) regions are from the left hemisphere and the last 74 (resp. 34) are from the right hemisphere.
- Parameters:
atlas : {‘destrieux’, ‘desikan’}, default='destrieux'
Atlas whose region names are returned.
symmetric : bool, default=False
If True, removes the “lh-” (left hemisphere) and “rh-” (right hemisphere) prefixes from the labels, so the returned list is half as long.
- Returns:
labels : list of str
List of region names on the given atlas.
Notes
The resource file is expected at: <root>/resource/resources.json. If it is not present locally, it is automatically downloaded from Hugging Face.
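The effect of symmetric=True can be sketched as stripping the hemisphere prefix and keeping one entry per region in first-seen order. This is an illustrative assumption about the behavior, not nidl's code; `strip_hemisphere` and the region names in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple


def strip_hemisphere(
    labels: Iterable[str], prefixes: Tuple[str, ...] = ("lh-", "rh-")
) -> List[str]:
    """Drop hemisphere prefixes and deduplicate, preserving order."""
    seen = {}  # dicts keep insertion order, so this acts as an ordered set
    for label in labels:
        for p in prefixes:
            if label.startswith(p):
                label = label[len(p):]
                break
        seen.setdefault(label, None)
    return list(seen)
```

Because left-hemisphere labels come first, `strip_hemisphere(["lh-G_front", "lh-S_calc", "rh-G_front", "rh-S_calc"])` returns `["G_front", "S_calc"]`, half the original length.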
- get_fs_roi_feature_names()
Get the 7 feature names corresponding to the “fs_destrieux_roi” and “fs_desikan_roi” data.
The feature names are extracted from the resource file.
- Returns:
features : list of str
List of the 7 feature names corresponding to the “fs_destrieux_roi” and “fs_desikan_roi” data in OpenBHB.
Notes
The resource file is expected at: <root>/resource/resources.json. If it is not present locally, it is automatically downloaded from Hugging Face.
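Since the FreeSurfer ROI arrays have shape (7, n_regions), the feature names index the rows and the region labels index the columns. The sketch below, written with nested lists to stay dependency-free, looks up one value by name and flattens an array into named records. The helper names and the example feature names "thickness" and "area" are hypothetical; use get_fs_roi_feature_names and get_fs_labels for the real names.

```python
from typing import Dict, List, Sequence, Tuple


def roi_value(
    data: Sequence[Sequence[float]],
    feature_names: List[str],
    region_labels: List[str],
    feature: str,
    region: str,
) -> float:
    """Look up one value in a (n_features, n_regions) array by name."""
    return data[feature_names.index(feature)][region_labels.index(region)]


def to_records(
    data: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    region_labels: Sequence[str],
) -> Dict[Tuple[str, str], float]:
    """Flatten a (n_features, n_regions) array into {(feature, region): value}."""
    return {
        (f, r): data[i][j]
        for i, f in enumerate(feature_names)
        for j, r in enumerate(region_labels)
    }
```

With a NumPy array the same lookup is `data[feature_names.index(f), region_labels.index(r)]`.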
- get_fs_xhemi_feature_names()
Get the 8 feature names corresponding to the “fs_xhemi” data.
The feature names are extracted from the resource file. If it is not present locally, it is automatically downloaded from Hugging Face.
- Returns:
features : list of str
List of the 8 feature names corresponding to “fs_xhemi”.
Notes
The resource file is expected at: <root>/resource/resources.json
- get_neuromorphometrics_atlas()
Get the Neuromorphometrics gray matter atlas and its region names.
This method loads the Neuromorphometrics gray matter atlas as a NIfTI image, along with the associated region names (abbreviations).
- Returns:
dict
A dictionary containing:
data : nibabel.Nifti1Image, the atlas image.
labels : list of region names (str) corresponding to the integer labels in the atlas.
See also
nibabel.load
Function used to load the NIfTI image.
Notes
Expects the following files under the resource directory:
<root>/resource/neuromorphometrics.nii : NIfTI atlas file
<root>/resource/neuromorphometrics.csv : CSV with region names.
If the files are not found locally, they are downloaded from Hugging Face.
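A typical use of an integer-labeled atlas is counting voxels per region. The stdlib sketch below does this on a flat sequence of label values. How the integer labels map to the returned region names is not specified here, so the `names_by_label` dict, the background value 0, and the helper name are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def region_voxel_counts(
    label_values: Iterable[int],
    names_by_label: Mapping[int, str],
    background: int = 0,
) -> Dict[str, int]:
    """Count voxels per atlas label, skipping background (assumed to be 0)."""
    counts = Counter(v for v in label_values if v != background)
    # Unnamed labels fall back to a generic "label_<n>" key.
    return {names_by_label.get(v, f"label_{v}"): n for v, n in sorted(counts.items())}
```

With nibabel, the flat label values could be obtained as `nii.get_fdata().astype(int).ravel().tolist()` from the returned atlas image.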
- get_quasiraw_template()
Get the quasi-raw MNI152 brain template as a NIfTI image.
This method retrieves the quasi-raw T1-weighted brain template in MNI152 space.
- Returns:
nii : nibabel.Nifti1Image, shape (182, 218, 182)
A 3D NIfTI image containing the quasi-raw MNI152 brain template.
See also
nibabel.load
Function used to load the NIfTI image.
Notes
The template file is expected at: <root>/resource/quasiraw_space-MNI152_desc-brain_T1w.nii.gz. If the file is not present locally, it is automatically downloaded from Hugging Face.
- get_vbm_roi_labels()
Get the names of the Neuromorphometrics atlas regions on which the “vbm_roi” features are computed in OpenBHB.
The names are extracted from the resource file. If it is not present locally, it is automatically downloaded from Hugging Face.
The first 142 features are gray matter (GM) volumes and the last 142 are cerebrospinal fluid (CSF) volumes.
- Returns:
labels : list of str
List of region names on the Neuromorphometrics atlas where “vbm_roi” features have been computed.
Notes
The resource file is expected at: <root>/resource/resources.json
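Given the layout described above (142 GM volumes followed by 142 CSF volumes), a “vbm_roi” sample of shape (1, 284) can be split into its two halves. This is a minimal sketch with a hypothetical helper name; pairing each half with the labels, e.g. `dict(zip(labels, gm))`, assumes the labels list holds one name per region.

```python
from typing import List, Sequence, Tuple, Union


def split_vbm_roi(
    vector: Union[Sequence[float], Sequence[Sequence[float]]], n_regions: int = 142
) -> Tuple[List[float], List[float]]:
    """Split a 'vbm_roi' sample into (GM volumes, CSF volumes)."""
    # Accept the (1, 284) layout by unwrapping the singleton leading axis.
    if len(vector) == 1 and isinstance(vector[0], (list, tuple)):
        vector = vector[0]
    values = list(vector)
    if len(values) != 2 * n_regions:
        raise ValueError(f"expected {2 * n_regions} values, got {len(values)}")
    return values[:n_regions], values[n_regions:]
```

Checking the length explicitly catches the case where a different modality was passed by mistake.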
- make_dataset(split: str)
Generate a list of sample file paths and their corresponding targets for a given dataset split.
This method constructs file paths for each participant listed in participants.tsv according to the specified split. It supports both unimodal and multimodal configurations, depending on the modality attribute.
Each returned sample is a tuple of the form:
(str, target): if modality is a single string
(tuple of str, target): if modality is a tuple of strings
If target is None, the sample tuple excludes the target and only contains the path or tuple of paths.
- Parameters:
split : {‘train’, ‘val’, ‘internal_val’, ‘external_val’}
Which participants to include in the dataset.
- Returns:
samples : list of tuple
List of samples in the form (path(s), target), where:
path(s) is either a single file path (str) or a tuple of file paths (if multiple modalities are used).
target is the associated value from the participants metadata (e.g., age, sex, site), or None if target is not set.
Notes
Expects the following file under the root directory: <root>/participants.tsv. If it is not present, it is automatically downloaded from Hugging Face.
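The sample construction described for make_dataset can be sketched with the standard csv module. This is a hypothetical illustration, not the nidl implementation: the TSV column names (participant_id, split, age), the path templates, and the function name are assumptions. It shows the documented ‘val’ = ‘internal_val’ ∪ ‘external_val’ rule, single versus tuple paths, and dropping the target when it is None.

```python
import csv
import io
from typing import List, Optional, Sequence, Tuple, Union

VAL_SPLITS = {"internal_val", "external_val"}


def make_samples(
    tsv_text: str,
    split: str,
    path_templates: Union[str, Sequence[str]],
    target: Optional[str] = None,
) -> List[Tuple]:
    """Build (path(s), target) or (path(s),) tuples for one split."""
    multimodal = not isinstance(path_templates, str)
    samples = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # 'val' selects both the in-domain and the out-of-domain rows.
        in_split = row["split"] == split or (split == "val" and row["split"] in VAL_SPLITS)
        if not in_split:
            continue
        if multimodal:
            paths = tuple(t.format(**row) for t in path_templates)
        else:
            paths = path_templates.format(**row)
        # Target values are kept as raw strings from the TSV in this sketch.
        samples.append((paths,) if target is None else (paths, row[target]))
    return samples
```

Keeping the paths as a tuple in the multimodal case lets __getitem__ load each modality and return them as a dictionary keyed by modality name.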