Deep learning for NeuroImaging in Python.

Note

This page is reference documentation. It only explains the class signature, not how to use it. Please refer to the gallery for the big picture.

class nidl.estimators.ssl.simclr.SimCLR(encoder: Module, hidden_dims: Sequence[str], lr: float, temperature: float, weight_decay: float, random_state: int | None = None, **kwargs)[source]

Bases: TransformerMixin, BaseEstimator

SimCLR implementation.

At each iteration, we get for every data sample x two differently augmented versions, which we refer to as x_i and x_j. Both are encoded into a one-dimensional feature vector, and we want to maximize the similarity between these two vectors while minimizing their similarity to all other data in the batch. The encoder network is split into two parts: a base encoder network f(.) and a projection head g(.). The base network is usually a deep CNN or SCNN, and is responsible for extracting a representation vector from the augmented data examples. Let’s denote the representations obtained from the encoder as h=f(x). The projection head g(.) maps the representation h into a space where we apply the contrastive loss, i.e., compare similarities between vectors. In the original SimCLR paper, g(.) was defined as a two-layer MLP with ReLU activation in the hidden layer. Note that in the follow-up paper, SimCLRv2, the authors mention that larger/wider MLPs can boost the performance considerably.
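
As a rough illustration of the f(.)/g(.) split described above, the sketch below builds a two-layer MLP projection head with a ReLU hidden layer in plain PyTorch. The helper name make_projection_head, the toy encoder and all layer sizes are assumptions chosen for illustration; this is not nidl's implementation.

```python
import torch
from torch import nn


def make_projection_head(latent_size: int, hidden_dim: int, out_dim: int) -> nn.Module:
    """Two-layer MLP g(.) with a ReLU hidden layer (hypothetical helper)."""
    return nn.Sequential(
        nn.Linear(latent_size, hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )


# Toy base encoder f(.): flattens the input and maps it to a 32-d vector h.
f = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8, 32))
g = make_projection_head(latent_size=32, hidden_dim=64, out_dim=16)

x = torch.randn(4, 1, 8, 8)  # batch of 4 single-channel 8x8 "images"
h = f(x)                     # representations kept for downstream tasks
z = g(h)                     # projections used only by the contrastive loss
```

Keeping f and g as separate modules makes it straightforward to discard g after pretraining and reuse f alone.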

After finishing the training with contrastive learning, we will remove the projection head g(.), and use f(.) as a pretrained feature extractor. The representations z that come out of the projection head g(.) have been shown to perform worse than those of the base network f(.) when finetuning the network for a new task. This is likely because the representations z are trained to become invariant to many features that can be important for downstream tasks. Thus, g(.) is only needed for the contrastive learning stage.

Now that the architecture is described, let’s take a closer look at how we train the model. As mentioned before, we want to maximize the similarity between the representations of the two augmented versions of the same image, i.e., z_i and z_j, while minimizing it to all other examples in the batch. SimCLR thereby applies the InfoNCE loss, originally proposed by Aaron van den Oord et al. for contrastive learning. In short, the InfoNCE loss compares the similarity of z_i and z_j to the similarity of z_i to any other representation in the batch by performing a softmax over the similarity values. The loss can be formally written as:

\ell_{i,j} = -\log \frac{\exp(\text{sim}(z_i,z_j)/\tau)}{
             \sum_{k=1}^{2N}\mathbb{1}_{[k\neq i]}
                \exp(\text{sim}(z_i,z_k)/\tau)}
           = -\text{sim}(z_i,z_j)/\tau
             +\log\left[\sum_{k=1}^{2N}\mathbb{1}_{[k\neq i]}
                \exp(\text{sim}(z_i,z_k)/\tau)\right]

The function \text{sim} is a similarity metric, and the hyperparameter \tau is called the temperature; it determines how peaked the distribution is. Since many similarity metrics are bounded, the temperature parameter allows us to balance the influence of many dissimilar image patches versus one similar patch. The similarity metric that is used in SimCLR is cosine similarity, as defined below:

\text{sim}(z_i,z_j) = \frac{z_i^\top \cdot z_j}{||z_i||\cdot||z_j||}

The maximum cosine similarity possible is 1, while the minimum is -1. In general, we will see that the features of two different images will converge to a cosine similarity around zero since the minimum, -1, would require z_i and z_j to be in the exact opposite direction in all feature dimensions, which does not allow for great flexibility.
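
The loss and the cosine similarity above can be sketched in a few lines of PyTorch. This is a minimal sketch under stated assumptions, not the code behind info_nce_loss(): the function name info_nce, the convention that rows k and k + N hold the two views of sample k, and the averaging over all 2N anchors are choices made here for illustration.

```python
import torch
import torch.nn.functional as F


def info_nce(z: torch.Tensor, temperature: float) -> torch.Tensor:
    """InfoNCE over 2N projections; rows k and k + N are the two views of sample k."""
    n = z.shape[0] // 2
    z = F.normalize(z, dim=1)                        # unit norm: dot product = cosine similarity
    sim = z @ z.t() / temperature                    # (2N, 2N) similarities divided by tau
    self_mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))  # implements the indicator [k != i]
    # Index of the positive for each row: i -> i + N and i + N -> i.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    # cross_entropy = -sim[i, pos] + logsumexp(sim[i, :]), averaged over the anchors.
    return F.cross_entropy(sim, targets)


torch.manual_seed(0)
z_i = torch.randn(8, 16)
z_j = z_i + 0.05 * torch.randn(8, 16)   # second view: a slightly perturbed copy
aligned = info_nce(torch.cat([z_i, z_j]), temperature=0.1)
shuffled = info_nce(torch.cat([z_i, torch.randn(8, 16)]), temperature=0.1)
```

With matching views the loss is small, whereas unrelated "views" give a much larger value, which is the behaviour the softmax formulation is meant to produce.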

As an alternative to validating on the contrastive learning loss, we could also take a simple, small downstream task and track the performance of the base network f(.) on it.
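
One possible way to track such a downstream task is a nearest-neighbour probe on frozen features. The sketch below is an assumption about how this could be done, not something nidl provides: knn_probe_accuracy is a hypothetical helper, and nn.Identity() with synthetic two-cluster data stands in for a trained encoder f(.) and a real labelled set.

```python
import torch
import torch.nn.functional as F
from torch import nn


@torch.no_grad()
def knn_probe_accuracy(f, x_train, y_train, x_val, y_val) -> float:
    """1-nearest-neighbour accuracy computed on frozen features h = f(x)."""
    h_train = F.normalize(f(x_train), dim=1)
    h_val = F.normalize(f(x_val), dim=1)
    nearest = (h_val @ h_train.t()).argmax(dim=1)  # most similar training sample
    return (y_train[nearest] == y_val).float().mean().item()


torch.manual_seed(0)
centers = torch.tensor([[5.0, 0.0], [0.0, 5.0]])  # two well-separated classes
y_train = torch.tensor([0, 1] * 10)
y_val = torch.tensor([0, 1] * 5)
x_train = centers[y_train] + 0.1 * torch.randn(20, 2)
x_val = centers[y_val] + 0.1 * torch.randn(10, 2)

f = nn.Identity()  # stand-in for the pretrained base network f(.)
acc = knn_probe_accuracy(f, x_train, y_train, x_val, y_val)
```

Because the probe has no trainable parameters, it can be evaluated periodically during pretraining at low cost.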

Parameters:

encoder : nn.Module

the encoder f(.). It must store the size of the encoded one-dimensional feature vector in a latent_size parameter.

hidden_dims : list of str

the projector g(.) MLP architecture.

lr : float

the learning rate.

temperature : float

the SimCLR loss temperature parameter.

weight_decay : float

the AdamW optimizer weight decay parameter.

max_epochs : int, default=None

optionally, use a CosineAnnealingLR scheduler (enabled when max_epochs is defined).

random_state : int, default=None

setting a seed for reproducibility.

kwargs : dict

Trainer parameters.
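
To make the latent_size requirement on the encoder concrete, here is a minimal sketch of a module that stores the size of its output vector. TinyEncoder, its layers and its sizes are hypothetical, and exposing latent_size as a plain attribute is an assumption about what the requirement means; how the module is then passed to SimCLR is not shown here.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical 3D CNN encoder exposing the size of its output vector."""

    def __init__(self, in_channels: int = 1, latent_size: int = 32):
        super().__init__()
        self.latent_size = latent_size  # size of the encoded feature vector
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global average pooling -> (B, 8, 1, 1, 1)
            nn.Flatten(),             # -> (B, 8)
        )
        self.fc = nn.Linear(8, latent_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x))


encoder = TinyEncoder()
h = encoder(torch.randn(2, 1, 8, 8, 8))  # two single-channel 8x8x8 volumes
```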

Notes

A batch of data must contain two elements: two tensors with contrasted images, and a list of tensors containing auxiliary variables.

Attributes

f

a Module containing the encoder.

g

a Module containing the projection head.

configure_optimizers()[source]

Declare an AdamW optimizer and, optionally (if max_epochs is defined), a CosineAnnealingLR learning-rate scheduler.
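
The optimizer and scheduler pairing described here can be sketched with standard PyTorch calls. The values of lr, weight_decay and max_epochs, the stand-in model, and the choice T_max=max_epochs are assumptions for illustration; the actual method may configure the scheduler differently.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for the encoder and projection head
lr, weight_decay, max_epochs = 1e-3, 1e-4, 10  # illustrative values

optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
scheduler = None
if max_epochs is not None:
    # Anneal the learning rate from lr towards eta_min (0 by default) over max_epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs)

lrs = []
for _ in range(max_epochs):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # normally preceded by a forward/backward pass
    scheduler.step()   # one scheduler step per epoch
```

The recorded learning rates start at lr and decrease monotonically along a half cosine.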

info_nce_loss(batch: tuple[Tensor, Tensor], mode: str)[source]

Compute and log the InfoNCE loss using InfoNCE.

training_step(batch: tuple[Tensor, Tensor], batch_idx: int, dataloader_idx: int | None = 0)[source]

Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.

Parameters:

batch : iterable, normally a DataLoader

the current data.

batch_idx : int

the index of this batch.

dataloader_idx : int, default=0

the index of the dataloader that produced this batch (only if multiple dataloaders are used).

Returns:

loss : STEP_OUTPUT

the computed loss:

  • Tensor - the loss tensor.

  • dict - a dictionary which can include any keys, but must include the key 'loss' in the case of automatic optimization.

  • None - in automatic optimization, this will skip to the next batch (but is not supported for multi-GPU, TPU, or DeepSpeed). For manual optimization, this has no special meaning, as returning the loss is not required.

To use multiple optimizers, you can switch to ‘manual optimization’ and control their stepping, as in the example below.

Notes

When accumulate_grad_batches > 1, the loss returned here will be automatically normalized by accumulate_grad_batches internally.
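
The effect of this normalization can be checked with plain PyTorch: dividing each micro-batch loss by accumulate_grad_batches and accumulating the gradients gives the same gradient as one larger batch (for equally sized micro-batches and a mean-reduced loss). This sketch only mirrors the behaviour described above; the loss function and tensors are made up for illustration and it is not the trainer's internal code.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
micro_batches = [torch.randn(4, 3) for _ in range(2)]
accumulate_grad_batches = len(micro_batches)


def loss_fn(x: torch.Tensor) -> torch.Tensor:
    """Toy mean-reduced loss depending on the parameter w."""
    return (x @ w).pow(2).mean()


# Accumulate gradients of the normalized per-batch losses.
for x in micro_batches:
    (loss_fn(x) / accumulate_grad_batches).backward()
accumulated = w.grad.clone()

# Reference: gradient of the loss over the concatenated batch.
w.grad = None
loss_fn(torch.cat(micro_batches)).backward()
reference = w.grad.clone()
```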

Examples

>>> # Methods of a module that uses manual optimization.
>>> def __init__(self):
...     super().__init__()
...     # Disable automatic optimization so that optimizers are stepped by hand.
...     self.automatic_optimization = False
...
>>> # Multiple optimizers (e.g. GANs)
>>> def training_step(self, batch, batch_idx):
...     opt1, opt2 = self.optimizers()
...
...     # do training_step with encoder
...     ...
...     opt1.step()
...     # do training_step with decoder
...     ...
...     opt2.step()

transform_step(batch: Tensor, batch_idx: int, dataloader_idx: int | None = 0)[source]

Define a transform step.

Share the same API as BaseEstimator.predict_step().

validation_step(batch: tuple[Tensor, Tensor], batch_idx: int, dataloader_idx: int | None = 0)[source]

Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, like accuracy.

Parameters:

batch : iterable, normally a DataLoader

the current data.

batch_idx : int

the index of this batch.

dataloader_idx : int, default=0

the index of the dataloader that produced this batch (only if multiple dataloaders are used).

Returns:

loss : STEP_OUTPUT

the computed loss:

  • Tensor - the loss tensor.

  • dict - a dictionary which can include any keys, but must include the key 'loss'.

  • None - skip to the next batch.

Notes

When the validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
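
The mode switching described in this note corresponds to the following plain PyTorch pattern. It is a sketch of what happens around validation_step(), written out by hand with a made-up model; when using the trainer these calls are made for you.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(2, 4)

model.eval()              # eval mode: dropout becomes a no-op
with torch.no_grad():     # gradients disabled: no autograd graph is recorded
    out = model(x)
grad_tracked = out.requires_grad
was_training = model.training

model.train()             # back to training mode once validation is over
```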

Examples

Self-Supervised Contrastive Learning with SimCLR

© 2025, nidl developers