Note
This page is reference documentation: it describes the class signature only, not how to use the class. Please refer to the user guide for the big picture.
nidl.estimators.ssl.utils.DINOProjectionHead¶
- class nidl.estimators.ssl.utils.DINOProjectionHead(input_dim=2048, hidden_dim=2048, bottleneck_dim=256, output_dim=4096, batch_norm=True, freeze_last_layer=-1, norm_last_layer=True)[source]¶
Bases: ProjectionHead

Projection head used in DINO [1].
The projection head is a 3-layer multi-layer perceptron (MLP) with a hidden dimension of 2048, followed by l2 normalization and a weight-normalized fully connected layer with K dimensions; this design is similar to that of SwAV [2].
- Parameters:
- input_dim: int, default=2048
The input dimension of the head.
- hidden_dim: int, default=2048
The hidden dimension.
- bottleneck_dim: int, default=256
Dimension of the bottleneck in the last layer of the head.
- output_dim: int, default=4096
The output dimension of the head.
- batch_norm: bool, default=True
Whether to use batch norm or not. Should be set to False when using a vision transformer backbone.
- freeze_last_layer: int, default=-1
Number of epochs during which we keep the output layer fixed. Typically doing so during the first epoch helps training. Try increasing this value if the loss does not decrease.
- norm_last_layer: bool, default=True
Whether or not to weight normalize the last layer of the DINO head. Not normalizing leads to better performance but can make the training unstable.
References
[1] Caron, M., et al., “Emerging Properties in Self-Supervised Vision Transformers”, ICCV, 2021. https://arxiv.org/abs/2104.14294
[2] Caron, M., et al., “Unsupervised Learning of Visual Features by Contrasting Cluster Assignments”, NeurIPS, 2020. https://arxiv.org/abs/2006.09882
- __init__(input_dim=2048, hidden_dim=2048, bottleneck_dim=256, output_dim=4096, batch_norm=True, freeze_last_layer=-1, norm_last_layer=True)[source]¶
Initializes the DINOProjectionHead with the specified dimensions.
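To make the architecture concrete, here is a minimal numpy-only sketch of the forward pass described above: a 3-layer MLP into a bottleneck, l2 normalization, then a weight-normalized output layer with the gain fixed at 1 (the norm_last_layer=True case). This is illustrative only, not the nidl implementation: batch norm is omitted, the dimensions are shrunk, and the names (TinyDINOHead, gelu, l2_normalize) are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Normalize each row of x to unit l2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class TinyDINOHead:
    """Toy forward pass of the head described above (batch norm omitted)."""

    def __init__(self, input_dim=8, hidden_dim=8, bottleneck_dim=4,
                 output_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # 3-layer MLP: input -> hidden -> hidden -> bottleneck
        self.w1 = 0.1 * rng.standard_normal((input_dim, hidden_dim))
        self.w2 = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
        self.w3 = 0.1 * rng.standard_normal((hidden_dim, bottleneck_dim))
        # Direction of the weight-normalized last layer; the gain is fixed
        # at 1, i.e. the norm_last_layer=True case.
        self.v = rng.standard_normal((bottleneck_dim, output_dim))

    def forward(self, x):
        h = gelu(x @ self.w1)
        h = gelu(h @ self.w2)
        h = h @ self.w3
        h = l2_normalize(h)  # l2-normalize the bottleneck representation
        # Weight normalization: each output unit's weight vector has unit norm.
        w = self.v / np.linalg.norm(self.v, axis=0, keepdims=True)
        return h @ w

head = TinyDINOHead()
out = head.forward(np.ones((2, 8)))
print(out.shape)  # (2, 16)
```

Because the bottleneck is l2-normalized and each output unit's weight vector has unit norm, every entry of the output is bounded by 1 in absolute value; this bounded "prototype score" view is what the weight-normalized last layer provides in DINO.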