
Contrastive Learning

Introduction

The goal of this project was to explore contrastive learning, a self-supervised learning technique, by implementing a downscaled version of the SimCLR framework. The aim was to understand how contrastive learning works and to apply it on a smaller scale using the CIFAR-10 dataset, adapting the original SimCLR model to use the less complex ResNet-18 architecture instead of the larger ResNet-50 from the original paper.

By scaling down the model and the dataset, I aimed to assess whether contrastive learning could still deliver robust feature representations in environments with limited computational resources.

Concept and Approach

SimCLR (A Simple Framework for Contrastive Learning of Visual Representations) is a self-supervised method where a neural network learns to maximize the similarity between different augmented views of the same image while minimizing the similarity between views of different images. This is done without needing any labeled data.
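Concretely, the loss behind this idea in SimCLR is the normalized temperature-scaled cross-entropy (NT-Xent) loss. For a positive pair of views $(i, j)$ taken from a batch of $N$ images (so $2N$ augmented views in total), it is defined as

$$\ell_{i,j} = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)},$$

where $z$ are the projected embeddings, $\mathrm{sim}(u, v)$ is cosine similarity, and $\tau$ is a temperature hyperparameter. Minimizing this pulls the two views of the same image together and pushes views of different images in the batch apart.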

For this project, I used ResNet-18 as the base encoder because of its lower computational cost compared to the ResNet-50 used in the original SimCLR paper. My dataset of choice was CIFAR-10, which consists of 60,000 32×32 color images divided into 10 classes.
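As a rough illustration of this setup, the PyTorch sketch below wires a torchvision ResNet-18 to the small two-layer MLP projection head that SimCLR places on top of the encoder. The 128-dimensional projection size, the CIFAR-friendly replacement of the first convolution and max-pooling layer, and the class name are my own illustrative choices rather than details taken from this project.

```python
import torch.nn as nn
from torchvision.models import resnet18

class SimCLRModel(nn.Module):
    """ResNet-18 encoder followed by an MLP projection head (illustrative sketch)."""

    def __init__(self, projection_dim: int = 128):
        super().__init__()
        backbone = resnet18(weights=None)        # train from scratch, no labels needed
        feature_dim = backbone.fc.in_features    # 512 for ResNet-18

        # Common CIFAR-10 adaptation (an assumption, not taken from this project):
        # smaller first convolution and no max-pooling, since the images are 32x32.
        backbone.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        backbone.maxpool = nn.Identity()
        backbone.fc = nn.Identity()              # expose the 512-d features directly
        self.encoder = backbone

        # Two-layer MLP projection head, as in the SimCLR paper.
        self.projector = nn.Sequential(
            nn.Linear(feature_dim, feature_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feature_dim, projection_dim),
        )

    def forward(self, x):
        h = self.encoder(x)     # representation used for evaluation / downstream tasks
        z = self.projector(h)   # projection fed to the contrastive loss
        return h, z
```

The contrastive loss is applied to z, while h is what gets evaluated later; that split follows the original SimCLR design.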

The main challenge was to ensure that the contrastive learning process could still capture meaningful representations in this reduced setting. This required careful tuning of hyperparameters, data augmentations, and batch sizes.

Technical Solution

The project utilized ResNet-18 as the backbone architecture for encoding images, followed by a multi-layer perceptron (MLP) as the projection head. I implemented the following steps:

  • Data Augmentation: I applied augmentations such as random cropping, horizontal flipping, and color jittering to generate positive pairs (see the sketch after this list).
  • Contrastive Loss: The model was trained using the contrastive loss function, which encourages the model to learn representations that are invariant to the applied augmentations (also sketched below).
  • Evaluation: I tested the quality of the learned representations using t-SNE visualization and nearest neighbor search, which revealed the model’s ability to cluster similar images (a small evaluation sketch follows as well).
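To make the first two steps more concrete, here is a minimal PyTorch sketch of a SimCLR-style augmentation pipeline and an NT-Xent loss. The augmentation strengths, the temperature of 0.5, and the function names are illustrative assumptions, not the exact settings used in this project.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Two independent passes through this pipeline produce the two views of a positive pair.
simclr_transform = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two batches of projections z1, z2 with shape [N, D]."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N x D, unit-length rows
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # a view is never its own negative

    # The positive for row i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

During training, each image in a batch is augmented twice, both views go through the encoder and projection head, and the loss is computed on the two resulting batches of projections.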
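For the evaluation step, here is a small sketch of how the frozen encoder's features could be collected and handed to t-SNE and a nearest-neighbor index using scikit-learn; the loader, device, and model names are assumptions that refer back to the earlier sketches.

```python
import torch
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

@torch.no_grad()
def collect_features(encoder, loader, device="cpu"):
    """Run the frozen encoder over a labelled loader and stack features and labels."""
    encoder.eval()
    feats, labels = [], []
    for images, targets in loader:
        feats.append(encoder(images.to(device)).cpu())
        labels.append(targets)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# features, labels = collect_features(model.encoder, test_loader)
# coords = TSNE(n_components=2).fit_transform(features)   # 2-D map, colour by label to inspect clusters
# knn = NearestNeighbors(n_neighbors=5).fit(features)     # retrieval-style sanity check
# distances, indices = knn.kneighbors(features[:10])
```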

Challenges and Learnings

One of the main challenges was managing batch size. Larger batch sizes tend to produce better results in contrastive learning, since each batch then supplies more negative examples for the loss, but they also demand more computational resources. Given the limitations of my setup, I experimented with different batch sizes to find the optimal balance between learning efficiency and resource consumption.

Through this project, I gained a deeper understanding of self-supervised learning, particularly contrastive learning, and how it can be adapted to smaller datasets and models.

Conclusion and Future Directions

This project demonstrated that contrastive learning remains effective even when scaled down to smaller datasets and simpler models. However, there are still limitations to address, such as the sensitivity to batch size and the gap in representation quality compared to larger, more powerful models.

Future work could involve exploring Vision Transformers for contrastive learning or fine-tuning the model with other self-supervised techniques.

For more details, you can check the code on GitHub and read the research paper (in English this time).