# Self-Supervised Representation Learning

Official repository of the paper **Whitening for Self-Supervised Representation Learning**

ICML 2021 | [arXiv:2007.06346](https://arxiv.org/abs/2007.06346)

It includes 3 types of losses:
- W-MSE [arXiv](https://arxiv.org/abs/2007.06346)
- Contrastive [SimCLR arXiv](https://arxiv.org/abs/2002.05709)
- BYOL [arXiv](https://arxiv.org/abs/2006.07733)

And 5 datasets:
- CIFAR-10 and CIFAR-100
- STL-10
- Tiny ImageNet
- ImageNet-100

Checkpoints are stored in `data` every 100 epochs during training.

The implementation is optimized for a single GPU, although multiple GPUs are also supported. It includes fast evaluation: we pre-compute embeddings for the entire dataset and then train a classifier on top (a sketch of this protocol is included at the end of the Installation section below). Evaluating the ResNet-18 encoder takes about one minute.

## Installation

The implementation is based on PyTorch. Logging uses [wandb.ai](https://wandb.ai/). See `docker/Dockerfile`.

#### ImageNet-100

To build this dataset, take the original ImageNet and keep only [this subset of classes](https://github.com/HobbitLong/CMC/blob/master/imagenet100.txt). We do not use augmentations during testing, and loading large images with on-the-fly resizing is slow, so we preprocess the classifier train and test images once. We recommend [mogrify](https://imagemagick.org/script/mogrify.php) for this. First, resize the images to 256 (just like `torchvision.transforms.Resize(256)`), then center-crop them to 224 (like `torchvision.transforms.CenterCrop(224)`). Finally, put the original images in `train`, and the resized ones in `clf` and `test`.
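If you prefer to stay in Python rather than use ImageMagick, the same offline preprocessing can be sketched with torchvision and PIL. The source and destination folders below are placeholders, not paths the repository expects; adjust them to your own layout:

```python
import os
from PIL import Image
from torchvision import transforms

# Offline version of the resize + center-crop described above:
# Resize(256) scales the shorter side to 256, CenterCrop(224) keeps the central 224x224 patch.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
])

src_dir = "imagenet100_raw/val"  # hypothetical: original classifier/test images
dst_dir = "imagenet100/test"     # hypothetical: preprocessed output folder

for root, _, files in os.walk(src_dir):
    for name in files:
        if not name.lower().endswith((".jpeg", ".jpg", ".png")):
            continue
        src_path = os.path.join(root, name)
        dst_path = os.path.join(dst_dir, os.path.relpath(src_path, src_dir))
        os.makedirs(os.path.dirname(dst_path), exist_ok=True)
        img = Image.open(src_path).convert("RGB")
        preprocess(img).save(dst_path)
```

mogrify performs the same resize and crop in place; see the ImageMagick documentation linked above for the corresponding options.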
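For context, the fast evaluation mentioned in the introduction (pre-computing embeddings for the whole dataset, then training a classifier on top of them) follows the pattern below. This is an illustrative sketch, not the repository's actual `test` code: the encoder, data loaders, batch size and optimizer settings are assumptions made for the example.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def compute_embeddings(encoder, loader, device="cuda"):
    """Run the frozen encoder once over the dataset and cache the embeddings."""
    encoder.eval()
    xs, ys = [], []
    for images, labels in loader:
        xs.append(encoder(images.to(device)).cpu())
        ys.append(labels)
    return torch.cat(xs), torch.cat(ys)

def linear_eval(encoder, train_loader, test_loader, num_classes, epochs=100, device="cuda"):
    """Train a linear classifier on cached embeddings; the encoder is never updated."""
    x_train, y_train = compute_embeddings(encoder, train_loader, device)
    x_test, y_test = compute_embeddings(encoder, test_loader, device)

    clf = nn.Linear(x_train.shape[1], num_classes).to(device)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        # Mini-batches are drawn from the cached embeddings, so each epoch is cheap.
        for idx in torch.randperm(len(x_train)).split(512):
            opt.zero_grad()
            loss = loss_fn(clf(x_train[idx].to(device)), y_train[idx].to(device))
            loss.backward()
            opt.step()

    with torch.no_grad():
        pred = clf(x_test.to(device)).argmax(dim=1).cpu()
    return (pred == y_test).float().mean().item()
```

Because the encoder forward pass happens only once per image, re-training the classifier is fast, which is what makes the one-minute evaluation figure above possible.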
## Usage

Detailed settings are good by default; to see all options:
```
python -m train --help
python -m test --help
```

To reproduce the results from [table 1](https://arxiv.org/abs/2007.06346):

#### W-MSE 4
```
python -m train --dataset cifar10 --epoch 1000 --lr 3e-3 --num_samples 4 --bs 256 --emb 64 --w_size 128
python -m train --dataset cifar100 --epoch 1000 --lr 3e-3 --num_samples 4 --bs 256 --emb 64 --w_size 128
python -m train --dataset stl10 --epoch 2000 --lr 2e-3 --num_samples 4 --bs 256 --emb 128 --w_size 256
python -m train --dataset tiny_in --epoch 1000 --lr 2e-3 --num_samples 4 --bs 256 --emb 128 --w_size 256
```

#### W-MSE 2
```
python -m train --dataset cifar10 --epoch 1000 --lr 3e-3 --emb 64 --w_size 128
python -m train --dataset cifar100 --epoch 1000 --lr 3e-3 --emb 64 --w_size 128
python -m train --dataset stl10 --epoch 2000 --lr 2e-3 --emb 128 --w_size 256 --w_iter 4
python -m train --dataset tiny_in --epoch 1000 --lr 2e-3 --emb 128 --w_size 256 --w_iter 4
```

#### Contrastive
```
python -m train --dataset cifar10 --epoch 1000 --lr 3e-3 --emb 64 --method contrastive --arch resnet50
python -m train --dataset cifar100 --epoch 1000 --lr 3e-3 --emb 64 --method contrastive --arch resnet50
python -m train --dataset stl10 --epoch 2000 --lr 2e-3 --emb 128 --method contrastive --arch resnet50
python -m train --dataset tiny_in --epoch 1000 --lr 2e-3 --emb 128 --method contrastive --arch resnet50
```

#### BYOL
```
python -m train --dataset cifar10 --epoch 1000 --lr 3e-3 --emb 64 --method byol
python -m train --dataset cifar100 --epoch 1000 --lr 3e-3 --emb 64 --method byol
python -m train --dataset stl10 --epoch 2000 --lr 2e-3 --emb 128 --method byol
python -m train --dataset tiny_in --epoch 1000 --lr 2e-3 --emb 128 --method byol
```

#### ImageNet-100
```
python -m train --dataset imagenet --epoch 240 --lr 2e-3 --emb 128 --w_size 256 --crop_s0 0.08 --cj0 0.8 --cj1 0.8 --cj2 0.8 --cj3 0.2 --gs_p 0.2
python -m train --dataset imagenet --epoch 240 --lr 2e-3 --num_samples 4 --bs 256 --emb 128 --w_size 256 --crop_s0 0.08 --cj0 0.8 --cj1 0.8 --cj2 0.8 --cj3 0.2 --gs_p 0.2
```

Use `--no_norm` to disable normalization (for Euclidean distance).

## Citation
```
@inproceedings{ermolov2021whitening,
  title={Whitening for self-supervised representation learning},
  author={Ermolov, Aleksandr and Siarohin, Aliaksandr and Sangineto, Enver and Sebe, Nicu},
  booktitle={International Conference on Machine Learning},
  pages={3015--3024},
  year={2021},
  organization={PMLR}
}
```