| File | Architecture | Patch Size | Embed Dim | Params | Pre-training Data |
|---|---|---|---|---|---|
| `visreg-vit-b-inet1k.pth` | ViT-Base | 16 | 768 | 86M | ImageNet-1K |
| `visreg-vit-l-inet1k.pth` | ViT-Large | 14 | 1024 | 304M | ImageNet-1K |
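The patch size in the table sets how many tokens each image is split into. A minimal sketch, assuming a square 224×224 input (the crop size used in the preprocessing snippet below) and counting only patch tokens; any class or register tokens the models may add are not included, and this helper simply rejects sizes that are not a multiple of the patch size:

```python
def num_patch_tokens(img_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image (class/register tokens not counted)."""
    if img_size % patch_size != 0:
        raise ValueError(f"image size {img_size} is not a multiple of patch size {patch_size}")
    per_side = img_size // patch_size  # patches along one edge
    return per_side * per_side


for name, patch in [("ViT-Base/16", 16), ("ViT-Large/14", 14)]:
    print(name, num_patch_tokens(224, patch))
```

At 224×224, the ViT-L/14 backbone therefore processes a 16×16 token grid, while ViT-B/16 processes a 14×14 grid.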
```python
import timm
import torch

# ViT-Base/16
# num_classes=0 drops the classification head, so the model returns pooled features.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-b-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)

# ViT-Large/14
model = timm.create_model("vit_large_patch14_224", pretrained=False, num_classes=0, dynamic_img_size=True)
state_dict = torch.load("visreg-vit-l-inet1k.pth", map_location="cpu")
model.load_state_dict(state_dict)
```
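The loading snippet uses strict key matching, which implies the released files are plain state dicts whose keys already match timm's ViT. If you are loading a checkpoint saved some other way (for example your own training run, where weights may sit under a wrapper key or carry a `module.` prefix), a hypothetical cleanup helper could look like the sketch below. The wrapper keys and prefixes are assumptions for illustration, not a documented format of these checkpoints.

```python
from collections.abc import Mapping

# Hypothetical wrapper keys and key prefixes; adjust to what your checkpoint actually contains.
WRAPPER_KEYS = ("state_dict", "model", "encoder")
PREFIXES = ("module.", "backbone.", "encoder.")


def normalize_state_dict(ckpt: Mapping) -> dict:
    """Unwrap one level of nesting and strip common prefixes from parameter names."""
    # Descend into a wrapper entry such as {"model": {...}} if one is present.
    for key in WRAPPER_KEYS:
        if key in ckpt and isinstance(ckpt[key], Mapping):
            ckpt = ckpt[key]
            break
    cleaned = {}
    for name, value in ckpt.items():
        # Prefixes are checked in order, so "module.backbone.x" becomes "x".
        for prefix in PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
        cleaned[name] = value
    return cleaned
```

The result can be passed to `model.load_state_dict(..., strict=False)`; the returned object's `missing_keys` and `unexpected_keys` fields show whatever still does not line up.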
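```python
from huggingface_hub import hf_hub_download

# Each call downloads (or reuses a cached copy of) the file and returns its local path,
# which can be passed to torch.load.
# ViT-Base/16
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-b-inet1k.pth")
# ViT-Large/14
path = hf_hub_download(repo_id="BooBooWu/visreg", filename="visreg-vit-l-inet1k.pth")
```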
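The download and loading steps can be combined into one function. This is a sketch, not part of the released code: the `vit_b`/`vit_l` keys and the `load_visreg` name are hypothetical, while the filenames and timm model names are the ones used in the snippets above. The heavy imports happen inside the function so the lookup table works without them installed.

```python
REPO_ID = "BooBooWu/visreg"

# Hypothetical short names -> (checkpoint filename, timm architecture name).
CHECKPOINTS = {
    "vit_b": ("visreg-vit-b-inet1k.pth", "vit_base_patch16_224"),
    "vit_l": ("visreg-vit-l-inet1k.pth", "vit_large_patch14_224"),
}


def load_visreg(arch: str = "vit_b"):
    """Download a VISReg checkpoint from the Hub and load it into a timm ViT backbone."""
    if arch not in CHECKPOINTS:
        raise KeyError(f"unknown arch {arch!r}; choose from {sorted(CHECKPOINTS)}")
    filename, timm_name = CHECKPOINTS[arch]

    # Imported lazily so the table above can be inspected without these packages.
    import timm
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=filename)
    model = timm.create_model(timm_name, pretrained=False, num_classes=0, dynamic_img_size=True)
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model.eval()  # nn.Module.eval() returns the module itself
```

Usage: `model = load_visreg("vit_l")` returns the ViT-L/14 backbone in evaluation mode.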
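```python
import torch
from PIL import Image
from torchvision import transforms

# ImageNet-style evaluation preprocessing: resize the shorter side to 256,
# center-crop to 224x224, convert to a tensor, and normalize per channel.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model.eval()  # `model` is one of the backbones loaded above
# convert("RGB") handles grayscale or RGBA inputs; unsqueeze adds the batch dimension.
img = transform(Image.open("image.jpg").convert("RGB")).unsqueeze(0)  # [1, 3, 224, 224]
with torch.no_grad():
    features = model(img)  # [1, embed_dim]
```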
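The preprocessing above always produces a 224×224 input, whatever the original aspect ratio. The size arithmetic can be checked with the sketch below, which re-derives (rather than calls) the rules torchvision applies for an integer `Resize` size and `CenterCrop`; it ignores the optional `max_size` argument, which the snippet does not use.

```python
def resize_shorter_side(width: int, height: int, size: int = 256) -> tuple[int, int]:
    """(width, height) after scaling the shorter side to `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width: int, height: int, crop: int = 224) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered crop of side `crop`."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop


w, h = resize_shorter_side(640, 480)  # a 4:3 landscape photo
print((w, h), center_crop_box(w, h))
```

For a 640×480 photo this gives a 341×256 intermediate image, from which the central 224×224 region is kept; the left and right edges of the scene are discarded.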
The full evaluation suite (linear probing, segmentation, fine-tuning) is available in the GitHub repository:
```bash
# Linear probe on 10+ datasets
python downstream/linear_prob/run_evaluation.py \
    --checkpoint visreg-vit-b-inet1k.pth \
    --model vit_b \
    --datasets all
```
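As a rough illustration of what a linear probe measures, the sketch below fits a softmax (multinomial logistic regression) classifier on frozen features using NumPy and full-batch gradient descent. It is not the protocol implemented by `run_evaluation.py`, whose optimizer, schedule, and hyperparameters are not described here. It assumes features were extracted as in the feature-extraction snippet and stacked into an `[N, embed_dim]` array (for example via `features.numpy()`); the learning rate, epoch count, and weight decay are arbitrary illustrative values.

```python
import numpy as np


def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=200, weight_decay=1e-4):
    """Fit a softmax classifier on frozen features with full-batch gradient descent."""
    n, d = feats.shape
    # Standardize with training-set statistics; the backbone itself is never updated.
    mean, std = feats.mean(axis=0), feats.std(axis=0) + 1e-6
    x = (feats - mean) / std
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    y = np.eye(num_classes)[labels]  # one-hot targets, [n, num_classes]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - y) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad + weight_decay * w)
        b -= lr * grad.sum(axis=0)
    return w, b, mean, std


def predict(feats, w, b, mean, std):
    """Class index with the highest logit for each row of `feats`."""
    return np.argmax(((feats - mean) / std) @ w + b, axis=1)
```

Probe accuracy on held-out data is then read as a measure of how linearly separable the frozen representation is.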
```bibtex
@inproceedings{wu2026visreg,
  title     = {VISReg: Variance-Invariance-Sketching Regularization for JEPA training},
  author    = {Wu, Haiyu and Balestriero, Randall and Levine, Morgan},
  booktitle = {arXiv},
  year      = {2026}
}
```
This project (code and pretrained weights) is released under CC BY-NC 4.0 for non-commercial use only.