Model Card

Model Details

  • Architecture: ViT-Base with patch size 32
  • Training Data: Stanford Cars dataset

Training Details

Fine-tuned with the Adam optimizer at a constant learning rate of 1e-5 for 4000 steps (batch_size=32). Only the vision encoder is fine-tuned; the rest of the CLIP model is left unchanged. A minimal sketch of this setup follows.
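
This sketch assumes the standard CLIP contrastive loss and a hypothetical train_loader yielding (image, class-name text) batches; the actual training script is not part of this card:

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Freeze everything except the vision encoder
for p in model.parameters():
    p.requires_grad = False
for p in model.vision_model.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(model.vision_model.parameters(), lr=1e-5)

# train_loader is hypothetical: batches of PIL images and their class-name texts
for step, (images, texts) in enumerate(train_loader):
    if step >= 4000:
        break
    inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
    loss = model(**inputs, return_loss=True).loss  # CLIP contrastive loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()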

Evaluation Results

  • Pre-trained: 0.5987
  • Fine-tuned: 0.7819
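
The card does not state the metric; presumably these are top-1 classification accuracies on the Stanford Cars test split. A sketch of how such an evaluation could be run, where class_names, test_set, and the prompt template are assumptions rather than part of this card:

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical: class_names lists the Stanford Cars classes,
# test_set yields (PIL image, integer label) pairs
prompts = [f"a photo of a {name}" for name in class_names]

correct = 0
for image, label in test_set:
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (1, num_classes) similarity scores
    correct += int(logits.argmax(dim=-1).item() == label)

accuracy = correct / len(test_set)

Running the same loop after swapping in the fine-tuned vision encoder (see Usage below) would give the fine-tuned number.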

Usage

Load the fine-tuned vision model:

from transformers import CLIPVisionModel

vision_model = CLIPVisionModel.from_pretrained('tanganke/clip-vit-base-patch32_stanford-cars')

Substitute the vision encoder of the original CLIP model with the fine-tuned one:

from transformers import CLIPModel

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())
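
The patched clip_model then behaves like a standard CLIP model; a minimal usage sketch (the image path and text prompts below are placeholders):

from PIL import Image
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("car.jpg")  # placeholder image path
texts = ["a photo of a sedan", "a photo of a pickup truck"]  # placeholder prompts

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = clip_model(**inputs).logits_per_image.softmax(dim=-1)  # image-text match probabilities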