bbexx committed (commit 0c0aebc, parent: 4ad0409)
Files changed (1): README.md (+11, -4)
README.md CHANGED
@@ -1,4 +1,12 @@
-# [ViTamin: Design Scalable Vision Models in the Vision-language Era](https://arxiv.org/pdf/2404.02132.pdf)
+---
+license: mit
+datasets:
+- mlfoundations/datacomp_1b
+pipeline_tag: feature-extraction
+---
+
+# Model card for ViTamin-XL-336px
+
 Official huggingface models of **ViTamin**, from the following paper:
 
 [ViTamin: Design Scalable Vision Models in the Vision-language Era](https://arxiv.org/pdf/2404.02132.pdf).\
@@ -6,7 +14,7 @@ Official huggingface models of **ViTamin**, from the following paper:
 🏠  Johns Hopkins University, Bytedance
 
 
-Load from HuggingFace:
+Load from HuggingFace with transformers.AutoModel:
 ```python
 import torch
 import open_clip
@@ -31,8 +39,7 @@ with torch.no_grad(), torch.cuda.amp.autocast():
     image_features, text_features, logit_scale = model(pixel_values, text)
     text_probs = (100.0 * image_features @ text_features.to(torch.float).T).softmax(dim=-1)
 
-print("Label probs:", text_probs)
-
+print("Label probs:", text_probs)
 ```
 
 ## Main Results with CLIP Pre-training on DataComp-1B
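The snippet fragments in the diff above end by turning CLIP similarity scores into label probabilities with a temperature-scaled softmax. A minimal, self-contained sketch of just that step, using random tensors in place of real ViTamin image/text embeddings (so no model download is needed; the feature dimension 512 and the batch sizes here are illustrative assumptions, not the model's actual shapes):

```python
import torch

# Stand-ins for encoder outputs: 1 image embedding, 3 candidate-label text embeddings.
# CLIP-style features are L2-normalized before comparison, so normalize here too.
image_features = torch.nn.functional.normalize(torch.randn(1, 512), dim=-1)
text_features = torch.nn.functional.normalize(torch.randn(3, 512), dim=-1)

# 100.0 plays the role of the exponentiated logit scale in the README snippet;
# it sharpens the softmax over the cosine similarities.
logits = 100.0 * image_features @ text_features.T
text_probs = logits.softmax(dim=-1)

print("Label probs:", text_probs)  # one probability per candidate label
```

Each row of `text_probs` is a proper distribution over the candidate labels (non-negative, summing to 1), which is what makes the `print("Label probs:", ...)` line in the README meaningful as a zero-shot classification readout.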