Commit 9cfbef7 by nielsr (HF staff)
Parent: 1148e37
Files changed (1): README.md (+4 -1)
@@ -1,10 +1,13 @@
 ---
 license: apache-2.0
+tags:
+- image-classification
+- timm
 datasets:
 - imagenet-21k
 ---
 
-# Vision Transformer base model
+# Vision Transformer (base-sized model)
 
 Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in [this repository](https://github.com/google-research/vision_transformer). However, the weights were converted from the [timm repository](https://github.com/rwightman/pytorch-image-models) by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him.
 
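The model card in this commit describes a ViT pre-trained on 224x224 images. As a rough illustration of what that resolution means for the model's input, the sketch below splits an image into non-overlapping patches and counts the resulting tokens. It assumes a 16x16 patch size, taken from the paper's title (the card itself does not state the patch size), and a single prepended [CLS] token as described in the ViT paper. The helper names `patchify` and `sequence_length` are hypothetical and not part of any library API; this is a sketch, not the converted model's actual preprocessing code.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper. Returns an array of shape
    (num_patches, patch_size * patch_size * C), with patches in row-major order.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size  # patch grid dimensions
    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C): group pixels by grid cell
    patches = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch into a single vector
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def sequence_length(height: int, width: int, patch_size: int = 16) -> int:
    """Hypothetical helper: one token per patch plus one [CLS] token."""
    return (height // patch_size) * (width // patch_size) + 1


if __name__ == "__main__":
    image = np.zeros((224, 224, 3), dtype=np.float32)  # dummy RGB image
    print(patchify(image).shape)      # (196, 768)
    print(sequence_length(224, 224))  # 197
```

At 224x224 with 16x16 patches, the grid is 14x14, giving 196 patches of 16 * 16 * 3 = 768 values each, and 197 tokens once the [CLS] token is added. The reshape/transpose approach avoids explicit Python loops over the patch grid.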