AMfeta99 committed on
Commit e675ac1
1 Parent(s): e50c9ad

Update README.md

Files changed (1):
  1. README.md +4 -1
README.md CHANGED
@@ -39,7 +39,10 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model is a fine-tuned version of , which is a Vision Transformer (ViT).
+The ViT model is originally a transformer encoder model pre-trained and fine-tuned on ImageNet 2012.
+It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al.
+The model processes images as sequences of 16x16 patches, adding a [CLS] token for classification tasks, and uses absolute position embeddings. Pre-training enables the model to learn rich image representations, which can be leveraged for downstream tasks by adding a linear classifier on top of the [CLS] token. The weights were converted from the timm repository by Ross Wightman.
 
 ## Intended uses & limitations
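The patch-and-token pipeline the description adds (16x16 patches, a prepended [CLS] token, absolute position embeddings) can be sketched in plain NumPy. This is a minimal illustration, not the model's actual implementation: the 224x224 input size, the 768-dimensional embedding, and the random projection/embedding weights are assumptions for demonstration (in the real model these are learned parameters).

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into a sequence of flattened patch vectors."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    patches = image.reshape(H // patch, patch, W // patch, patch, C)
    # Reorder so each row is one patch, flattened to patch*patch*C values.
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return patches

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))       # assumed ImageNet-style input size
tokens = patchify(img)                         # (196, 768): 14*14 patches of 16*16*3

# Linearly project patches, prepend a [CLS] token, add absolute position embeddings.
# All weights below are random stand-ins for what the model learns in training.
d_model = 768
W_proj = rng.standard_normal((tokens.shape[1], d_model)) * 0.02
cls_token = np.zeros((1, d_model))             # [CLS] token used for classification
seq = np.concatenate([cls_token, tokens @ W_proj])  # (197, d_model)
pos_embed = rng.standard_normal(seq.shape) * 0.02   # absolute position embeddings
seq = seq + pos_embed
print(seq.shape)  # (197, 768)
```

A classifier head would then read the first row of `seq` (the [CLS] position) after the transformer encoder, matching the "linear classifier on top of the [CLS] token" setup described above.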