Dongsung committed on
Commit
dc139a4
1 Parent(s): 1553ef4

Update model description

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -6,6 +6,14 @@ This model contains just the `IPUConfig` files for running the ViT base model (e

 **This model contains no model weights, only an IPUConfig.**

+ ## Model description
+
+ The Vision Transformer (ViT) is a model for image recognition that employs a Transformer-like architecture, of the kind widely used for NLP pre-training, over patches of the image.
+
+ It uses a standard Transformer encoder as used in NLP. This simple yet scalable strategy works surprisingly well when coupled with pre-training on large amounts of data and transfer to multiple image recognition benchmarks of various sizes, while requiring substantially fewer computational resources to train.
+
+ Paper link: [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf)
+
 ## Usage

  ```
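# A minimal usage sketch, added for illustration rather than taken from the
# original README: it loads this repository's IPUConfig with the
# optimum-graphcore library. The checkpoint name "Graphcore/vit-base-ipu"
# is an assumption based on the description above.
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig.from_pretrained("Graphcore/vit-base-ipu")
print(ipu_config)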