satpalsr committed on
Commit 37ad0ba
1 Parent(s): 5f29925

Update README.md

Files changed (1)
  1. README.md +14 -4
README.md CHANGED
@@ -3,19 +3,29 @@ library_name: keras
  ---

  ## Model description
-
- More information needed
+ **This model is an implementation of the distillation recipe proposed in DeiT.**
+ See the Keras example on [Distilling Vision Transformers](https://keras.io/examples/vision/deit/).
+
+ Full credit goes to [Sayak Paul](https://twitter.com/RisingSayak).
+
+ In the original Vision Transformers (ViT) paper (Dosovitskiy et al.), the authors concluded that to perform on par with Convolutional Neural Networks (CNNs), ViTs need to be pre-trained on larger datasets. The larger the better. This is mainly due to the lack of inductive biases in the ViT architecture: unlike CNNs, ViTs have no layers that exploit locality.
+
+ Many groups have proposed different ways to deal with the data-intensiveness of ViT training. One such way was shown in the Data-efficient image Transformers (DeiT) paper (Touvron et al.). The authors introduced a distillation technique that is specific to transformer-based vision models. DeiT is among the first works to show that it's possible to train ViTs well without using larger datasets.

  ## Intended uses & limitations

- More information needed
+ This model was trained for demonstration purposes and is not guaranteed to give the best results in production.
+ For better results, follow the [Keras example](https://keras.io/examples/vision/deit/) and tune it to your needs.

  ## Training and evaluation data

- More information needed
+ The model is trained and evaluated on the [TF Flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers).

  ## Training procedure

+ The training procedure follows the [Keras example](https://keras.io/examples/vision/deit/), with one exception.
+ The batch size is reduced from the original 256 to 16 so that the model fits in the memory of a single V100 GPU.
+
  ### Training hyperparameters

  The following hyperparameters were used during training: