EdoAbati committed on
Commit
9953f90
1 Parent(s): 169689a

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -5,9 +5,13 @@ tags:
 - vision
 ---
 
+# Compact Convolutional Transformers
+
 ## Model description
 
-More information needed
+As discussed in the [Vision Transformers (ViT)](https://arxiv.org/abs/2010.11929) paper, a Transformer-based architecture for vision typically requires a larger dataset than usual, as well as a longer pre-training schedule. ImageNet-1k (which has about a million images) is considered to fall under the medium-sized data regime with respect to ViTs. This is primarily because, unlike CNNs, ViTs (or a typical Transformer-based architecture) do not have well-informed inductive biases (such as convolutions for processing images). This raises the question: can't we combine the benefits of convolutions with the benefits of Transformers in a single network architecture? These benefits include parameter efficiency and self-attention for processing long-range and global dependencies (interactions between different regions in an image).
+
+In [Escaping the Big Data Paradigm with Compact Transformers](https://arxiv.org/abs/2104.05704), Hassani et al. present an approach for doing exactly this: they propose the Compact Convolutional Transformer (CCT) architecture. This model is an implementation of CCT.
 
 ## Intended uses & limitations
 
@@ -15,7 +19,7 @@ More information needed
 
 ## Training and evaluation data
 
-More information needed
+The model is trained using the CIFAR-10 dataset.
 
 ## Training procedure
 
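The core idea the added description refers to — replacing ViT's patch embedding with a convolutional tokenizer — can be sketched in NumPy as below. This is a minimal illustration, not the paper's or this model's exact configuration: the kernel size, stride, and embedding dimension are illustrative, and a real CCT tokenizer also applies pooling and learns its kernels.

```python
import numpy as np

def conv_tokenize(image, kernels, stride=2):
    """Slide each kernel over the image with the given stride, then
    flatten the resulting feature map into a sequence of tokens."""
    H, W, C = image.shape
    k, _, _, d = kernels.shape          # kernels: (k, k, C, d)
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    tokens = np.zeros((out_h * out_w, d))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k,
                          j * stride:j * stride + k, :]
            # contract the (k, k, C) patch against all d kernels -> d-dim token
            tokens[i * out_w + j] = np.tensordot(patch, kernels, axes=3)
    return tokens

# A CIFAR-10-sized input: 32x32x3 image -> 15x15 feature map -> 225 tokens
image = np.random.rand(32, 32, 3)
kernels = np.random.rand(3, 3, 3, 64)   # 64 illustrative 3x3 filters
print(conv_tokenize(image, kernels).shape)  # (225, 64)
```

Unlike ViT's non-overlapping patch split, the strided convolution produces overlapping receptive fields, which is one way CCT injects a convolutional inductive bias before the Transformer encoder.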