pablorodriper committed on
Commit
a0423c1
1 Parent(s): f95a99f

Update README.md

  1. README.md +36 -21
README.md CHANGED
@@ -1,33 +1,48 @@
  ---
  library_name: keras
  ---

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters
-
- The following hyperparameters were used during training:
-
- | Hyperparameters | Value |
- | :-- | :-- |
- | name | Adam |
- | learning_rate | 9.999999747378752e-05 |
- | decay | 0.0 |
- | beta_1 | 0.8999999761581421 |
- | beta_2 | 0.9990000128746033 |
- | epsilon | 1e-07 |
- | amsgrad | False |
- | training_precision | float32 |

  ---
+ title: Video Vision Transformer on medmnist
+ emoji: 🧑‍⚕️
+ colorFrom: red
+ colorTo: green
+ sdk: gradio
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
  library_name: keras
  ---

+ ## Keras Implementation of Video Vision Transformer on medmnist

+ This repo contains the model [to this Keras example on Video Vision Transformer](https://keras.io/examples/vision/vivit/).

+ ## Background Information
+ This example implements [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Arnab et al., a pure Transformer-based model for video classification. The authors propose a novel embedding scheme and a number of Transformer variants to model video clips.

+ ## Datasets
+ We use the [MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification](https://medmnist.com/) dataset.

+ ## Training Parameters
+ ```
+ # DATA
+ DATASET_NAME = "organmnist3d"
+ BATCH_SIZE = 32
+ AUTO = tf.data.AUTOTUNE
+ INPUT_SHAPE = (28, 28, 28, 1)
+ NUM_CLASSES = 11

+ # OPTIMIZER
+ LEARNING_RATE = 1e-4
+ WEIGHT_DECAY = 1e-5

+ # TRAINING
+ EPOCHS = 80

+ # TUBELET EMBEDDING
+ PATCH_SIZE = (8, 8, 8)
+ NUM_PATCHES = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2

+ # ViViT ARCHITECTURE
+ LAYER_NORM_EPS = 1e-6
+ PROJECTION_DIM = 128
+ NUM_HEADS = 8
+ NUM_LAYERS = 8
+ ```
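
A quick sanity check on the tubelet settings in the added README may help when reading these parameters. The sketch below assumes tubelets are cut with a stride equal to `PATCH_SIZE` and no padding, which the README does not state; `tubelet_grid` and `num_tubelets` are illustrative helper names, not part of the repo. Under that assumption, a 28×28×28 volume yields a 3×3×3 grid of tubelets, while the `NUM_PATCHES` expression as written squares the per-axis count instead of cubing it.

```python
# Values copied from the Training Parameters block in the README.
INPUT_SHAPE = (28, 28, 28, 1)  # three volume axes plus a channel axis
PATCH_SIZE = (8, 8, 8)


def tubelet_grid(input_shape, patch_size):
    """Tubelets per axis, assuming stride == patch size and no padding."""
    return tuple(dim // patch for dim, patch in zip(input_shape[:3], patch_size))


def num_tubelets(input_shape, patch_size):
    """Total token count: the product of the per-axis counts."""
    total = 1
    for count in tubelet_grid(input_shape, patch_size):
        total *= count
    return total


# The expression from the README, reproduced verbatim for comparison.
NUM_PATCHES_AS_WRITTEN = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2

print(tubelet_grid(INPUT_SHAPE, PATCH_SIZE))  # (3, 3, 3)
print(num_tubelets(INPUT_SHAPE, PATCH_SIZE))  # 27
print(NUM_PATCHES_AS_WRITTEN)                 # 9
```

Whether `NUM_PATCHES` is actually consumed by the model is not visible in this diff, so this is only an observation about the arithmetic, not a claim that training is affected.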