---
library_name: tf-keras
license: apache-2.0
title: Video Vision Transformer on medmnist
emoji: 🧑‍⚕️
colorFrom: red
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
---

## Keras Implementation of Video Vision Transformer on MedMNIST

This repo contains the model for [the Keras example on Video Vision Transformer](https://keras.io/examples/vision/vivit/).

## Background Information
This example implements [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Arnab et al., a pure Transformer-based model for video classification. The authors propose a novel embedding scheme and several Transformer variants for modeling video clips.
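
As a rough illustration of the embedding idea, the sketch below cuts a clip into non-overlapping spatio-temporal blocks ("tubelets") and flattens each block into one token vector. It is a minimal NumPy sketch and not the code used to train this model: the helper name `extract_tubelets`, the toy clip shape, and the choice to crop trailing voxels that do not fill a whole tubelet are all assumptions made for illustration. In a full model each flattened token would then be linearly projected to the embedding dimension.

```python
import numpy as np


def extract_tubelets(video, patch_size):
    """Split a (T, H, W, C) clip into flattened, non-overlapping tubelets.

    Hypothetical helper for illustration only. Trailing voxels that do not
    fill a complete tubelet are cropped (an assumption of this sketch).
    """
    t, h, w, c = video.shape
    pt, ph, pw = patch_size
    nt, nh, nw = t // pt, h // ph, w // pw

    # Crop so every axis is an exact multiple of the tubelet size.
    video = video[: nt * pt, : nh * ph, : nw * pw]

    # Split each axis into (number of tubelets, tubelet size) ...
    blocks = video.reshape(nt, pt, nh, ph, nw, pw, c)
    # ... then group the tubelet indices first and the within-tubelet axes last.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)

    # One row per tubelet, one column per voxel/channel inside it.
    return blocks.reshape(nt * nh * nw, pt * ph * pw * c)


# Toy clip: 4 frames of 4x4 pixels with 1 channel, filled with 0..63.
clip = np.arange(4 * 4 * 4).reshape(4, 4, 4, 1)
tokens = extract_tubelets(clip, patch_size=(2, 2, 2))

print(tokens.shape)        # (8, 8): 8 tubelets, 8 values each
print(tokens[0].tolist())  # [0, 1, 4, 5, 16, 17, 20, 21]
```

The first token gathers the 2x2 corner of the first two frames, which is why its values jump by 16 (one full frame) halfway through.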

## Datasets
We use the `organmnist3d` subset of the [MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification](https://medmnist.com/) dataset, as set by `DATASET_NAME` in the training parameters.

## Training Parameters
```python
import tensorflow as tf

# DATA
DATASET_NAME = "organmnist3d"
BATCH_SIZE = 32
AUTO = tf.data.AUTOTUNE
INPUT_SHAPE = (28, 28, 28, 1)
NUM_CLASSES = 11

# OPTIMIZER
LEARNING_RATE = 1e-4
WEIGHT_DECAY = 1e-5

# TRAINING
EPOCHS = 80

# TUBELET EMBEDDING
PATCH_SIZE = (8, 8, 8)
# 28 // 8 = 3, so this evaluates to 9
NUM_PATCHES = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2

# ViViT ARCHITECTURE
LAYER_NORM_EPS = 1e-6
PROJECTION_DIM = 128
NUM_HEADS = 8
NUM_LAYERS = 8
```
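
A few quantities implied by these parameters can be worked out by hand, and the sketch below does so in plain Python. It rests on two assumptions that this README does not state: that tubelets are non-overlapping (stride equal to the patch size, no padding), and that the projection dimension is split evenly across attention heads. Under the first assumption a 3D grid of tubelets yields more tokens than the `NUM_PATCHES` formula as written, so the two numbers are shown side by side; whether the model actually relies on `NUM_PATCHES` is not specified here.

```python
# Values copied from the training parameters above.
INPUT_SHAPE = (28, 28, 28, 1)
PATCH_SIZE = (8, 8, 8)
PROJECTION_DIM = 128
NUM_HEADS = 8

# Assumption: non-overlapping tubelets (stride == patch size, no padding),
# so each axis yields floor(dim / patch) tubelets.
grid = tuple(dim // patch for dim, patch in zip(INPUT_SHAPE[:3], PATCH_SIZE))
num_tokens = grid[0] * grid[1] * grid[2]

# Voxels per axis actually covered by whole tubelets.
covered = tuple(g * p for g, p in zip(grid, PATCH_SIZE))

# The NUM_PATCHES formula from the parameters, evaluated as written.
num_patches = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2

# Assumption: the projection is split evenly across attention heads.
head_dim = PROJECTION_DIM // NUM_HEADS

print("tubelet grid:", grid)           # (3, 3, 3)
print("tokens from 3D grid:", num_tokens)  # 27
print("voxels covered per axis:", covered)  # (24, 24, 24)
print("NUM_PATCHES as written:", num_patches)  # 9
print("per-head dimension:", head_dim)  # 16
```

With an 8-voxel patch on a 28-voxel axis, only 24 voxels per axis fall inside whole tubelets, so the last 4 would be left out under the no-padding assumption.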