nielsr HF staff committed on
Commit
506d188
1 Parent(s): 800b669

Create README.md

Files changed (1)
  1. README.md +39 -0
README.md ADDED
@@ -0,0 +1,39 @@
+ ---
+ license: "mit"
+ tags:
+ - vision
+ - video-classification
+ ---
+
+ # ViViT (Video Vision Transformer)
+
+ ViViT model as introduced in the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Arnab et al. and first released in [this repository](https://github.com/google-research/scenic/tree/main/scenic/projects/vivit).
+
+ Disclaimer: The team releasing ViViT did not write a model card for this model so this model card has been written by the Hugging Face team.
+
+ ## Model description
+
+ ViViT is an extension of the [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/v4.27.0/model_doc/vit) to video.
+
+ We refer to the paper for details.
+
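As a rough illustration of what extending ViT to video means for the input, the sketch below counts the tokens produced when a clip is cut into non-overlapping spatio-temporal tubelets, one of the embedding schemes the paper describes. The helper name `tubelet_token_count`, the 2x16x16 tubelet size, and the 32-frame 224x224 clip are assumptions chosen for illustration; none of them is stated in this card.

```python
from typing import Tuple


def tubelet_token_count(
    num_frames: int,
    height: int,
    width: int,
    tubelet: Tuple[int, int, int] = (2, 16, 16),  # assumed (frames, height, width)
) -> int:
    """Count non-overlapping tubelets (tokens) in a clip. Hypothetical helper."""
    t, h, w = tubelet
    if num_frames % t or height % h or width % w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    # Each tubelet spans t frames and an h x w pixel patch and becomes one token.
    return (num_frames // t) * (height // h) * (width // w)


# Assumed clip: 32 frames at 224x224 -> 16 * 14 * 14 tokens.
print(tubelet_token_count(32, 224, 224))  # 3136
```

A plain ViT on a single 224x224 image with 16x16 patches would see 196 tokens, so the temporal axis multiplies the sequence length by the number of tubelets along time.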
+ ## Intended uses & limitations
+
+ The model is mostly meant to be fine-tuned on a downstream task, such as video classification. See the [model hub](https://huggingface.co/models?filter=vivit) for fine-tuned versions on a task that interests you.
+
+ ### How to use
+
+ For code examples, we refer to the [documentation](https://huggingface.co/transformers/main/model_doc/vivit).
+
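For orientation, here is a minimal sketch of video classification with the `transformers` ViViT classes. It assumes a fine-tuned checkpoint named `google/vivit-b-16x2-kinetics400`, a 32-frame clip, and that `torch` and `transformers` are installed. The helpers `sample_frame_indices` and `classify_clip` are hypothetical names introduced here, not part of the library.

```python
from typing import List, Sequence


def sample_frame_indices(num_frames_in_video: int, clip_len: int = 32) -> List[int]:
    """Pick clip_len frame indices spread evenly across the video (hypothetical helper)."""
    if num_frames_in_video <= 0 or clip_len <= 0:
        raise ValueError("both arguments must be positive")
    if clip_len == 1:
        return [0]
    last = num_frames_in_video - 1
    return [round(i * last / (clip_len - 1)) for i in range(clip_len)]


def classify_clip(
    frames: Sequence,  # sampled frames, each an RGB array of shape (H, W, 3)
    checkpoint: str = "google/vivit-b-16x2-kinetics400",  # assumed checkpoint name
) -> str:
    """Return the predicted label for one clip. Downloads weights on first use."""
    # Imported lazily so the sampling helper works without these packages.
    import torch
    from transformers import VivitForVideoClassification, VivitImageProcessor

    processor = VivitImageProcessor.from_pretrained(checkpoint)
    model = VivitForVideoClassification.from_pretrained(checkpoint)
    model.eval()

    # The processor resizes, crops, and normalizes the frames into a batch of one clip.
    inputs = processor(list(frames), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted = int(logits.argmax(-1).item())
    return model.config.id2label[predicted]
```

Decode the video with a library of your choice, keep the frames at `sample_frame_indices(total_frames)`, and pass them to `classify_clip`. The number of frames a checkpoint expects depends on its configuration, so check it before choosing `clip_len`.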
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @misc{arnab2021vivit,
+ title={ViViT: A Video Vision Transformer},
+ author={Anurag Arnab and Mostafa Dehghani and Georg Heigold and Chen Sun and Mario Lučić and Cordelia Schmid},
+ year={2021},
+ eprint={2103.15691},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+ }
+ ```