nielsr (HF staff) committed
Commit ad5d7e3
Parent: 89c23fb

Create README.md

Files changed (1): README.md (+56, -0)
README.md ADDED
---
license: apache-2.0
tags:
datasets:
- imagenet-21k
---

# Vision-and-Language Transformer (ViLT), fine-tuned on VQAv2

Vision-and-Language Transformer (ViLT) model fine-tuned on [VQAv2](https://visualqa.org/). It was introduced in the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Kim et al. and first released in [this repository](https://github.com/dandelin/ViLT).

Disclaimer: The team releasing ViLT did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

As described in the paper, ViLT is a minimal vision-and-language Transformer: rather than relying on a convolutional backbone or region supervision (object detection features), it embeds image patches and text tokens directly and processes them jointly with a single Transformer encoder, which is pre-trained on image-text pairs and then fine-tuned on downstream tasks such as visual question answering.

## Intended uses & limitations

You can use the raw model for visual question answering.

### How to use
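
Below is a minimal sketch of how to query the model for visual question answering with the 🤗 Transformers library. The checkpoint id `dandelin/vilt-b32-finetuned-vqa` and the example image URL are assumptions used for illustration, not details taken from this commit.

```python
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# an example image from COCO and a question about it (URL is illustrative)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

# NOTE: checkpoint id is an assumption; replace it with the id of this repository if it differs
checkpoint = "dandelin/vilt-b32-finetuned-vqa"
processor = ViltProcessor.from_pretrained(checkpoint)
model = ViltForQuestionAnswering.from_pretrained(checkpoint)

# encode the image + question and run a forward pass
encoding = processor(image, question, return_tensors="pt")
outputs = model(**encoding)

# the model scores the VQAv2 answer classes; pick the most likely one
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```

The model predicts one of the VQAv2 answer classes; `model.config.id2label` maps the predicted index back to an answer string.
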
## Training data

According to the paper, ViLT is pre-trained on image-text pairs drawn from MS COCO, Visual Genome, SBU Captions and Google Conceptual Captions; this checkpoint was subsequently fine-tuned on the VQAv2 dataset.

## Training procedure

### Preprocessing

The exact preprocessing settings are not documented in this commit. At a high level, images are split into fixed-size patches that are linearly embedded, and questions are tokenized with a BERT-style wordpiece tokenizer, as described in the paper and implemented in the original repository. A sketch for inspecting the bundled preprocessing configuration follows.
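
As a rough sketch (using the same assumed checkpoint id as above), the preprocessing settings shipped with the checkpoint can be inspected through the processor:

```python
from transformers import ViltProcessor

# checkpoint id is an assumption, as in the usage sketch above
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# printing the processor shows the image preprocessing settings
# (resizing, normalization) and the wordpiece tokenizer it bundles
print(processor)
```
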
### Pretraining

As described in the paper, ViLT is pre-trained on image-text pairs with masked language modeling and image-text matching objectives; this checkpoint was then fine-tuned for visual question answering. Refer to the paper for the full training setup.

## Evaluation results

Results on VQAv2 and other downstream tasks are reported in the [paper](https://arxiv.org/abs/2102.03334).

### BibTeX entry and citation info

```bibtex
@misc{kim2021vilt,
      title={ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision},
      author={Wonjae Kim and Bokyung Son and Ildoo Kim},
      year={2021},
      eprint={2102.03334},
      archivePrefix={arXiv},
      primaryClass={stat.ML}
}
```