Alara Dirik committed on
Commit
e81946e
1 Parent(s): b16489b

Update README.md

  1. README.md +13 -0
README.md CHANGED
@@ -1,6 +1,8 @@
 ---
+license: apache-2.0
 tags:
 - vision
+- object-detection
 ---
 
 # Model Card: OWL-ViT
@@ -77,3 +79,14 @@ We primarily imagine the model will be used by researchers to better understand
 ## Data
 
 The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as [YFCC100M](http://projects.dfki.uni-kl.de/yfcc100m/). A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as [COCO](https://cocodataset.org/#home) and [OpenImages](https://storage.googleapis.com/openimages/web/index.html).
+
+### BibTeX entry and citation info
+
+```bibtex
+@article{minderer2022simple,
+title={Simple Open-Vocabulary Object Detection with Vision Transformers},
+author={Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby},
+journal={arXiv preprint arXiv:2205.06230},
+year={2022},
+}
+```