Alara Dirik committed on
Commit
b3135b5
1 Parent(s): 832b1c3

Update README.md

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -1,6 +1,8 @@
  ---
  tags:
  - vision
  ---
 
  # Model Card: OWL-ViT
@@ -76,4 +78,15 @@ We primarily imagine the model will be used by researchers to better understand
 
  ## Data
 
- The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as [YFCC100M](http://projects.dfki.uni-kl.de/yfcc100m/). A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as [COCO](https://cocodataset.org/#home) and [OpenImages](https://storage.googleapis.com/openimages/web/index.html).
 
  ---
+ license: apache-2.0
  tags:
  - vision
+ - object-detection
  ---
 
  # Model Card: OWL-ViT
 
 
  ## Data
 
+ The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as [YFCC100M](http://projects.dfki.uni-kl.de/yfcc100m/). A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as [COCO](https://cocodataset.org/#home) and [OpenImages](https://storage.googleapis.com/openimages/web/index.html).
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @article{minderer2022simple,
+ title={Simple Open-Vocabulary Object Detection with Vision Transformers},
+ author={Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
+ journal={arXiv preprint arXiv:2205.06230},
+ year={2022},
+ }
+ ```