google/owlvit-base-patch32

Zero-Shot Object Detection


Alara Dirik committed on Aug 5, 2022

Commit e81946e · Parent: b16489b

Update README.md

Files changed (1)

README.md +13 -0

@@ -1,6 +1,8 @@
 ---
+license: apache-2.0
 tags:
 - vision
+- object-detection
 ---
 # Model Card: OWL-ViT
@@ -77,3 +79,14 @@ We primarily imagine the model will be used by researchers to better understand
 ## Data
 The CLIP backbone of the model was trained on publicly available image-caption data. This was done through a combination of crawling a handful of websites and using commonly-used pre-existing image datasets such as [YFCC100M](http://projects.dfki.uni-kl.de/yfcc100m/). A large portion of the data comes from our crawling of the internet. This means that the data is more representative of people and societies most connected to the internet. The prediction heads of OWL-ViT, along with the CLIP backbone, are fine-tuned on publicly available object detection datasets such as [COCO](https://cocodataset.org/#home) and [OpenImages](https://storage.googleapis.com/openimages/web/index.html).
+### BibTeX entry and citation info
+```bibtex
+@article{minderer2022simple,
+  title={Simple Open-Vocabulary Object Detection with Vision Transformers},
+  author={Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby},
+  journal={arXiv preprint arXiv:2205.06230},
+  year={2022},
+}
+```
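The first hunk of this commit extends the model card's YAML front matter with a `license` key and a second tag, `object-detection`. The Hugging Face Hub reads this block as repository metadata. The stdlib-only Python below is a minimal sketch that parses the old and new headers from the diff and reports what the commit added. It assumes the header uses only the flat `key: value` and `- item` forms seen here, so it is not a general YAML parser. `parse_front_matter` is a hypothetical helper written for illustration and is not part of any Hugging Face library.

```python
def parse_front_matter(readme: str) -> dict:
    """Parse the leading '---'-delimited metadata block of a model card.

    Hypothetical helper: handles only scalar `key: value` lines and
    `- item` list entries that follow a bare `key:` line.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta: dict = {}
    current_key = None  # key whose list items we are currently collecting
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the metadata block
            break
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value  # scalar entry, e.g. license: apache-2.0
                current_key = None
            else:
                meta[key] = []  # a bare "key:" opens a list
                current_key = key
    return meta


# Front matter before and after commit e81946e, copied from the diff above.
OLD_HEADER = """---
tags:
- vision
---
"""

NEW_HEADER = """---
license: apache-2.0
tags:
- vision
- object-detection
---
"""

old_meta = parse_front_matter(OLD_HEADER)
new_meta = parse_front_matter(NEW_HEADER)

added_keys = sorted(new_meta.keys() - old_meta.keys())
added_tags = [t for t in new_meta["tags"] if t not in old_meta["tags"]]
print(added_keys)  # ['license']
print(added_tags)  # ['object-detection']
```

If the header grows beyond this flat subset (nested mappings, quoted strings, multi-line values), a proper YAML parser is the safer choice.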