csmithxc committed on
Commit
3ebca65
1 Parent(s): 5a1343b

Update README.md

Files changed (1)
  1. README.md +56 -139
README.md CHANGED
@@ -1,77 +1,46 @@
1
  ---
 
 
 
 
 
2
  license: apache-2.0
3
- title: yoloworldtest
4
- pinned: true
 
5
  ---
 
6
  <div align="center">
7
- <img src="./assets/yolo_logo.png" width=60%>
 
 
8
  <br>
9
- <a href="https://scholar.google.com/citations?hl=zh-CN&user=PH8rJHYAAAAJ">Tianheng Cheng</a><sup><span>2,3,*</span></sup>,
10
- <a href="https://linsong.info/">Lin Song</a><sup><span>1,📧,*</span></sup>,
11
- <a href="https://yxgeee.github.io/">Yixiao Ge</a><sup><span>1,🌟,2</span></sup>,
 
12
  <a href="http://eic.hust.edu.cn/professor/liuwenyu/"> Wenyu Liu</a><sup><span>3</span></sup>,
13
- <a href="https://xwcv.github.io/">Xinggang Wang</a><sup><span>3,📧</span></sup>,
14
- <a href="https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en">Ying Shan</a><sup><span>1,2</span></sup>
15
  </br>
16
 
17
- \* Equal contribution 🌟 Project lead 📧 Corresponding author
18
-
19
  <sup>1</sup> Tencent AI Lab, <sup>2</sup> ARC Lab, Tencent PCG
20
  <sup>3</sup> Huazhong University of Science and Technology
21
  <br>
22
  <div>
23
 
24
- [![arxiv paper](https://img.shields.io/badge/Project-Page-green)](https://wondervictor.github.io/)
25
- [![arxiv paper](https://img.shields.io/badge/arXiv-Paper-red)](https://arxiv.org/abs/2401.17270)
26
- <a href="https://colab.research.google.com/github/AILab-CVC/YOLO-World/blob/master/inference.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
27
- [![demo](https://img.shields.io/badge/🤗HuggingFace-Spaces-orange)](https://huggingface.co/spaces/stevengrove/YOLO-World)
28
- [![Replicate](https://replicate.com/zsxkib/yolo-world/badge)](https://replicate.com/zsxkib/yolo-world)
29
- [![hfpaper](https://img.shields.io/badge/🤗HuggingFace-Paper-yellow)](https://huggingface.co/papers/2401.17270)
30
  [![license](https://img.shields.io/badge/License-GPLv3.0-blue)](LICENSE)
31
- [![yoloworldseg](https://img.shields.io/badge/YOLOWorldxEfficientSAM-🤗Spaces-orange)](https://huggingface.co/spaces/SkalskiP/YOLO-World)
32
- [![yologuide](https://img.shields.io/badge/📖Notebook-roboflow-purple)](https://supervision.roboflow.com/develop/notebooks/zero-shot-object-detection-with-yolo-world)
33
- [![deploy](https://media.roboflow.com/deploy.svg)](https://inference.roboflow.com/foundation/yolo_world/)
34
 
35
  </div>
36
  </div>
37
 
38
- ## Notice
39
-
40
- We recommend that everyone **use English to communicate on issues**, as this helps developers from around the world discuss, share experiences, and answer questions together.
41
-
42
- ## 🔥 Updates
43
- `[2024-3-28]:` We provide: (1) more high-resolution pre-trained models (e.g., S, M, X) ([#142](https://github.com/AILab-CVC/YOLO-World/issues/142)); (2) pre-trained models with CLIP-Large text encoders. Most importantly, we provide a preliminary fix for **fine-tuning without `mask-refine`** and explore a new fine-tuning setting ([#160](https://github.com/AILab-CVC/YOLO-World/issues/160), [#76](https://github.com/AILab-CVC/YOLO-World/issues/76)). In addition, fine-tuning YOLO-World with `mask-refine` also yields significant improvements; see [configs/finetune_coco](./configs/finetune_coco/) for details.
44
- `[2024-3-16]:` We fix bugs in the demo ([#110](https://github.com/AILab-CVC/YOLO-World/issues/110), [#94](https://github.com/AILab-CVC/YOLO-World/issues/94), [#129](https://github.com/AILab-CVC/YOLO-World/issues/129), [#125](https://github.com/AILab-CVC/YOLO-World/issues/125)), including the visualization of segmentation masks, and release [**YOLO-World with Embeddings**](./docs/prompt_yolo_world.md), which supports prompt tuning, text prompts, and image prompts.
45
- `[2024-3-3]:` We add the **high-resolution YOLO-World**, which supports `1280x1280` resolution with higher accuracy and better performance for small objects!
46
- `[2024-2-29]:` We release the newest version of [ **YOLO-World-v2**](./docs/updates.md) with higher accuracy and faster speed! We hope the community can join us to improve YOLO-World!
47
- `[2024-2-28]:` Excited to announce that YOLO-World has been accepted by **CVPR 2024**! We're continuing to make YOLO-World faster and stronger, as well as making it better to use for all.
48
- `[2024-2-22]:` We sincerely thank [RoboFlow](https://roboflow.com/) and [@Skalskip92](https://twitter.com/skalskip92) for the [**Video Guide**](https://www.youtube.com/watch?v=X7gKBGVz4vs) about YOLO-World, nice work!
49
- `[2024-2-18]:` We thank [@Skalskip92](https://twitter.com/skalskip92) for developing the wonderful segmentation demo via connecting YOLO-World and EfficientSAM. You can try it now at the [🤗 HuggingFace Spaces](https://huggingface.co/spaces/SkalskiP/YOLO-World).
50
- `[2024-2-17]:` The largest model **X** of YOLO-World is released, which achieves better zero-shot performance!
51
- `[2024-2-17]:` We release the code & models for **YOLO-World-Seg** now! YOLO-World now supports open-vocabulary / zero-shot object segmentation!
52
- `[2024-2-15]:` The pre-trained YOLO-World-L with CC3M-Lite is released!
53
- `[2024-2-14]:` We provide the [`image_demo`](demo.py) for inference on images or directories.
54
- `[2024-2-10]:` We provide the [fine-tuning](./docs/finetuning.md) and [data](./docs/data.md) details for fine-tuning YOLO-World on the COCO dataset or the custom datasets!
55
- `[2024-2-3]:` We support the `Gradio` demo now in the repo and you can build the YOLO-World demo on your own device!
56
- `[2024-2-1]:` We've released the code and weights of YOLO-World now!
57
- `[2024-2-1]:` We deploy the YOLO-World demo on [HuggingFace 🤗](https://huggingface.co/spaces/stevengrove/YOLO-World), you can try it now!
58
- `[2024-1-31]:` We are excited to launch **YOLO-World**, a cutting-edge real-time open-vocabulary object detector.
59
-
60
 
61
- ## TODO
62
 
63
- YOLO-World is under active development, so please stay tuned ☕️!
64
- If you have suggestions 📃 or ideas 💡, **we would love for you to raise them in the [Roadmap](https://github.com/AILab-CVC/YOLO-World/issues/109)** ❤️!
65
- > YOLO-World is under active development 📃. If you have suggestions or ideas 💡, **we would very much like you to raise them in the [Roadmap](https://github.com/AILab-CVC/YOLO-World/issues/109)** ❤️!
66
 
67
- ## [FAQ (Frequently Asked Questions)](https://github.com/AILab-CVC/YOLO-World/discussions/149)
68
-
69
- We have set up an FAQ about YOLO-World in the GitHub Discussions. We encourage everyone to raise questions or share the solutions they find during use, and we hope it helps you quickly find answers.
70
-
71
- > We have set up an FAQ about YOLO-World in the GitHub Discussions, where common questions are collected. Everyone is welcome to raise questions or share solutions encountered during use, and we hope you can quickly find answers there.
72
-
73
-
74
- ## Highlights & Introduction
75
 
76
  This repo contains the PyTorch implementation, pre-trained weights, and pre-training/fine-tuning code for YOLO-World.
77
 
@@ -79,51 +48,36 @@ This repo contains the PyTorch implementation, pre-trained weights, and pre-trai
79
 
80
  * YOLO-World is the next-generation YOLO detector, with a strong open-vocabulary detection capability and grounding ability.
81
 
82
- * YOLO-World presents a *prompt-then-detect* paradigm for efficient user-vocabulary inference, which re-parameterizes the vocabulary embeddings into the model as parameters and achieves superior inference speed. You can export your own detection model without extra training or fine-tuning in our [online demo](https://huggingface.co/spaces/stevengrove/YOLO-World)!
83
 
84
 
85
  <center>
86
  <img width=800px src="./assets/yolo_arch.png">
87
  </center>
88
 
89
- ## Model Zoo
90
 
91
- We've pre-trained YOLO-World-S/M/L from scratch and evaluated them on `LVIS val-1.0` and `LVIS minival`. We provide the pre-trained model weights and training logs for applications/research and for reproducing the results.
92
 
93
- ### Zero-shot Inference on LVIS dataset
94
 
95
- <div><font size=2>
96
-
97
- | model | Pre-train Data | Size | AP<sup>mini</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | AP<sup>val</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | weights |
98
- | :------------------------------------------------------------------------------------------------------------------- | :------------------- | :----------------- | :--------------: | :------------: | :------------: | :------------: | :-------------: | :------------: | :------------: | :------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
99
- | [YOLO-Worldv2-S](./configs/pretrain/yolo_world_v2_s_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 640 | 22.7 | 16.3 | 20.8 | 25.5 | 17.3 | 11.3 | 14.9 | 22.7 |[HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_s_obj365v1_goldg_pretrain-55b943ea.pth)|
100
- | [YOLO-Worldv2-S](./configs/pretrain/yolo_world_v2_s_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py) | O365+GoldG | 1280&#x1F538; | 24.1 | 18.7 | 22.0 | 26.9 | 18.8 | 14.1 | 16.3 | 23.8 |[HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_s_obj365v1_goldg_pretrain_1280ft-fc4ff4f7.pth)|
101
- | [YOLO-Worldv2-M](./configs/pretrain/yolo_world_v2_m_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 640 | 30.0 | 25.0 | 27.2 | 33.4 | 23.5 | 17.1 | 20.0 | 30.1 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_m_obj365v1_goldg_pretrain-c6237d5b.pth)|
102
- | [YOLO-Worldv2-M](./configs/pretrain/yolo_world_v2_m_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py) | O365+GoldG | 1280&#x1F538; | 31.6 | 24.5 | 29.0 | 35.1 | 25.3 | 19.3 | 22.0 | 31.7 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_m_obj365v1_goldg_pretrain_1280ft-77d0346d.pth)|
103
- | [YOLO-Worldv2-L](./configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 640 | 33.0 | 22.6 | 32.0 | 35.8 | 26.0 | 18.6 | 23.0 | 32.6 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_l_obj365v1_goldg_pretrain-a82b1fe3.pth)|
104
- | [YOLO-Worldv2-L](./configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py) | O365+GoldG | 1280&#x1F538; | 34.6 | 29.2 | 32.8 | 37.2 | 27.6 | 21.9 | 24.2 | 34.0 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_l_obj365v1_goldg_pretrain_1280ft-9babe3f6.pth)|
105
- | [YOLO-Worldv2-L (CLIP-Large)](./configs/pretrain/yolo_world_v2_l_clip_large_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) 🔥 | O365+GoldG | 640 | 34.0 | 22.0 | 32.6 | 37.4 | 27.1 | 19.9 | 23.9 | 33.9 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_l_clip_large_o365v1_goldg_pretrain-8ff2e744.pth)|
106
- | [YOLO-Worldv2-L (CLIP-Large)](./configs/pretrain/yolo_world_v2_l_clip_large_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_800ft_lvis_minival.py) 🔥 | O365+GoldG | 800&#x1F538; | 35.5 | 28.3 | 33.2 | 38.8 | 28.6 | 22.0 | 25.1 | 35.4 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_l_clip_large_o365v1_goldg_pretrain_800ft-9df82e55.pth)|
107
- | [YOLO-Worldv2-L](./configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG+CC3M-Lite | 640 | 32.9 | 25.3 | 31.1 | 35.8 | 26.1 | 20.6 | 22.6 | 32.3 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_l_obj365v1_goldg_cc3mlite_pretrain-ca93cd1f.pth)|
108
- | [YOLO-Worldv2-X](./configs/pretrain/yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG+CC3M-Lite | 640 | 35.4 | 28.7 | 32.9 | 38.7 | 28.4 | 20.6 | 25.6 | 35.0 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain-8698fbfa.pth) |
109
- | [YOLO-Worldv2-XL](./configs/pretrain/yolo_world_v2_xl_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG+CC3M-Lite | 640 | 36.0 | 25.8 | 34.1 | 39.5 | 29.1 | 21.1 | 26.3 | 35.8 | [HF Checkpoints 🤗](https://huggingface.co/wondervictor/YOLO-World/blob/main/yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain-8698fbfa.pth) |
110
-
111
- </font>
112
- </div>
113
 
114
- **NOTE:**
115
- 1. AP<sup>mini</sup>: evaluated on LVIS `minival`.
116
- 2. AP<sup>val</sup>: evaluated on LVIS `val 1.0`.
117
- 3. [HuggingFace Mirror](https://hf-mirror.com/) mirrors HuggingFace and is an alternative for users who cannot access HuggingFace directly.
118
- 4. &#x1F538;: models fine-tuned at the higher resolution using the pre-training data.
 
119
 
120
- **Pre-training Logs:**
121
 
122
- We provide the pre-training logs of `YOLO-World-v2`. Due to unexpected errors on the local machines, training was interrupted several times.
 
 
 
 
123
 
124
- | Model | YOLO-World-v2-S | YOLO-World-v2-M | YOLO-World-v2-L | YOLO-World-v2-X |
125
- | :--- | :-------------: | :--------------: | :-------------: | :-------------: |
126
- |Pre-training Log | [Part-1](https://drive.google.com/file/d/1oib7pKfA2h1U_5-85H_s0Nz8jWd0R-WP/view?usp=drive_link), [Part-2](https://drive.google.com/file/d/11cZ6OZy80VTvBlZy3kzLAHCxx5Iix5-n/view?usp=drive_link) | [Part-1](https://drive.google.com/file/d/1E6vYSS8kBipGc8oQnsjAfeUAx8I9yOX7/view?usp=drive_link), [Part-2](https://drive.google.com/file/d/1fbM7vt2tgSeB8o_7tUDofWvpPNSViNj5/view?usp=drive_link) | [Part-1](https://drive.google.com/file/d/1Tola1QGJZTL6nGy3SBxKuknfNfREDm8J/view?usp=drive_link), [Part-2](https://drive.google.com/file/d/1mTBXniioUb0CdctCG4ckIU6idGo0NnH8/view?usp=drive_link) | [Final part](https://drive.google.com/file/d/1aEUA_EPQbXOrpxHTQYB6ieGXudb1PLpd/view?usp=drive_link)|
127
 
128
 
129
  ## Getting started
@@ -132,16 +86,19 @@ We provide the pre-training logs of `YOLO-World-v2`. Due to the unexpected error
132
 
133
  YOLO-World is developed based on `torch==1.11.0` `mmyolo==0.6.0` and `mmdetection==3.0.0`.
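A minimal environment sketch that pins these versions (this is our assumption, not the project's official install recipe; `mmdetection` and `mmyolo` are distributed on PyPI as `mmdet` and `mmyolo`, and the torch build should match your CUDA setup). The repo's own `pip install -e .` below should resolve compatible versions on its own; this just makes the pins explicit:

```bash
# pin the versions stated above; openmim resolves a compatible mmcv wheel
pip install torch==1.11.0 torchvision==0.12.0
pip install -U openmim
mim install "mmcv>=2.0.0rc4,<2.1.0"
pip install mmdet==3.0.0 mmyolo==0.6.0 mmengine transformers
```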
134
 
135
- #### Clone Project
136
-
137
  ```bash
138
- git clone --recursive https://github.com/AILab-CVC/YOLO-World.git
139
- ```
140
- #### Install
 
 
 
 
 
 
 
 
141
 
142
- ```bash
143
- pip install torch wheel -q
144
- pip install -e .
145
  ```
146
 
147
  ### 2. Preparing Data
@@ -162,7 +119,7 @@ chmod +x tools/dist_train.sh
162
  ```
163
  **NOTE:** YOLO-World is pre-trained on 4 nodes with 8 GPUs per node (32 GPUs in total). For pre-training, the `node_rank` and `nnodes` for multi-node training should be specified.
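As a minimal multi-node sketch, assuming the standard OpenMMLab `tools/dist_train.sh`, which reads `NNODES`, `NODE_RANK`, `MASTER_ADDR`, and `PORT` from the environment (run it on every node with that node's own `NODE_RANK`; the config path is a placeholder):

```bash
# node 0 of 4 (repeat on the other nodes with NODE_RANK=1,2,3)
NNODES=4 NODE_RANK=0 MASTER_ADDR=<master-node-ip> PORT=29500 \
./tools/dist_train.sh path/to/config 8
```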
164
 
165
- Evaluating YOLO-World is also easy:
166
 
167
  ```bash
168
  chmod +x tools/dist_test.sh
@@ -171,66 +128,26 @@ chmod +x tools/dist_test.sh
171
 
172
  **NOTE:** We mainly evaluate the performance on LVIS-minival for pre-training.
173
 
174
- ## Fine-tuning YOLO-World
175
-
176
- We provide the details about fine-tuning YOLO-World in [docs/fine-tuning](./docs/finetuning.md).
177
-
178
  ## Deployment
179
 
180
  We provide the details about deployment for downstream applications in [docs/deployment](./docs/deploy.md).
181
- You can directly download the ONNX model through the online [demo](https://huggingface.co/spaces/stevengrove/YOLO-World) in Huggingface Spaces 🤗.
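If you want a quick sanity check on the downloaded file, something like the following works (assuming the `onnx` Python package is installed; `yolo_world.onnx` is a placeholder for whatever file the demo exported):

```bash
# validate the exported graph with the ONNX checker
python -c "import onnx; onnx.checker.check_model('yolo_world.onnx'); print('ok')"
```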
182
-
183
- ## Demo
184
-
185
- ### Gradio Demo
186
-
187
- We provide the [Gradio](https://www.gradio.app/) demo for local devices:
188
-
189
- ```bash
190
- pip install gradio==4.16.0
191
- python demo.py path/to/config path/to/weights
192
- ```
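For instance, with the YOLO-Worldv2-L config and checkpoint listed in the model zoo above (assuming the checkpoint has already been downloaded into the working directory; any other row of the table works the same way):

```bash
# example paths follow the model zoo table above
python demo.py configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py \
    yolo_world_v2_l_obj365v1_goldg_pretrain-a82b1fe3.pth
```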
193
-
194
- Additionally, you can use the Dockerfile to build an image with Gradio. As a prerequisite, make sure you have the appropriate NVIDIA drivers installed alongside [nvidia-container-runtime](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime). Replace MODEL_NAME and WEIGHT_NAME with the respective values, or omit them to use the default values from the [Dockerfile](Dockerfile#3).
195
-
196
- ```bash
197
- docker build --build-arg="MODEL=MODEL_NAME" --build-arg="WEIGHT=WEIGHT_NAME" -t yolo_demo .
198
- docker run --runtime nvidia -p 8080:8080 yolo_demo
199
- ```
200
-
201
- ### Image Demo
202
-
203
- We provide a simple image demo for inference on images with visualization outputs.
204
-
205
- ```bash
206
- python image_demo.py path/to/config path/to/weights image/path/directory 'person,dog,cat' --topk 100 --threshold 0.005 --output-dir demo_outputs
207
- ```
208
-
209
- **Notes:**
210
- * The `image` can be a directory or a single image.
211
- * The `texts` can be a comma-separated string of categories (noun phrases). We also support a `txt` file in which each line contains a category (noun phrase); see the sketch after these notes.
212
- * The `topk` and `threshold` control the number of predictions and the confidence threshold.
213
-
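A minimal sketch of the `txt` option (the file name `custom_classes.txt` is hypothetical; it is passed in place of the comma-separated string, as described above):

```bash
# one noun phrase per line
cat > custom_classes.txt <<'EOF'
person
dog
cat
EOF
python image_demo.py path/to/config path/to/weights image/path/directory custom_classes.txt --topk 100 --threshold 0.005 --output-dir demo_outputs
```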
214
- ### Google Colab Notebook
215
-
216
- We sincerely thank [Onuralp](https://github.com/onuralpszr) for sharing the [Colab Demo](https://colab.research.google.com/drive/1F_7S5lSaFM06irBCZqjhbN7MpUXo6WwO?usp=sharing); give it a try 😊!
217
-
218
 
219
  ## Acknowledgement
220
 
221
- We sincerely thank [mmyolo](https://github.com/open-mmlab/mmyolo), [mmdetection](https://github.com/open-mmlab/mmdetection), [GLIP](https://github.com/microsoft/GLIP), and [transformers](https://github.com/huggingface/transformers) for providing their wonderful code to the community!
222
 
223
  ## Citations
224
  If you find YOLO-World useful in your research or applications, please consider giving us a star 🌟 and citing it.
225
 
226
  ```bibtex
227
- @inproceedings{Cheng2024YOLOWorld,
228
  title={YOLO-World: Real-Time Open-Vocabulary Object Detection},
229
  author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
230
- booktitle={Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
231
  year={2024}
232
  }
233
  ```
234
 
235
  ## Licence
236
- YOLO-World is under the GPL-v3 Licence and supports commercial usage.
 
1
  ---
2
+ title: YOLO World
3
+ emoji: 🔥
4
+ colorFrom: pink
5
+ colorTo: blue
6
+ pinned: false
7
  license: apache-2.0
8
+ app_file: app.py
9
+ sdk: gradio
10
+ sdk_version: 4.16.0
11
  ---
12
+
13
  <div align="center">
14
+ <center>
15
+ <img width=500px src="./assets/yolo_logo.png">
16
+ </center>
17
  <br>
18
+ <a href="https://scholar.google.com/citations?hl=zh-CN&user=PH8rJHYAAAAJ">Tianheng Cheng*</a><sup><span>2,3</span></sup>,
19
+ <a href="https://linsong.info/">Lin Song*</a><sup><span>1</span></sup>,
20
+ <a href="">Yixiao Ge</a><sup><span>1,2</span></sup>,
21
+ <a href="">Xinggang Wang</a><sup><span>3</span></sup>,
22
  <a href="http://eic.hust.edu.cn/professor/liuwenyu/"> Wenyu Liu</a><sup><span>3</span></sup>,
23
+ <a href="">Ying Shan</a><sup><span>1,2</span></sup>
 
24
  </br>
25
 
 
 
26
  <sup>1</sup> Tencent AI Lab, <sup>2</sup> ARC Lab, Tencent PCG
27
  <sup>3</sup> Huazhong University of Science and Technology
28
  <br>
29
  <div>
30
 
31
+ [![arxiv paper](https://img.shields.io/badge/arXiv-Paper-red)](https://arxiv.org/abs/)
32
+ [![demo](https://img.shields.io/badge/🤗HuggingFace-Spaces-orange)](https://huggingface.co/)
 
 
 
 
33
  [![license](https://img.shields.io/badge/License-GPLv3.0-blue)](LICENSE)
 
 
 
34
 
35
  </div>
36
  </div>
37
 
38
 
39
+ ## Updates
40
 
41
+ `[2024-1-25]:` We are excited to launch **YOLO-World**, a cutting-edge real-time open-vocabulary object detector.
 
 
42
 
43
+ ## Highlights
 
 
 
 
 
 
 
44
 
45
  This repo contains the PyTorch implementation, pre-trained weights, and pre-training/fine-tuning code for YOLO-World.
46
 
 
48
 
49
  * YOLO-World is the next-generation YOLO detector, with a strong open-vocabulary detection capability and grounding ability.
50
 
51
+ * YOLO-World presents a *prompt-then-detect* paradigm for efficient user-vocabulary inference, which re-parameterizes the vocabulary embeddings into the model as parameters and achieves superior inference speed. You can export your own detection model without extra training or fine-tuning in our [online demo]()!
52
 
53
 
54
  <center>
55
  <img width=800px src="./assets/yolo_arch.png">
56
  </center>
57
 
 
58
 
59
+ ## Abstract
60
 
61
+ The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.
62
 
63
 
64
+ ## Demo
65
+
66
+
67
+ ## Main Results
68
+
69
+ We've pre-trained YOLO-World-S/M/L from scratch and evaluated them on `LVIS val-1.0` and `LVIS minival`. We provide the pre-trained model weights and training logs for applications/research and for reproducing the results.
70
 
71
+ ### Zero-shot Inference on LVIS dataset
72
 
73
+ | model | Pre-train Data | AP | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | FPS(V100) | weights | log |
74
+ | :---- | :------------- | :-:| :------------: |:-------------: | :-------: | :-----: | :---: | :---: |
75
+ | [YOLO-World-S](./configs/pretrain/yolo_world_s_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 17.6 | 11.9 | 14.5 | 23.2 | - | [wecom](https://drive.weixin.qq.com/s?k=AJEAIQdfAAoREsieRl) | [log]() |
76
+ | [YOLO-World-M](./configs/pretrain/yolo_world_m_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 23.5 | 17.2 | 20.4 | 29.6 | - | [wecom](https://drive.weixin.qq.com/s?k=AJEAIQdfAAoj0byBC0) | [log]() |
77
+ | [YOLO-World-L](./configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 25.7 | 18.7 | 22.6 | 32.2 | - | [wecom](https://drive.weixin.qq.com/s?k=AJEAIQdfAAoK06oxO2) | [log]() |
78
 
79
+ **NOTE:**
80
+ 1. The evaluation results are obtained on LVIS `minival` in a zero-shot manner.
 
81
 
82
 
83
  ## Getting started
 
86
 
87
  YOLO-World is developed based on `torch==1.11.0` `mmyolo==0.6.0` and `mmdetection==3.0.0`.
88
 
 
 
89
  ```bash
90
+ # install key dependencies
91
+ pip install mmdetection==3.0.0 mmengine transformers
92
+
93
+ # clone the repo
94
+ git clone https://xxxx.YOLO-World.git
95
+ cd YOLO-World
96
+
97
+ # install mmyolo
98
+ mkdir third_party
99
+ git clone https://github.com/open-mmlab/mmyolo.git
100
+ cd ..
101
 
 
 
 
102
  ```
103
 
104
  ### 2. Preparing Data
 
119
  ```
120
  **NOTE:** YOLO-World is pre-trained on 4 nodes with 8 GPUs per node (32 GPUs in total). For pre-training, the `node_rank` and `nnodes` for multi-node training should be specified.
121
 
122
+ Evaluating YOLO-World is also easy:
123
 
124
  ```bash
125
  chmod +x tools/dist_test.sh
 
128
 
129
  **NOTE:** We mainly evaluate the performance on LVIS-minival for pre-training.
130
 
 
 
 
 
131
  ## Deployment
132
 
133
  We provide the details about deployment for downstream applications in [docs/deployment](./docs/deploy.md).
134
+ You can directly download the ONNX model through the online [demo]() in Huggingface Spaces 🤗.
 
135
 
136
  ## Acknowledgement
137
 
138
+ We sincerely thank [mmyolo](https://github.com/open-mmlab/mmyolo), [mmdetection](https://github.com/open-mmlab/mmdetection), and [transformers](https://github.com/huggingface/transformers) for providing their wonderful code to the community!
139
 
140
  ## Citations
141
  If you find YOLO-World useful in your research or applications, please consider giving us a star 🌟 and citing it.
142
 
143
  ```bibtex
144
+ @article{cheng2024yolow,
145
  title={YOLO-World: Real-Time Open-Vocabulary Object Detection},
146
  author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
147
+ journal={arXiv preprint arXiv:},
148
  year={2024}
149
  }
150
  ```
151
 
152
  ## Licence
153
+ YOLO-World is under the GPL-v3 Licence and supports commercial usage.