Om AI Lab

AI & ML interests

Multimodal AI, Agents

omlab's activity

tianchez posted an update about 2 months ago
Introducing VLM-R1!

GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?

The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task).
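
To make the idea concrete, here is a minimal sketch of how a GRPO-style signal could be computed for a visual grounding task. It is an illustration under assumptions, not the VLM-R1 implementation: the post does not say which reward is used, so the IoU-based reward, the `(x1, y1, x2, y2)` box format, and the helper names `box_iou` and `group_relative_advantages` are all hypothetical. The only GRPO-specific part is that each sampled answer is scored relative to the other answers sampled for the same prompt.

```python
from statistics import mean, pstdev


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    Assumed reward for grounding: higher overlap with the reference box
    means a higher reward. The post does not confirm this choice.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so no separate value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: one reference box and three sampled predictions.
reference = (10, 10, 50, 50)
predictions = [(10, 10, 50, 50), (30, 30, 70, 70), (60, 60, 90, 90)]
rewards = [box_iou(p, reference) for p in predictions]
advantages = group_relative_advantages(rewards)
```

In this sketch the exact match gets the largest advantage and the non-overlapping box the smallest; a policy update would then raise the likelihood of the higher-advantage answers.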

https://github.com/om-ai-lab/VLM-R1