Tony Zhao

tianchez

https://www.tianchez.com

AI & ML interests

Multimodal Agent, Generative AI

tianchez's activity

commented on a paper 9 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published 13 days ago • 30

updated 3 models 10 days ago

omlab/VLM-R1-Qwen2.5VL-3B-Math-0305

Visual Question Answering • Updated 10 days ago • 326

omlab/Qwen2.5VL-3B-VLM-R1-REC-500steps

Zero-Shot Object Detection • Updated 10 days ago • 817 • 22

omlab/VLM-R1-Qwen2.5VL-3B-OVD-0321

Zero-Shot Object Detection • Updated 10 days ago • 948 • 10

updated a collection 10 days ago

Multimodal Research

10 items • Updated 10 days ago • 2

updated a Space 10 days ago

VLM R1 Referral Expression

Mark regions in images based on text descriptions

upvoted a paper 10 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published 13 days ago • 30

replied to AdinaY's post 29 days ago

https://huggingface.co/blog/omlab/vlm-r1-for-ovd
https://huggingface.co/blog/omlab/vlm-ovd-findings

replied to AdinaY's post about 1 month ago

We have shared our latest insights on our blog:
https://om-ai-lab.github.io/index.html

liked 2 Spaces about 1 month ago

OmAgent

Process and answer questions about webpage videos

VLM R1 OVD

VLM-R1 model for Open-Vocabulary Object Detection

published a Space about 1 month ago

VLM R1 OVD

VLM-R1 model for Open-Vocabulary Object Detection

upvoted a collection about 2 months ago

VLM-R1-models

A collection of VLM-R1 Models • 7 items • Updated Mar 22 • 4

New activity in omlab/VLM-R1-Referral-Expression about 2 months ago

Apply for community grant: Personal project (gpu)

#3 opened about 2 months ago by

replied to their post about 2 months ago

looks very cool!

reacted to their post with 👍 about 2 months ago

Post

4327

Introducing VLM-R1!

GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?

The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task).

https://github.com/om-ai-lab/VLM-R1

3 replies

New activity in omlab/VLM-R1-Referral-Expression 2 months ago

Fixes 500 error for some users

#1 opened 2 months ago by

reacted to their post with ❤️ 2 months ago
