Tan Minh Tran

minhtt32

tanminhtran168

AI & ML interests

Natural Language Processing

Recent Activity

liked a Space 19 days ago

OpenGVLab/MVBench_Leaderboard

upvoted a paper 19 days ago

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

liked a model 20 days ago

ds4sd/SmolDocling-256M-preview

minhtt32's activity

upvoted a paper 19 days ago

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

Paper • 2503.12533 • Published 21 days ago • 63

upvoted a paper 21 days ago

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published 27 days ago • 84

upvoted 7 papers 25 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 72

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Paper • 2502.20395 • Published Feb 27 • 46

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Paper • 2502.19634 • Published Feb 26 • 63

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 75

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3 • 83

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

Paper • 2503.05132 • Published about 1 month ago • 54

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published about 1 month ago • 114

upvoted an article about 2 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 835

upvoted an article 2 months ago

We now support VLMs in smolagents!

Article • Jan 24 • 99

upvoted 2 papers 3 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 364

upvoted 2 papers 4 months ago

Agent-as-a-Judge: Evaluate Agents with Agents

Paper • 2410.10934 • Published Oct 14, 2024 • 22

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 152

upvoted 2 papers 5 months ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 123

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Paper • 2411.04999 • Published Nov 7, 2024 • 18

upvoted a paper 6 months ago

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Paper • 2410.10139 • Published Oct 14, 2024 • 52

upvoted a paper 7 months ago

Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese

Paper • 2408.12480 • Published Aug 22, 2024 • 23