InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 239
Chat with Kimi-VL (Image, Agent, Video, PDF) 🚀 Space • Running on Zero • 36 • Chat with Kimi-VL-A3B-Instruct using text, images, and videos
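The Space above is essentially a how-to: run Kimi-VL-A3B-Instruct on mixed text and image input. A minimal sketch of the usual transformers multimodal loop follows; the repo id `moonshotai/Kimi-VL-A3B-Instruct`, the `trust_remote_code` loading path, and the chat-message schema are assumptions based on common VLM conventions, not taken from the Space's own code, so check the model card before relying on it.

```python
# Hedged sketch: chatting with Kimi-VL-A3B-Instruct via transformers.
# Assumptions (not confirmed by the Space): repo id, trust_remote_code loading,
# and the image+text chat-template message format.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "moonshotai/Kimi-VL-A3B-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("page.png")  # any local image, e.g. a rendered PDF page
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},  # assumed placeholder form; some models want {"type": "image", "image": path}
            {"type": "text", "text": "Summarize what this page shows."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
reply = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```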
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published Mar 16 • 64
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10 • 84
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 73
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Paper • 2502.20395 • Published Feb 27 • 47
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26 • 63
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 84
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published Mar 7 • 56
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 119
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published Dec 18, 2024 • 24