Qinghong (Kevin) Lin

KevinQHLin

http://qhlin.me/

AI & ML interests

Vision-Language Model, Video Understanding, Human-AI Interaction

Recent Activity

liked a model 22 days ago

yeliudev/VideoMind-2B

KevinQHLin's activity

upvoted a paper 1 day ago

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published 2 days ago • 23

upvoted a paper 13 days ago

V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models

Paper • 2504.06148 • Published 16 days ago • 13

upvoted a collection 26 days ago

VideoMind

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning • 8 items • Updated 25 days ago • 3

upvoted 2 papers 30 days ago

Edit Transfer: Learning Image Editing via Vision In-Context Relations

Paper • 2503.13327 • Published Mar 17 • 29

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Paper • 2503.19325 • Published about 1 month ago • 72

upvoted 3 papers about 1 month ago

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published Mar 17 • 15

TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12 • 45

Automated Movie Generation via Multi-Agent CoT Planning

Paper • 2503.07314 • Published Mar 10 • 45

upvoted 3 papers about 2 months ago

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Paper • 2503.03651 • Published Mar 5 • 16

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

Paper • 2503.01774 • Published Mar 3 • 44

PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

Paper • 2502.14397 • Published Feb 20 • 42

upvoted 2 papers 2 months ago

WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation

Paper • 2502.08047 • Published Feb 12 • 27

TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

Paper • 2502.07870 • Published Feb 11 • 44

upvoted an article 2 months ago

Welcome to Inference Providers on the Hub 🔥

Article • Jan 28 • 479

upvoted a paper 4 months ago

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 40

upvoted a collection 4 months ago

AGUVIS: Unified Pure Vision GUI Agents

https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 5

upvoted 2 papers 5 months ago

Factorized Visual Tokenization and Generation

Paper • 2411.16681 • Published Nov 25, 2024 • 19

ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 88

upvoted 2 collections 5 months ago

GUI Models

9 items • Updated Feb 21 • 3

Research on GUI Models

18 items • Updated Feb 21 • 4