Zesen Cheng

ClownRat

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

Recent Activity

authored a paper 4 days ago

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

ClownRat's activity

upvoted a paper 4 days ago

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

Paper • 2503.14428 • Published 11 days ago • 8

upvoted 2 papers 10 days ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 123

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Paper • 2410.18558 • Published Oct 24, 2024 • 20

upvoted a paper 15 days ago

Transformers without Normalization

Paper • 2503.10622 • Published 16 days ago • 141

upvoted 2 papers 28 days ago

LongRoPE2: Near-Lossless LLM Context Window Scaling

Paper • 2502.20082 • Published about 1 month ago • 36

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published about 1 month ago • 82

upvoted 2 articles about 1 month ago

Mixture of Experts Explained

Article • Dec 11, 2023 • 491

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21 • 147

upvoted 3 papers about 1 month ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 175

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published Feb 17 • 33

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Paper • 2502.13922 • Published Feb 19 • 25

upvoted 7 papers about 2 months ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 112

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 366

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published Jan 14 • 15

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 282

upvoted 2 papers 2 months ago

Valley2: Exploring Multimodal Models with Scalable Vision-Language Design

Paper • 2501.05901 • Published Jan 10 • 1

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28, 2024 • 87