Jingfeng Yao

MapleF9

AI & ML interests

None yet

MapleF9's activity

upvoted a paper about 1 hour ago

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published 1 day ago • 16

upvoted a paper about 24 hours ago

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published 4 days ago • 15

upvoted a paper 9 days ago

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Paper • 2504.06263 • Published 10 days ago • 143

upvoted a paper 10 days ago

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published 11 days ago • 94

upvoted a paper 11 days ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published 18 days ago • 242

upvoted a paper 16 days ago

Scaling Language-Free Visual Representation Learning

Paper • 2504.01017 • Published 17 days ago • 26

upvoted 6 papers about 1 month ago

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

Paper • 2503.08686 • Published Mar 11 • 18

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Paper • 2503.07608 • Published Mar 10 • 21

Improve Representation for Imbalanced Regression through Geometric Constraints

Paper • 2503.00876 • Published Mar 2 • 6

upvoted a paper about 2 months ago

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 76

upvoted an article about 2 months ago

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21 • 152

upvoted 3 papers about 2 months ago

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Paper • 2502.13144 • Published Feb 18 • 38

Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation

Paper • 2502.13145 • Published Feb 18 • 38

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Paper • 2502.10248 • Published Feb 14 • 55

upvoted 3 papers 2 months ago

QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation

Paper • 2502.05178 • Published Feb 7 • 10

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

Paper • 2502.06782 • Published Feb 10 • 14

MatAnyone: Stable Video Matting with Consistent Memory Propagation

Paper • 2501.14677 • Published Jan 24 • 35