Xing Yun

xing0047

AI & ML interests

Computer Vision

xing0047's activity

upvoted a paper 5 days ago

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Paper • 2503.07027 • Published 6 days ago • 23

upvoted 3 papers 9 days ago

UniTok: A Unified Tokenizer for Visual Generation and Understanding

Paper • 2502.20321 • Published 17 days ago • 29

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Paper • 2502.20395 • Published 17 days ago • 44

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published 13 days ago • 72

upvoted a paper 11 days ago

Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

Paper • 2503.00865 • Published 14 days ago • 58

upvoted a paper 12 days ago

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published 13 days ago • 66

upvoted 3 papers 13 days ago

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published 18 days ago • 77

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Paper • 2502.19634 • Published 18 days ago • 58

LongRoPE2: Near-Lossless LLM Context Window Scaling

Paper • 2502.20082 • Published 17 days ago • 31

upvoted a paper 18 days ago

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Paper • 2502.17157 • Published 20 days ago • 51

upvoted 5 papers 23 days ago

Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published 26 days ago • 56

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Paper • 2502.13922 • Published 25 days ago • 25

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 25 days ago • 164

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 24 days ago • 130

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published 24 days ago • 97

upvoted a paper 28 days ago

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Paper • 2502.04320 • Published Feb 6 • 35

upvoted a paper about 2 months ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 276

upvoted 3 papers 2 months ago

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published Jan 9 • 39

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published Jan 3 • 51

An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published Jan 9 • 37