Xirui Li PRO

AIcell

https://xirui-li.github.io/

AI & ML interests

Foundation LLM and VLM

Recent Activity

new activity 8 days ago

turningpoint-ai/VisualThinker-R1-Zero: Fix huggingface model checkpoint name to "turningpoint-ai/VisualThinker-R1-Zero"

AIcell's activity

upvoted 2 papers 7 days ago

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Paper • 2504.09925 • Published 9 days ago • 38

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published 12 days ago • 42

upvoted a collection about 1 month ago

InternVL2.5

Better than InternVL 2.0 • 19 items • Updated 3 days ago • 92

upvoted a paper about 1 month ago

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

Paper • 2503.05132 • Published Mar 7 • 57

upvoted a paper about 2 months ago

Thus Spake Long-Context Large Language Model

Paper • 2502.17129 • Published Feb 24 • 73

upvoted 5 papers 4 months ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 60

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Paper • 2412.03548 • Published Dec 4, 2024 • 17

STIV: Scalable Text and Image Conditioned Video Generation

Paper • 2412.07730 • Published Dec 10, 2024 • 74

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 101

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24