Wenhao Chai

wchai

http://rese1f.github.io

AI & ML interests

computer vision, artificial intelligence

wchai's activity

upvoted a paper 1 day ago

Step1X-Edit: A Practical Framework for General Image Editing

Paper • 2504.17761 • Published 2 days ago • 63

upvoted a paper 5 days ago

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

Paper • 2504.13173 • Published 9 days ago • 17

upvoted a paper 9 days ago

WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published 10 days ago • 30

upvoted a paper 13 days ago

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Paper • 2504.08736 • Published 15 days ago • 47

liked a dataset 15 days ago

nyu-visionx/CV-Bench

Viewer • Updated 25 days ago • 5.28k • 5.81k • 30

upvoted 2 papers 16 days ago

HoloPart: Generative 3D Part Amodal Segmentation

Paper • 2504.07943 • Published 16 days ago • 28

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 25 days ago • 82

liked a model 16 days ago

agentica-org/DeepCoder-14B-Preview

Text Generation • Updated 17 days ago • 42.2k • 610

upvoted a paper 17 days ago

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published 19 days ago • 73

reacted to AdinaY's post with 🔥 17 days ago

Moonshot AI 月之暗面 🌛 @Kimi_Moonshot just dropped an MoE VLM and an MoE Reasoning VLM on the hub!!

Model: https://huggingface.co/collections/moonshotai/kimi-vl-a3b-67f67b6ac91d3b03d382dd85

✨3B with MIT license
✨Long context windows up to 128K
✨Strong multimodal reasoning (36.8% on MathVision, on par with 10x larger models) and agent skills (34.5% on ScreenSpot-Pro)