Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial Article • By open-r1 • Jan 31 • 42
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 31
A failed experiment: Infini-Attention, and why we should keep trying? Article • Aug 14, 2024 • 60
Docmatix - a huge dataset for Document Visual Question Answering Article • Jul 18, 2024 • 72
Running • 543 • Vision Arena (Testing VLMs side-by-side) 🖼 Analyze images to detect and label objects