unsloth/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • Updated 35 minutes ago • 5.13k • 47
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning Paper • 2407.15815 • Published Jul 22, 2024 • 14
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published Jun 28, 2024 • 31
internlm/internlm-xcomposer2d5-7b Visual Question Answering • Updated Jul 22, 2024 • 4.79k • 204
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3, 2024 • 95
Running on Zero • 749 • Florence 2 📉 Analyze images to generate captions, detect objects, or perform OCR
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 60
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models • Jun 24, 2024 • 191
LiveMind: Low-latency Large Language Models with Simultaneous Inference Paper • 2406.14319 • Published Jun 20, 2024 • 14