The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer Paper • 2502.15631 • Published 4 days ago • 6
Qwen2.5 VL 72B Instruct 💻 Interact with Qwen2.5-VL-72B to get responses and generate images • Running • 92
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published 26 days ago • 19
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 134
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections Paper • 2502.12170 • Published 12 days ago • 11
Continuous Diffusion Model for Language Modeling Paper • 2502.11564 • Published 8 days ago • 49
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published 9 days ago • 50
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published 7 days ago • 34