- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 12
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 52
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 45

Collections
Collections including paper arxiv:2407.04078
- Unlocking Continual Learning Abilities in Language Models
  Paper • 2406.17245 • Published • 28
- A Closer Look into Mixture-of-Experts in Large Language Models
  Paper • 2406.18219 • Published • 15
- Symbolic Learning Enables Self-Evolving Agents
  Paper • 2406.18532 • Published • 11
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
  Paper • 2406.18629 • Published • 41

- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 15
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 73
- Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation
  Paper • 2401.11864 • Published • 2
- Common 7B Language Models Already Possess Strong Math Capabilities
  Paper • 2403.04706 • Published • 16

- AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
  Paper • 2402.12226 • Published • 41
- DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
  Paper • 2407.04078 • Published • 17
- Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
  Paper • 2412.04424 • Published • 57

- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 5
- Sparse Networks from Scratch: Faster Training without Losing Performance
  Paper • 1907.04840 • Published • 3
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 4
- A Mixture of h-1 Heads is Better than h Heads
  Paper • 2005.06537 • Published • 2