- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 1
Collections including paper arxiv:2403.03507
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
  Paper • 2503.12937 • Published • 26
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
  Paper • 2503.12605 • Published • 30
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 54
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 113
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 11
- AutoTrain: No-code training for state-of-the-art models
  Paper • 2410.15735 • Published • 60
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
  Paper • 2405.00732 • Published • 121
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 37
- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 36
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 65
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 42
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 39
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 16
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 3
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 138