Collections
Collections including paper arxiv:2304.12244
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 49
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 135
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 19
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 35
-
Attention Is All You Need
Paper • 1706.03762 • Published • 34
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 11
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 10
-
Attention Is All You Need
Paper • 1706.03762 • Published • 34
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 6
Mixtral of Experts
Paper • 2401.04088 • Published • 152
Mistral 7B
Paper • 2310.06825 • Published • 42
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 4
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Paper • 2310.13127 • Published • 10
Evaluating the Robustness to Instructions of Large Language Models
Paper • 2308.14306 • Published • 1
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 4
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Paper • 2202.07922 • Published • 1
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 17
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4
-
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Paper • 2308.09583 • Published • 7
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Paper • 2306.08568 • Published • 27
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Paper • 2304.12244 • Published • 13