- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 12
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 11
Collections including paper arxiv:2201.11903
- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 10
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 7
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 69

- RA-DIT: Retrieval-Augmented Dual Instruction Tuning
  Paper • 2310.01352 • Published • 6
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
  Paper • 2203.11171 • Published • 1
- MemGPT: Towards LLMs as Operating Systems
  Paper • 2310.08560 • Published • 6
- Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
  Paper • 2310.06117 • Published • 3

- Contrastive Chain-of-Thought Prompting
  Paper • 2311.09277 • Published • 31
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 7
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 69
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 38

- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 60
- Learning To Teach Large Language Models Logical Reasoning
  Paper • 2310.09158 • Published • 1
- ChipNeMo: Domain-Adapted LLMs for Chip Design
  Paper • 2311.00176 • Published • 7
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
  Paper • 2308.09583 • Published • 7

- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 167
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
  Paper • 2303.12712 • Published • 2
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 3
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 7

- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  Paper • 2309.03883 • Published • 14
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 25
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 39
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 45