- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 31
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 6
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 11
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34
Collections including paper arxiv:2203.15556
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
  Paper • 2310.00576 • Published • 2
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
  Paper • 2305.13169 • Published • 2
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 12
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 16
- ImageBind: One Embedding Space To Bind Them All
  Paper • 2305.05665 • Published • 3
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 173
- Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
  Paper • 2206.02770 • Published • 3
- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 2
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 58
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 7
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 2
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 6
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 7
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 7
- Perspectives on the State and Future of Deep Learning -- 2023
  Paper • 2312.09323 • Published • 5
- MobileSAMv2: Faster Segment Anything to Everything
  Paper • 2312.09579 • Published • 20
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 17
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 11
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 10
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 11
- Universal Language Model Fine-tuning for Text Classification
  Paper • 1801.06146 • Published • 6
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 9
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  Paper • 2309.03883 • Published • 14
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 24
- Agents: An Open-source Framework for Autonomous Language Agents
  Paper • 2309.07870 • Published • 39
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 45