Collections
Discover the best community collections!
Collections including paper arxiv:2406.14491

- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 53
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 39
- Context-Aware Meta-Learning
  Paper • 2310.10971 • Published • 16

- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 142
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
  Paper • 2305.07759 • Published • 33
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
  Paper • 2406.20094 • Published • 95

- Training Task Experts through Retrieval Based Distillation
  Paper • 2407.05463 • Published • 7
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 86
- Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
  Paper • 2409.12903 • Published • 21
- A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
  Paper • 2410.13841 • Published • 14

- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 86
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34
- Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
  Paper • 2405.15319 • Published • 25
- Can LLMs Learn by Teaching? A Preliminary Study
  Paper • 2406.14629 • Published • 19

- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 86
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
  Paper • 2405.21060 • Published • 63
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
  Paper • 2405.20541 • Published • 21
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
  Paper • 2406.01574 • Published • 43

- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 86
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 59
- NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
  Paper • 2410.20650 • Published • 16