Collections
Discover the best community collections!
Collections including paper arxiv:2407.01370
-
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 29 -
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries
Paper • 2406.12824 • Published • 20 -
Tokenization Falling Short: The Curse of Tokenization
Paper • 2406.11687 • Published • 14 -
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Paper • 2406.11817 • Published • 13
-
LLoCO: Learning Long Contexts Offline
Paper • 2404.07979 • Published • 19 -
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 109 -
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
Paper • 2402.11550 • Published • 14 -
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 21
-
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Paper • 2402.14848 • Published • 18 -
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper • 2406.06608 • Published • 51 -
CRAG -- Comprehensive RAG Benchmark
Paper • 2406.04744 • Published • 40 -
Transformers meet Neural Algorithmic Reasoners
Paper • 2406.09308 • Published • 43
-
Attention Is All You Need
Paper • 1706.03762 • Published • 41 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 14 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 13 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 11
-
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 77 -
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 102 -
ReFT: Representation Finetuning for Language Models
Paper • 2404.03592 • Published • 85 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 58
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 54 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 24 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 65 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 55
-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 6 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 65 -
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Paper • 2404.06910 • Published • 1 -
Stylus: Automatic Adapter Selection for Diffusion Models
Paper • 2404.18928 • Published • 14