RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse Paper • 2409.11242 • Published Sep 17
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published Sep 17
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis Paper • 2410.02749 • Published Oct 3
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? Paper • 2410.02115 • Published Oct 3
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations Paper • 2410.02762 • Published Oct 3
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models Paper • 2410.01335 • Published Oct 2
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published Oct 1
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published Oct 2
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs Paper • 2410.01518 • Published Oct 2
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Paper • 2410.02707 • Published Oct 3
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1
Mentor-KD: Making Small Language Models Better Multi-step Reasoners Paper • 2410.09037 • Published Oct 11
Rethinking Data Selection at Scale: Random Selection is Almost All You Need Paper • 2410.09335 • Published Oct 12
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization Paper • 2410.08815 • Published Oct 11
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights Paper • 2410.09008 • Published Oct 11
SimpleStrat: Diversifying Language Model Generation with Stratification Paper • 2410.09038 • Published Oct 11
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness Paper • 2410.07035 • Published Oct 9
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Paper • 2410.12405 • Published Oct 16
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14
Vector-ICL: In-context Learning with Continuous Vector Representations Paper • 2410.05629 • Published Oct 8
Pre-training Distillation for Large Language Models: A Design Space Exploration Paper • 2410.16215 • Published Oct 21
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs Paper • 2410.13276 • Published Oct 17
How Do Training Methods Influence the Utilization of Vision Models? Paper • 2410.14470 • Published Oct 18
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media Paper • 2410.12791 • Published Oct 16
Counting Ability of Large Language Models and Impact of Tokenization Paper • 2410.19730 • Published Oct 25
Analysing the Residual Stream of Language Models Under Knowledge Conflicts Paper • 2410.16090 • Published Oct 21
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning Paper • 2410.19290 • Published Oct 25
On Memorization of Large Language Models in Logical Reasoning Paper • 2410.23123 • Published Oct 30
Toxicity of the Commons: Curating Open-Source Pre-Training Data Paper • 2410.22587 • Published Oct 29
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback Paper • 2410.21242 • Published Oct 28
RARe: Retrieval Augmented Retrieval with In-Context Examples Paper • 2410.20088 • Published Oct 26
LongReward: Improving Long-context Large Language Models with AI Feedback Paper • 2410.21252 • Published Oct 28
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30
Constraint Back-translation Improves Complex Instruction Following of Large Language Models Paper • 2410.24175 • Published Oct 31
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published Oct 31
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding Paper • 2411.01106 • Published Nov 2
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models Paper • 2411.00743 • Published Nov 1
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? Paper • 2411.05000 • Published Nov 7
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination Paper • 2411.03823 • Published Nov 6
DELIFT: Data Efficient Language model Instruction Fine Tuning Paper • 2411.04425 • Published Nov 7
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities Paper • 2411.04986 • Published Nov 7
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12
Can sparse autoencoders be used to decompose and interpret steering vectors? Paper • 2411.08790 • Published Nov 13
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9
Drowning in Documents: Consequences of Scaling Reranker Inference Paper • 2411.11767 • Published Nov 18
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published about 1 month ago
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs Paper • 2411.14199 • Published about 1 month ago
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models Paper • 2411.14257 • Published about 1 month ago
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Paper • 2411.18478 • Published 25 days ago
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published 26 days ago
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS Paper • 2411.19655 • Published 23 days ago
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability Paper • 2411.19943 • Published 23 days ago
Establishing Task Scaling Laws via Compute-Efficient Model Ladders Paper • 2412.04403 • Published 17 days ago
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement Paper • 2412.04003 • Published 17 days ago
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published 18 days ago
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs Paper • 2412.04144 • Published 17 days ago
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Paper • 2412.06071 • Published 13 days ago
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token Paper • 2412.06676 • Published 13 days ago
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published 11 days ago
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published 7 days ago
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published 6 days ago
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 6 days ago
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers Paper • 2412.12276 • Published 6 days ago
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge Paper • 2412.13670 • Published 4 days ago
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 4 days ago