Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published 5 days ago • 38
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Paper • 2504.13816 • Published 4 days ago • 14
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models Paper • 2504.15133 • Published 1 day ago • 13
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs Paper • 2504.14655 • Published 2 days ago • 12
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 14 days ago • 103
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility Paper • 2504.07086 • Published 13 days ago • 20
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? Paper • 2504.06514 • Published 14 days ago • 39
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 13 days ago • 73
MM-IFEngine: Towards Multimodal Instruction Following Paper • 2504.07957 • Published 12 days ago • 34
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 21 days ago • 82
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs Paper • 2504.08192 • Published 12 days ago • 4
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published 15 days ago • 11
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published 11 days ago • 53
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published 8 days ago • 39
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published 12 days ago • 45
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning Paper • 2504.09081 • Published 11 days ago • 16
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published 19 days ago • 27
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 385