DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search Paper • 2410.03864 • Published 5 days ago • 6
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Paper • 2410.05080 • Published 3 days ago • 18
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Paper • 2410.02707 • Published 6 days ago • 38
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published 3 days ago • 12
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention Paper • 2410.05076 • Published 3 days ago • 4
Only-IF:Revealing the Decisive Effect of Instruction Diversity on Generalization Paper • 2410.04717 • Published 3 days ago • 14
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise Paper • 2410.03017 • Published 6 days ago • 21
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published 8 days ago • 119
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics Paper • 2410.01946 • Published 7 days ago • 4
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Paper • 2410.02458 • Published 7 days ago • 9
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations Paper • 2410.02762 • Published 6 days ago • 8
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? Paper • 2410.02115 • Published 7 days ago • 10
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published 7 days ago • 44
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis Paper • 2410.02749 • Published 6 days ago • 12
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published 7 days ago • 12
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis Paper • 2409.20059 • Published 10 days ago • 15
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published 8 days ago • 30
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published 8 days ago • 34
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published 10 days ago • 51
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code Paper • 2409.19715 • Published 11 days ago • 8
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Paper • 2409.18943 • Published 12 days ago • 26
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published 9 days ago • 47
LML: Language Model Learning a Dataset for Data-Augmented Prediction Paper • 2409.18957 • Published 12 days ago • 9
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult Paper • 2409.17545 • Published 14 days ago • 16
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published 13 days ago • 24
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published 14 days ago • 25
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling Paper • 2409.14683 • Published 17 days ago • 8
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 14 days ago • 23
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 14 days ago • 46
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 13 days ago • 35
A Case Study of Web App Coding with OpenAI Reasoning Models Paper • 2409.13773 • Published 21 days ago • 4
An adapted large language model facilitates multiple medical tasks in diabetes care Paper • 2409.13191 • Published 20 days ago • 6
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published 16 days ago • 11
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs Paper • 2409.14988 • Published 17 days ago • 21
Phantom of Latent for Large Language and Vision Models Paper • 2409.14713 • Published 17 days ago • 27
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published 16 days ago • 34
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning Paper • 2409.14674 • Published 17 days ago • 41
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents Paper • 2409.11393 • Published 22 days ago • 2
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts Paper • 2409.13449 • Published 20 days ago • 7
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation Paper • 2409.12941 • Published 20 days ago • 20
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions Paper • 2409.12958 • Published 20 days ago • 6
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published 20 days ago • 21
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published 20 days ago • 23
B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests Paper • 2409.08692 • Published 27 days ago • 25
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published 20 days ago • 35