DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging Paper • 2407.01470 • Published 2 days ago • 4
Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER Paper • 2407.01272 • Published 2 days ago • 6
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs Paper • 2407.00653 • Published 3 days ago • 8
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP Paper • 2407.00402 • Published 4 days ago • 18
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published 4 days ago • 33
AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models Paper • 2406.10900 • Published 17 days ago • 11
LiveBench: A Challenging, Contamination-Free LLM Benchmark Paper • 2406.19314 • Published 6 days ago • 12
Aligning Teacher with Student Preferences for Tailored Training Data Generation Paper • 2406.19227 • Published 6 days ago • 22
Simulating Classroom Education with LLM-Empowered Agents Paper • 2406.19226 • Published 6 days ago • 27
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629 • Published 7 days ago • 36
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Paper • 2406.18495 • Published 7 days ago • 11
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521 • Published 7 days ago • 25
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records Paper • 2406.16341 • Published 10 days ago • 11
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published 12 days ago • 14
Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations Paper • 2406.13632 • Published 14 days ago • 5
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization Paper • 2406.16008 • Published 11 days ago • 6
Preference Tuning For Toxicity Mitigation Generalizes Across Languages Paper • 2406.16235 • Published 10 days ago • 12
Evaluating D-MERIT of Partial-annotation on Information Retrieval Paper • 2406.16048 • Published 10 days ago • 34
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 9 days ago • 53
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published 10 days ago • 22
ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights Paper • 2406.14596 • Published 13 days ago • 4
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Paper • 2406.13121 • Published 15 days ago • 1
Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model Paper • 2406.15275 • Published 12 days ago • 10
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models Paper • 2406.14035 • Published 13 days ago • 10
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published 12 days ago • 55
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published 15 days ago • 34
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level Paper • 2406.11817 • Published 16 days ago • 13
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents Paper • 2406.13923 • Published 14 days ago • 20
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Paper • 2406.14562 • Published 13 days ago • 26
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published 13 days ago • 33
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks Paper • 2406.12066 • Published 16 days ago • 7
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning Paper • 2406.12742 • Published 15 days ago • 14
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Paper • 2406.11811 • Published 16 days ago • 14
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries Paper • 2406.12824 • Published 15 days ago • 20
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published 17 days ago • 34
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks Paper • 2404.00376 • Published Mar 30 • 3
Biomedical NLP papers Collection Papers posted on @ArxivHealthcareNLP@sigmoid.social (Clinical, Healthcare & Biomedical NLP) • 126 items • Updated 2 days ago • 28
OLAPH: Improving Factuality in Biomedical Long-form Question Answering Paper • 2405.12701 • Published May 21 • 1
Gecko: Versatile Text Embeddings Distilled from Large Language Models Paper • 2403.20327 • Published Mar 29 • 47
view article Article The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare Apr 19 • 79