How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics Paper • 2406.14051 • Published Jun 20, 2024 • 9
Preference Tuning For Toxicity Mitigation Generalizes Across Languages Paper • 2406.16235 • Published Jun 23, 2024 • 11
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714 • Published Jun 24, 2024 • 10
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published Jun 22, 2024 • 14
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs Paper • 2406.15927 • Published Jun 22, 2024 • 13
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Paper • 2406.16758 • Published Jun 24, 2024 • 20
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24, 2024 • 23
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers Paper • 2406.16747 • Published Jun 24, 2024 • 19
Efficient Continual Pre-training by Mitigating the Stability Gap Paper • 2406.14833 • Published Jun 21, 2024 • 20
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published Jun 24, 2024 • 26
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 60
Evaluating D-MERIT of Partial-annotation on Information Retrieval Paper • 2406.16048 • Published Jun 23, 2024 • 35
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 46
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 55
YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals Paper • 2406.16273 • Published Jun 24, 2024 • 41