Submitted by yuchenlin 22 The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism · 4 authors 4
Submitted by shumingma 20 Q-Sparse: All Large Language Models can be Fully Sparsely-Activated · 4 authors 3
Submitted by tuvu 12 Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation · 6 authors 7
Submitted by cheryyunl 9 Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion · 6 authors 2
Submitted by akhaliq 6 Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity · 4 authors 2
Submitted by rshcao 5 Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? · 23 authors 2
Submitted by davanstrien 4 MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models · 11 authors 2
Submitted by akhaliq 4 LAB-Bench: Measuring Capabilities of Language Models for Biology Research · 9 authors 2
Submitted by akhaliq 4 Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models · 7 authors 2
Submitted by Paranioar 4 SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning · 7 authors 2