Submitted by akhaliq 27 NNsight and NDIF: Democratizing Access to Foundation Model Internals · 20 authors 1
Submitted by Ningyu 24 Knowledge Mechanisms in Large Language Models: A Survey and Perspective · 13 authors 1
Submitted by akhaliq 18 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models · 8 authors 3
Submitted by grafft 13 POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation · 6 authors 1
Submitted by teowu 12 LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding · 4 authors 3
Submitted by yulunliu 11 BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes · 6 authors 1
Submitted by akhaliq 6 HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions · 5 authors 1
Submitted by xhyandwyy 5 MIBench: Evaluating Multimodal Large Language Models over Multiple Images · 11 authors 1
Submitted by Ori 5 AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? · 6 authors 1
Submitted by akhaliq 5 Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models · 7 authors 1
Submitted by akhaliq 5 Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning · 20 authors 1
Submitted by akhaliq 4 MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation · 4 authors 1
Submitted by akhaliq 3 Artist: Aesthetically Controllable Text-Driven Stylization without Training · 2 authors 3
Submitted by akhaliq 3 GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization · 2 authors 1
Submitted by davidchan 2 Visual Haystacks: Answering Harder Questions About Sets of Images · 7 authors 1
Submitted by liuhuohuo 1 CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model · 5 authors 1