ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Paper • 2407.14482 • Published 3 days ago • 9
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Paper • 2407.14057 • Published 3 days ago • 6
EVLM: An Efficient Vision-Language Model for Visual Understanding Paper • 2407.14177 • Published 3 days ago • 18
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter Paper • 2407.11298 • Published 6 days ago • 3
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Paper • 2407.13759 • Published 4 days ago • 12
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published 4 days ago • 11
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study Paper • 2406.07057 • Published Jun 11 • 9
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore Paper • 2407.12854 • Published 13 days ago • 26
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published 4 days ago • 37
AUITestAgent: Automatic Requirements Oriented GUI Function Testing Paper • 2407.09018 • Published 10 days ago • 5
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published 6 days ago • 43
Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections Paper • 2407.12306 • Published 5 days ago • 5
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Paper • 2407.12781 • Published 5 days ago • 10
Audio Conditioning for Music Generation via Discrete Bottleneck Features Paper • 2407.12563 • Published 5 days ago • 5
The Art of Saying No: Contextual Noncompliance in Language Models Paper • 2407.12043 • Published 20 days ago • 4
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos Paper • 2407.12679 • Published 5 days ago • 6
E5-V: Universal Embeddings with Multimodal Large Language Models Paper • 2407.12580 • Published 5 days ago • 31
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published 5 days ago • 26
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published 5 days ago • 61
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Paper • 2407.11522 • Published 6 days ago • 8
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus Paper • 2407.11144 • Published 7 days ago • 7
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models Paper • 2407.11691 • Published 6 days ago • 11
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces Paper • 2407.11895 • Published 6 days ago • 7
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning Paper • 2407.10718 • Published 7 days ago • 11
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients Paper • 2407.11239 • Published 6 days ago • 5
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation Paper • 2407.11394 • Published 6 days ago • 10
Animate3D: Animating Any 3D Model with Multi-view Video Diffusion Paper • 2407.11398 • Published 6 days ago • 7
Scaling Diffusion Transformers to 16 Billion Parameters Paper • 2407.11633 • Published 6 days ago • 21
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published 6 days ago • 37
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development Paper • 2407.11784 • Published 6 days ago • 4
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published 7 days ago • 5
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation Paper • 2407.10817 • Published 7 days ago • 11
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training Paper • 2407.09121 • Published 10 days ago • 4
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models Paper • 2407.09012 • Published 10 days ago • 8
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated Paper • 2407.10969 • Published 7 days ago • 16
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion Paper • 2407.10973 • Published 7 days ago • 9
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning Paper • 2407.07523 • Published 12 days ago • 4
LAB-Bench: Measuring Capabilities of Language Models for Biology Research Paper • 2407.10362 • Published 7 days ago • 4
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models Paper • 2407.10285 • Published 8 days ago • 4
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published 7 days ago • 19
Generalizable Implicit Motion Modeling for Video Frame Interpolation Paper • 2407.08680 • Published 11 days ago • 7
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Paper • 2407.09413 • Published 10 days ago • 9
Characterizing Prompt Compression Methods for Long Context Inference Paper • 2407.08892 • Published 10 days ago • 5
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning Paper • 2406.02265 • Published Jun 4 • 5
MUSCLE: A Model Update Strategy for Compatible LLM Evolution Paper • 2407.09435 • Published 10 days ago • 18
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published 10 days ago • 106
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published 14 days ago • 32