ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Paper • 2407.14482 • Published 3 days ago • 9
Shape of Motion: 4D Reconstruction from a Single Video Paper • 2407.13764 • Published 4 days ago • 14
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published 4 days ago • 11
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation Paper • 2407.13696 • Published 4 days ago • 2
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Paper • 2407.13759 • Published 4 days ago • 12
Case2Code: Learning Inductive Reasoning with Synthetic Data Paper • 2407.12504 • Published 5 days ago • 6
E5-V: Universal Embeddings with Multimodal Large Language Models Paper • 2407.12580 • Published 5 days ago • 31
The Art of Saying No: Contextual Noncompliance in Language Models Paper • 2407.12043 • Published 20 days ago • 4
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published 5 days ago • 61
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning Paper • 2407.10718 • Published 7 days ago • 11
Click-Gaussian: Interactive Segmentation to Any 3D Gaussians Paper • 2407.11793 • Published 6 days ago • 3
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients Paper • 2407.11239 • Published 6 days ago • 5
Scaling Diffusion Transformers to 16 Billion Parameters Paper • 2407.11633 • Published 6 days ago • 21
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published 7 days ago • 5
LLM Circuit Analyses Are Consistent Across Training and Scale Paper • 2407.10827 • Published 7 days ago • 4
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation Paper • 2407.10817 • Published 7 days ago • 11
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated Paper • 2407.10969 • Published 7 days ago • 16
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published 7 days ago • 19
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data Paper • 2407.08726 • Published 11 days ago • 8
Generalizable Implicit Motion Modeling for Video Frame Interpolation Paper • 2407.08680 • Published 11 days ago • 7
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published 11 days ago • 46
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published 11 days ago • 28
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published 12 days ago • 34
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence Paper • 2407.07061 • Published 13 days ago • 23
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct Paper • 2407.05700 • Published 14 days ago • 8
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published 14 days ago • 19
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Paper • 2407.01906 • Published 20 days ago • 33
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published 21 days ago • 39
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages Paper • 2407.03321 • Published 19 days ago • 14
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published 19 days ago • 87
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models Paper • 2407.01920 • Published 20 days ago • 13
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published 21 days ago • 10
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published 20 days ago • 47
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published 20 days ago • 23
Revealing Fine-Grained Values and Opinions in Large Language Models Paper • 2406.19238 • Published 25 days ago • 13
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published 21 days ago • 41
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge Paper • 2407.00088 • Published 27 days ago • 7
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published 24 days ago • 27
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs Paper • 2407.00653 • Published 22 days ago • 11
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Paper • 2407.00114 • Published 25 days ago • 12
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP Paper • 2407.00402 • Published 23 days ago • 22
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation Paper • 2407.00788 • Published 22 days ago • 20
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published 26 days ago • 18
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published 25 days ago • 29
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning Paper • 2407.00782 • Published 22 days ago • 21
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published 21 days ago • 30