Number it: Temporal Grounding Videos like Flipping Manga • Paper • 2411.10332 • Published 9 days ago • 12
Evaluating the role of 'Constitutions' for learning from AI feedback • Paper • 2411.10168 • Published 10 days ago • 5
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning • Paper • 2411.10161 • Published 10 days ago • 6
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities • Paper • 2410.02155 • Published Oct 3 • 2
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities • Paper • 2411.04986 • Published 17 days ago • 5
DELIFT: Data Efficient Language model Instruction Fine Tuning • Paper • 2411.04425 • Published 18 days ago • 9
Balancing Pipeline Parallelism with Vocabulary Parallelism • Paper • 2411.05288 • Published 17 days ago • 19
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework • Paper • 2411.06176 • Published 16 days ago • 44
Direct Preference Optimization Using Sparse Feature-Level Constraints • Paper • 2411.07618 • Published 13 days ago • 15
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models • Paper • 2411.09595 • Published 10 days ago • 66
Drowning in Documents: Consequences of Scaling Reranker Inference • Paper • 2411.11767 • Published 6 days ago • 16
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering • Paper • 2411.11504 • Published 7 days ago • 18
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements • Paper • 2411.12044 • Published 6 days ago • 13
LLM Pruning and Distillation in Practice: The Minitron Approach • Paper • 2408.11796 • Published Aug 21 • 55
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments • Paper • 2408.10945 • Published Aug 20 • 8