iFormer: Integrating ConvNet and Transformer for Mobile Application • Paper • 2501.15369 • Published 12 days ago • 10
Temporal Preference Optimization for Long-Form Video Understanding • Paper • 2501.13919 • Published 14 days ago • 21
Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model • Paper • 2501.12206 • Published 17 days ago • 4
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing • Paper • 2412.14711 • Published Dec 19, 2024 • 16
MapQaTor: A System for Efficient Annotation of Map Query Datasets • Paper • 2412.21015 • Published Dec 30, 2024 • 10
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters • Paper • 2410.23168 • Published Oct 30, 2024 • 24
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks • Paper • 2410.20650 • Published Oct 28, 2024 • 16
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments • Paper • 2408.10945 • Published Aug 20, 2024 • 11
LLaVa-NeXT • Collection • LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19, 2024 • 29