MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 28 days ago • 46
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation Paper • 2412.00927 • Published Dec 1, 2024 • 26
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024 • 23
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 4
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 107
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22, 2024 • 20
CMC-Bench: Towards a New Paradigm of Visual Signal Compression Paper • 2406.09356 • Published Jun 13, 2024 • 4
Visual Evaluation Benchmarks! Collection Q-Bench (ICLR24' Spotlight), Q-Bench-Pair (TPAMI), and A-Bench in HuggingFace Format. Support auto-load as `dataset = load_dataset("q-future/**-HF")` • 3 items • Updated Aug 27, 2024 • 1
A-Bench: Are LMMs Masters at Evaluating AI-generated Images? Paper • 2406.03070 • Published Jun 5, 2024 • 2
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29, 2024 • 46
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7, 2024 • 14
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 91
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models Paper • 2310.14566 • Published Oct 23, 2023 • 25
A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs Paper • 2402.07116 • Published Feb 11, 2024 • 2
Open LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 64 items • Updated about 1 hour ago • 493
Visual Scorers! Collection Variants of Visual Evaluation Models proposed by [Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-defined Levels]. Use by `model.score()`! • 10 items • Updated Dec 2, 2024 • 2