Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15 • 23
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4 • 4
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19
CMC-Bench: Towards a New Paradigm of Visual Signal Compression Paper • 2406.09356 • Published Jun 13 • 4
Visual Evaluation Benchmarks! Collection Q-Bench (ICLR'24 Spotlight), Q-Bench-Pair (TPAMI), and A-Bench in HuggingFace format. Supports auto-loading via `dataset = load_dataset("q-future/**-HF")` • 3 items • Updated Aug 27 • 1
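A minimal sketch of that auto-loading path, assuming the `datasets` library is installed; the concrete repository ID used here (`q-future/A-Bench-HF`) is an assumption based on the collection's `q-future/**-HF` naming pattern and should be checked against the collection page.

```python
# Minimal sketch: loading one of the q-future/**-HF benchmark datasets.
# The exact repository ID ("q-future/A-Bench-HF") is an assumption derived
# from the collection's naming pattern; verify it on the collection page.
from datasets import load_dataset

dataset = load_dataset("q-future/A-Bench-HF")
print(dataset)          # list the available splits
first_split = next(iter(dataset))
print(dataset[first_split][0])  # inspect the first example of the first split
```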
A-Bench: Are LMMs Masters at Evaluating AI-generated Images? Paper • 2406.03070 • Published Jun 5 • 2
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29 • 46
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7 • 13
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 89
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models Paper • 2310.14566 • Published Oct 23, 2023 • 25
A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs Paper • 2402.07116 • Published Feb 11 • 2
Open LLM Leaderboard best models ❤️🔥 Collection A daily updated list of the best-evaluated models on the Open LLM Leaderboard • 60 items • Updated about 10 hours ago • 448
Visual Scorers! Collection Variants of the visual evaluation models proposed in [Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-defined Levels]. Use via `model.score()`!
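A minimal sketch of that scoring flow; only the `score()` method name comes from the collection description, while the repository ID (`q-future/one-align`) and the exact argument handling are assumptions to verify against the model card.

```python
# Minimal sketch: scoring an image with a Q-Align visual scorer.
# Assumptions (verify against the model card): the repo ID
# "q-future/one-align" and that score() accepts a list of PIL images.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "q-future/one-align",     # assumed repository ID
    trust_remote_code=True,   # score() is defined in the model's remote code
    torch_dtype=torch.float16,
    device_map="auto",
)

image = Image.open("example.jpg")
scores = model.score([image])  # one score per input image
print(scores)
```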
Low-level Visual Assistants! Collection Multi-purpose Assistant for Low-level Visual Perception, from [Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models] • 4 items • Updated Jun 14 • 1
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29 • 55