lmms-lab/llama3-llava-next-8b-hf-sae-131k
lmms-lab/sae-sample-cache-dataset
lmms-lab/llava-sae-explanations-5k
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models (Paper 2411.14982)
LMMs-Lab
AI & ML interests
Feeling and building the multimodal intelligence.
Recent Activity
[2024-11] 🔔🔔 We are excited to introduce LMMs-Eval/v0.3.0, focusing on audio understanding. Building upon LMMs-Eval/v0.2.0, we have added audio models and tasks. Now, LMMs-Eval provides a consistent evaluation toolkit across image, video, and audio modalities.
[2024-11] 🤯🤯 We introduce Multimodal SAE, the first framework designed to interpret learned features in large-scale multimodal models using Sparse Autoencoders. Through our approach, we leverage LLaVA-OneVision-72B to analyze and explain the SAE-derived features of LLaVA-NeXT-LLaMA3-8B. Furthermore, we demonstrate the ability to steer model behavior by clamping specific features to alleviate hallucinations and avoid safety-related issues.
[2024-10] 🔥🔥 We present LLaVA-Critic, the first open-source large multimodal model as a generalist evaluator for assessing LMM-generated responses across diverse multimodal tasks and scenarios.
[2024-10] 🎬🎬 Introducing LLaVA-Video, a family of open large multimodal models designed specifically for advanced video understanding. We're open-sourcing LLaVA-Video-178K, a high-quality, synthetic dataset for video instruction tuning.
[2024-08] 🤞🤞 We present LLaVA-OneVision, a family of LMMs developed by consolidating insights into data, models, and visual representations.
[2024-06] 🧑‍🎨🧑‍🎨 We release LLaVA-NeXT-Interleave, an LMM extending capabilities to real-world settings: Multi-image, Multi-frame (videos), Multi-view (3D), and Multi-patch (single-image).
[2024-06] 🚀🚀 We release LongVA, a long language model with state-of-the-art video understanding performance.
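The Multimodal SAE entry above mentions steering a model by clamping specific SAE features. As a rough illustration of that idea only, here is a minimal, dependency-free sketch of a toy sparse autoencoder in which one feature activation is overwritten before decoding. Everything in it is an assumption made for illustration: the function names, the random untrained weights, and the bias-free ReLU encoder are hypothetical and are not taken from the LMMs-Lab code or paper.

```python
import random


def make_toy_sae(d_model, n_features, seed=0):
    """Build random (untrained) encoder/decoder weights; purely illustrative."""
    rng = random.Random(seed)
    w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
    w_dec = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
    return w_enc, w_dec


def encode(w_enc, x):
    # ReLU(W_enc x): one non-negative activation per SAE feature.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_enc]


def decode(w_dec, feats):
    # Reconstruction is a weighted sum of decoder directions, one per feature.
    d_model = len(w_dec[0])
    return [sum(f * row[j] for f, row in zip(feats, w_dec)) for j in range(d_model)]


def clamp_and_decode(w_enc, w_dec, x, feature_idx, value):
    """Encode x, force one feature to a fixed value, then decode the result."""
    feats = encode(w_enc, x)
    clamped = list(feats)
    clamped[feature_idx] = value  # the "clamp": overwrite a single activation
    return feats, clamped, decode(w_dec, clamped)


w_enc, w_dec = make_toy_sae(d_model=4, n_features=16)
x = [1.0, -0.5, 0.25, 2.0]  # stand-in for one hidden-state vector
feats, clamped, steered = clamp_and_decode(w_enc, w_dec, x, feature_idx=3, value=5.0)
```

In this toy setup, clamping shifts the reconstruction along the decoder direction of the chosen feature, scaled by how far the clamped value is from the original activation. How the real framework selects features and feeds the edited activations back into the model is not described here.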
Older Updates (2024-06 and earlier)
[2024-06] 🎬🎬 The lmms-eval/v0.2 toolkit now supports video evaluations for models like LLaVA-NeXT Video and Gemini 1.5 Pro.
[2024-05] 🚀🚀 We release LLaVA-NeXT Video, a model performing at Google's Gemini level on video understanding tasks.
[2024-05] 🚀🚀 The LLaVA-NeXT model family reaches near GPT-4V performance on multimodal benchmarks, with models up to 110B parameters.
[2024-03] We release lmms-eval, a toolkit for holistic evaluations with 50+ multimodal datasets and 10+ models.