Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published 25 days ago • 15
Multimodal-SAE Collection A collection of sparse autoencoders (SAEs) hooked on LLaVA • 4 items • Updated 22 days ago • 3
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 13 days ago • 116
Article LLaVA-o1: Let Vision Language Models Reason Step-by-Step By mikelabs • 29 days ago • 10
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published 26 days ago • 55
Article Introducing Observers: AI Observability with Hugging Face Datasets through a Lightweight SDK By davidberenstein1957 • 27 days ago • 34
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9 • 44
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12 • 21
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models Paper • 2411.05005 • Published Nov 7 • 13
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation Paper • 2410.20474 • Published Oct 27 • 14
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering Paper • 2410.15999 • Published Oct 21 • 19
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22 • 45
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21 • 43
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17 • 52
MedMobile: A mobile-sized language model with expert-level clinical capabilities Paper • 2410.09019 • Published Oct 11 • 8
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models Paper • 2410.13085 • Published Oct 16 • 20