InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 9 days ago • 89
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline Paper • 2411.12814 • Published Nov 19 • 21
SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation Paper • 2411.14525 • Published about 1 month ago • 19
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI Paper • 2411.14522 • Published about 1 month ago • 31
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Paper • 2411.15296 • Published 29 days ago • 19
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models Paper • 2410.17637 • Published Oct 23 • 34
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22 • 45
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17 • 52
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21 • 58
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21 • 65
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Paper • 2410.12405 • Published Oct 16 • 13
POINTS: Improving Your Vision-language Model with Affordable Strategies Paper • 2409.04828 • Published Sep 7 • 22
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models Paper • 2407.11691 • Published Jul 16 • 13
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 93
InternVL2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 15 items • Updated about 17 hours ago • 86
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published Jun 25 • 18