akhaliq posted an update Jan 30
InternLM-XComposer2

Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

paper page: InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model (2401.16420)

Experimental results show that InternLM-XComposer2, built on InternLM2-7B, produces high-quality long-text multi-modal content and delivers exceptional vision-language understanding across various benchmarks. It not only significantly outperforms existing multimodal models but also matches or even surpasses GPT-4V and Gemini Pro in certain assessments.