FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model Paper • 2410.13925 • Published Oct 17 • 22
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18 • 16
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29 • 55
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26 • 35
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 506