Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published 6 days ago • 22
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections Paper • 2409.14677 • Published Sep 23, 2024 • 16
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 126
BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion Paper • 2408.04785 • Published Aug 8, 2024 • 9
Perturbed Attention Guidance pipelines Collection • Pipelines for Perturbed Attention Guidance with 🧨 library • 8 items • Updated Jun 26, 2024 • 6
Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification Paper • 2405.11574 • Published May 19, 2024 • 1