Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 19 items • Updated 10 days ago • 84
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Paper • 2410.10594 • Published Oct 14, 2024 • 24
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published Sep 30, 2024 • 55
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28, 2024 • 19
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19, 2024 • 37
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 69
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data Paper • 2404.15653 • Published Apr 24, 2024 • 27
RAFT: Adapting Language Model to Domain Specific RAG Paper • 2403.10131 • Published Mar 15, 2024 • 68