- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 8
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 40
- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 32
- Aligning Large Multimodal Models with Factually Augmented RLHF
  Paper • 2309.14525 • Published • 29
Collections
Collections including paper arxiv:2401.01952
- Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
  Paper • 2401.00935 • Published • 16
- Taming Mode Collapse in Score Distillation for Text-to-3D Generation
  Paper • 2401.00909 • Published • 8
- Q-Refine: A Perceptual Quality Refiner for AI-Generated Image
  Paper • 2401.01117 • Published • 6
- En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
  Paper • 2401.01173 • Published • 10

- Instruct-Imagen: Image Generation with Multi-modal Instruction
  Paper • 2401.01952 • Published • 29
- ODIN: A Single Model for 2D and 3D Perception
  Paper • 2401.02416 • Published • 10
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
  Paper • 2404.01367 • Published • 19
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
  Paper • 2404.02747 • Published • 11

- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 24
- Instruct-Imagen: Image Generation with Multi-modal Instruction
  Paper • 2401.01952 • Published • 29
- mistralai/Mixtral-8x7B-Instruct-v0.1
  Text Generation • Updated • 494k • 3.86k
- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 43

- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
  Paper • 2312.12491 • Published • 66
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
  Paper • 2401.11708 • Published • 27
- Training-Free Consistent Text-to-Image Generation
  Paper • 2402.03286 • Published • 62
- PALP: Prompt Aligned Personalization of Text-to-Image Models
  Paper • 2401.06105 • Published • 46