Collections
Collections including paper arxiv:2312.11556
- SVGDreamer: Text Guided SVG Generation with Diffusion Model
  Paper • 2312.16476 • Published • 1
- DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
  Paper • 2306.14685 • Published • 1
- Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models
  Paper • 2311.15543 • Published
- StarVector: Generating Scalable Vector Graphics Code from Images
  Paper • 2312.11556 • Published • 27

- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 4
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 31

- Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
  Paper • 2306.06094 • Published • 1
- IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
  Paper • 2304.14400 • Published • 4
- VecFusion: Vector Font Generation with Diffusion
  Paper • 2312.10540 • Published • 21
- StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
  Paper • 2401.17093 • Published • 19

- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 34
- StarVector: Generating Scalable Vector Graphics Code from Images
  Paper • 2312.11556 • Published • 27
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
  Paper • 2311.08046 • Published • 1
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models
  Paper • 2312.14233 • Published • 16

- StarVector: Generating Scalable Vector Graphics Code from Images
  Paper • 2312.11556 • Published • 27
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
  Paper • 2312.12423 • Published • 12
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
  Paper • 2312.11392 • Published • 19
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 217k • 2.77k

- DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
  Paper • 2312.09767 • Published • 25
- Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models
  Paper • 2312.09608 • Published • 13
- StarVector: Generating Scalable Vector Graphics Code from Images
  Paper • 2312.11556 • Published • 27

- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 14
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 25
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 8
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20

- OmnimatteRF: Robust Omnimatte with 3D Background Modeling
  Paper • 2309.07749 • Published • 7
- AudioSR: Versatile Audio Super-resolution at Scale
  Paper • 2309.07314 • Published • 25
- Generative Image Dynamics
  Paper • 2309.07906 • Published • 53
- MagiCapture: High-Resolution Multi-Concept Portrait Customization
  Paper • 2309.06895 • Published • 27