chethan62's Collections: papers
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models (arXiv:2311.10093)
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation (arXiv:2311.12229)
Diffusion Model Alignment Using Direct Preference Optimization (arXiv:2311.12908)
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (arXiv:2312.00845)
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators (arXiv:2312.03793)
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications (arXiv:2312.16145)
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (arXiv:2312.16862)
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding (arXiv:2312.04461)
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All (arXiv:2401.13795)
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild (arXiv:2401.13627)
MambaByte: Token-free Selective State Space Model (arXiv:2401.13660)
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion (arXiv:2401.13388)
Lumiere: A Space-Time Diffusion Model for Video Generation (arXiv:2401.12945)
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study (arXiv:2401.12789)
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (arXiv:2401.11708)
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers (arXiv:2401.11605)
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data (arXiv:2401.10891)
DiffusionGPT: LLM-Driven Text-to-Image Generation System (arXiv:2401.10061)
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (arXiv:2401.09417)
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (arXiv:2401.08671)
UFO: A UI-Focused Agent for Windows OS Interaction (arXiv:2402.07939)
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space (arXiv:2402.05195)
FiT: Flexible Vision Transformer for Diffusion Model (arXiv:2402.12376)
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation (arXiv:2403.04692)
Yi: Open Foundation Models by 01.AI (arXiv:2403.04652)
StableDrag: Stable Dragging for Point-based Image Editing (arXiv:2403.04437)
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion (arXiv:2403.05121)
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation (arXiv:2403.12015)
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation (arXiv:2403.16990)
ViTAR: Vision Transformer with Any Resolution (arXiv:2403.18361)
LLM Agent Operating System (arXiv:2403.16971)
Getting it Right: Improving Spatial Consistency in Text-to-Image Models (arXiv:2404.01197)
CosmicMan: A Text-to-Image Foundation Model for Humans (arXiv:2404.01294)
On the Scalability of Diffusion-based Text-to-Image Generation (arXiv:2404.02883)
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models (arXiv:2404.07724)
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (arXiv:2404.07973)
EdgeFusion: On-Device Text-to-Image Generation (arXiv:2404.11925)
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation (arXiv:2404.19427)
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance (arXiv:2401.16465)