chethan62's Collections
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
•
2311.10093
•
Published
•
56
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Paper
•
2311.12229
•
Published
•
26
Diffusion Model Alignment Using Direct Preference Optimization
Paper
•
2311.12908
•
Published
•
47
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Paper
•
2312.00845
•
Published
•
36
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper
•
2312.03793
•
Published
•
17
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Paper
•
2312.16145
•
Published
•
8
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper
•
2312.16862
•
Published
•
30
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper
•
2312.04461
•
Published
•
58
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
Paper
•
2401.13795
•
Published
•
66
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper
•
2401.13627
•
Published
•
73
MambaByte: Token-free Selective State Space Model
Paper
•
2401.13660
•
Published
•
52
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
Paper
•
2401.13388
•
Published
•
11
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
•
2401.12945
•
Published
•
86
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Paper
•
2401.12789
•
Published
•
7
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Paper
•
2401.11708
•
Published
•
30
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper
•
2401.11605
•
Published
•
22
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper
•
2401.10891
•
Published
•
60
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Paper
•
2401.10061
•
Published
•
29
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper
•
2401.09417
•
Published
•
59
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Paper
•
2401.08671
•
Published
•
14
UFO: A UI-Focused Agent for Windows OS Interaction
Paper
•
2402.07939
•
Published
•
13
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
Paper
•
2402.05195
•
Published
•
18
FiT: Flexible Vision Transformer for Diffusion Model
Paper
•
2402.12376
•
Published
•
48
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Paper
•
2403.04692
•
Published
•
39
Yi: Open Foundation Models by 01.AI
Paper
•
2403.04652
•
Published
•
62
StableDrag: Stable Dragging for Point-based Image Editing
Paper
•
2403.04437
•
Published
•
25
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Paper
•
2403.05121
•
Published
•
22
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Paper
•
2403.12015
•
Published
•
64
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Paper
•
2403.16990
•
Published
•
25
ViTAR: Vision Transformer with Any Resolution
Paper
•
2403.18361
•
Published
•
52
LLM Agent Operating System
Paper
•
2403.16971
•
Published
•
65
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper
•
2404.01197
•
Published
•
30
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper
•
2404.01294
•
Published
•
15
On the Scalability of Diffusion-based Text-to-Image Generation
Paper
•
2404.02883
•
Published
•
17
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Paper
•
2404.07724
•
Published
•
13
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper
•
2404.07973
•
Published
•
30
EdgeFusion: On-Device Text-to-Image Generation
Paper
•
2404.11925
•
Published
•
21
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper
•
2404.19427
•
Published
•
71
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance
Paper
•
2401.16465
•
Published
•
11
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Paper
•
2406.07686
•
Published
•
14
Wavelets Are All You Need for Autoregressive Image Generation
Paper
•
2406.19997
•
Published
•
29
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation
Paper
•
2407.00788
•
Published
•
22
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Paper
•
2407.03320
•
Published
•
93
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper
•
2407.01392
•
Published
•
39
OmniParser for Pure Vision Based GUI Agent
Paper
•
2408.00203
•
Published
•
24
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Paper
•
2408.02657
•
Published
•
33
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Paper
•
2410.02416
•
Published
•
26
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Paper
•
2410.08159
•
Published
•
25
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Paper
•
2410.13863
•
Published
•
36