papers - a chethan62 Collection

chethan62 's Collections

spaces

webgpu

papers

models

papers

updated Oct 20

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Paper • 2311.10093 • Published Nov 16, 2023 • 56
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

Paper • 2311.12229 • Published Nov 20, 2023 • 26
Diffusion Model Alignment Using Direct Preference Optimization

Paper • 2311.12908 • Published Nov 21, 2023 • 47
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Paper • 2312.00845 • Published Dec 1, 2023 • 36
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 17
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

Paper • 2312.16145 • Published Dec 26, 2023 • 8
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 30
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 58
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

Paper • 2401.13795 • Published Jan 24 • 66
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24 • 73
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24 • 52
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion

Paper • 2401.13388 • Published Jan 24 • 11
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

Paper • 2401.12789 • Published Jan 23 • 7
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Paper • 2401.11708 • Published Jan 22 • 30
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21 • 22
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 60
DiffusionGPT: LLM-Driven Text-to-Image Generation System

Paper • 2401.10061 • Published Jan 18 • 29
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17 • 59
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Paper • 2401.08671 • Published Jan 9 • 14
UFO: A UI-Focused Agent for Windows OS Interaction

Paper • 2402.07939 • Published Feb 8 • 13
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Paper • 2402.05195 • Published Feb 7 • 18
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 48
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7 • 39
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 62
StableDrag: Stable Dragging for Point-based Image Editing

Paper • 2403.04437 • Published Mar 7 • 25
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Paper • 2403.05121 • Published Mar 8 • 22
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Paper • 2403.12015 • Published Mar 18 • 64
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 25
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 52
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 65
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 30
CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15
On the Scalability of Diffusion-based Text-to-Image Generation

Paper • 2404.02883 • Published Apr 3 • 17
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Paper • 2404.07724 • Published Apr 11 • 13
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 30
EdgeFusion: On-Device Text-to-Image Generation

Paper • 2404.11925 • Published Apr 18 • 21
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 71
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

Paper • 2401.16465 • Published Jan 29 • 11
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Paper • 2406.07686 • Published Jun 11 • 14
Wavelets Are All You Need for Autoregressive Image Generation

Paper • 2406.19997 • Published Jun 28 • 29
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation

Paper • 2407.00788 • Published Jun 30 • 22
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3 • 93
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Paper • 2407.01392 • Published Jul 1 • 39
OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1 • 24
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

Paper • 2408.02657 • Published Aug 5 • 33
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Paper • 2410.02416 • Published Oct 3 • 26
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation

Paper • 2410.08159 • Published Oct 10 • 25
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

Paper • 2410.13863 • Published Oct 17 • 36