image - a GEONTT Collection

GEONTT 's Collections

base

3D

LLM

audio

video

image

RAG

image

updated 5 days ago

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Paper • 2403.00483 • Published Mar 1 • 9
Note 训练一个text和subject的attention模块
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4 • 26
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21 • 19
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 48
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7 • 40
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8 • 40
Multistep Consistency Models

Paper • 2403.06807 • Published Mar 11 • 13
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

Paper • 2403.07487 • Published Mar 12 • 12
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Paper • 2403.09055 • Published Mar 14 • 24
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

Paper • 2403.09977 • Published Mar 15 • 8
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm

Paper • 2403.11781 • Published Mar 18 • 17
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18 • 13
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Paper • 2403.16627 • Published Mar 25 • 20
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Paper • 2403.18818 • Published Mar 27 • 22
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48
CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 61
ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Paper • 2404.04860 • Published Apr 7 • 24
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Paper • 2404.05717 • Published Apr 8 • 23
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10 • 13
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 46
PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Paper • 2404.16022 • Published Apr 24 • 16
Editable Image Elements for Controllable Synthesis

Paper • 2404.16029 • Published Apr 24 • 9
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

Paper • 2401.16465 • Published Jan 29 • 10
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28 • 26
Customizing Text-to-Image Models with a Single Image Pair

Paper • 2405.01536 • Published May 2 • 17
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation

Paper • 2405.07065 • Published May 11 • 16
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14 • 18
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Paper • 2405.09874 • Published May 16 • 15
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

Paper • 2405.14224 • Published May 23 • 8
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published May 24 • 43
An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published 22 days ago • 53
Zero-shot Image Editing with Reference Imitation

Paper • 2406.07547 • Published 22 days ago • 30
Neural Gaffer: Relighting Any Object via Diffusion

Paper • 2406.07520 • Published 22 days ago • 4
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Paper • 2406.08392 • Published 21 days ago • 17
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

Paper • 2406.10208 • Published 19 days ago • 21
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Paper • 2406.10601 • Published 18 days ago • 65
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Paper • 2406.16855 • Published 9 days ago • 53
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published 6 days ago • 49