
Daily Papers

Example-based Motion Synthesis via Generative Motion Matching

StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation

Inserting Anybody in Diffusion Models via Celeb Basis

Wuerstchen: Efficient Pretraining of Text-to-Image Models

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models

Bytes Are All You Need: Transformers Operating Directly On File Bytes

Brainformers: Trading Simplicity for Efficiency

Birth of a Transformer: A Memory Viewpoint
The Hidden Language of Diffusion Models

SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

MuseCoco: Generating Symbolic Music from Text

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Diffusion Self-Guidance for Controllable Image Generation
Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
