Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published 2 days ago • 13
Perturbed Attention Guidance pipelines Collection Pipelines for Perturbed Attention Guidance with 🧨 library • 6 items • Updated 6 days ago • 2
From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation Paper • 2404.15267 • Published 23 days ago • 4
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published 24 days ago • 120
HiDiffusion: Unlocking High-Resolution Creativity and Efficiency in Low-Resolution Trained Diffusion Models Paper • 2311.17528 • Published Nov 29, 2023 • 4
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 61 items • Updated 2 days ago • 59
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 24 days ago • 230
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published 25 days ago • 25
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Paper • 2404.10667 • Published about 1 month ago • 12
HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach Paper • 2404.01094 • Published Apr 1 • 3
Natural language guidance of high-fidelity text-to-speech with synthetic annotations Paper • 2402.01912 • Published Feb 2 • 7
HF-curated models available on Workers AI Collection A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items • Updated Apr 2 • 48
🎭 Avatars Collection The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 33 items • Updated 2 days ago • 49
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion Paper • 2403.12008 • Published Mar 18 • 18
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18 • 60
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8 • 39
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7 • 38
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper • 2403.04692 • Published Mar 7 • 35
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 40
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 30
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated 2 days ago • 302
Text-to-Image Base Models Collection All text-to-image open source base models, with their respective license • 28 items • Updated 6 days ago • 17
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models Paper • 2402.06178 • Published Feb 9 • 12
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 61
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 17
OLMo Suite Collection Artifacts for the first set of OLMo models. • 12 items • Updated about 23 hours ago • 35
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10 • 43
DreamTuner: Single Image is Enough for Subject-Driven Generation Paper • 2312.13691 • Published Dec 21, 2023 • 23
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles Paper • 2312.11666 • Published Dec 18, 2023 • 12
Seamless Communication Collection A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16 • 123
LEDITS++: Limitless Image Editing using Text-to-Image Models Paper • 2311.16711 • Published Nov 28, 2023 • 14
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 54
Tied-Lora: Enhacing parameter efficiency of LoRA with weight tying Paper • 2311.09578 • Published Nov 16, 2023 • 11
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Paper • 2311.09257 • Published Nov 14, 2023 • 43
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 43
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 73
Diffusion guided image and video editing Collection Spaces for image and video editing using diffusion based techniques • 4 items • Updated Oct 25, 2023 • 2
Latent Consistency Models Weights Collection Full-fine tune weights for Latent Consistency models • 3 items • Updated Nov 9, 2023 • 9
Latent Consistency Models LoRAs Collection Latent Consistency Models for Stable Diffusion - LoRAs and full fine-tuned weights • 4 items • Updated Nov 10, 2023 • 94
MusicGen Stereo Collection A collection of stereo music generation models as part of the v2 MusicGen release. • 4 items • Updated 22 days ago • 7
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 30
De-Diffusion Makes Text a Strong Cross-Modal Interface Paper • 2311.00618 • Published Nov 1, 2023 • 21
Controllable Music Production with Diffusion Models and Guidance Gradients Paper • 2311.00613 • Published Nov 1, 2023 • 23
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25, 2023 • 24
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics Paper • 2310.13268 • Published Oct 20, 2023 • 15