VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models Paper • 2312.00845 • Published 10 days ago • 34
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting Paper • 2312.03461 • Published 5 days ago • 12
DiffiT: Diffusion Vision Transformers for Image Generation Paper • 2312.02139 • Published 7 days ago • 9
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models Paper • 2312.00079 • Published 11 days ago • 11
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models Paper • 2311.13435 • Published 19 days ago • 12
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published 19 days ago • 22
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs Paper • 2311.13600 • Published 19 days ago • 37
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published 20 days ago • 32
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes Paper • 2311.13384 • Published 19 days ago • 45
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8 • 136
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation Paper • 2311.12229 • Published 20 days ago • 17
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering Paper • 2311.12775 • Published 20 days ago • 22
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression Paper • 2311.10794 • Published 24 days ago • 20
Solving The Travelling Salesmen Problem using HNN and HNN-SA algorithms Paper • 2202.13746 • Published Feb 8, 2022 • 1
Assorted text-to-image diffusion models Collection This collection contains my favorite text-to-image diffusion models. • 8 items • Updated about 1 month ago • 5
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model Paper • 2311.06214 • Published about 1 month ago • 23
CodeFusion: A Pre-trained Diffusion Model for Code Generation Paper • 2310.17680 • Published Oct 26 • 63
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding Paper • 2310.15308 • Published Oct 23 • 18
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25 • 19
SILC: Improving Vision Language Pretraining with Self-Distillation Paper • 2310.13355 • Published Oct 20 • 2
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17 • 68
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency Paper • 2310.03734 • Published Oct 5 • 11
Aligning Text-to-Image Diffusion Models with Reward Backpropagation Paper • 2310.03739 • Published Oct 5 • 20
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5 • 73
Enable Language Models to Implicitly Learn Self-Improvement From Data Paper • 2310.00898 • Published Oct 2 • 18
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning Paper • 2309.15091 • Published Sep 26 • 30
Foundation Models for Vision 🧩 Collection Foundation models for computer vision. • 23 items • Updated Sep 29 • 12
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation Paper • 2309.15818 • Published Sep 27 • 16
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models Paper • 2309.14509 • Published Sep 25 • 12
DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion Paper • 2309.12424 • Published Sep 21 • 10
Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model Paper • 2309.03550 • Published Sep 7 • 10
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models Paper • 2309.05793 • Published Sep 11 • 48
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Paper • 2309.06380 • Published Sep 12 • 28
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts Paper • 2309.04354 • Published Sep 8 • 12
ProPainter: Improving Propagation and Transformer for Video Inpainting Paper • 2309.03897 • Published Sep 7 • 23
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation Paper • 2309.03549 • Published Sep 7 • 3
Hierarchical Masked 3D Diffusion Model for Video Outpainting Paper • 2309.02119 • Published Sep 5 • 9
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation Paper • 2309.00398 • Published Sep 1 • 17
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 19
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining Paper • 2308.05734 • Published Aug 10 • 32
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models Paper • 2308.01825 • Published Aug 3 • 18
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models Paper • 2308.00675 • Published Aug 1 • 33
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning Paper • 2308.00436 • Published Aug 1 • 17
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17 • 162
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT Paper • 2307.08674 • Published Jul 17 • 42
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models Paper • 2307.02421 • Published Jul 5 • 31