PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published 7 days ago • 29
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Article • 18 days ago • 93
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published 21 days ago • 45
TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published 18 days ago • 42
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 15
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 37
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published 29 days ago • 58
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published 24 days ago • 55
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published 22 days ago • 90
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18 • 60
AnimateDiff-Lightning: Cross-Model Diffusion Distillation Paper • 2403.12706 • Published Mar 19 • 17
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts Paper • 2402.09727 • Published Feb 15 • 35
FiT: Flexible Vision Transformer for Diffusion Model Paper • 2402.12376 • Published Feb 19 • 46
The FinBen: An Holistic Financial Benchmark for Large Language Models Paper • 2402.12659 • Published Feb 20 • 13
Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields Paper • 2402.13252 • Published Feb 20 • 17
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20 • 45
SDXL-Lightning: Progressive Adversarial Diffusion Distillation Paper • 2402.13929 • Published Feb 21 • 24
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24 • 64
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 37
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web Paper • 2312.16457 • Published Dec 27, 2023 • 13
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 54
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort Paper • 2311.11243 • Published Nov 19, 2023 • 14
Make Pixels Dance: High-Dynamic Video Generation Paper • 2311.10982 • Published Nov 18, 2023 • 64
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation Paper • 2311.12229 • Published Nov 20, 2023 • 25
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer Paper • 2311.12052 • Published Nov 18, 2023 • 28
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline Paper • 2311.13073 • Published Nov 22, 2023 • 53
Seamless Communication Collection A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16 • 120
Latent Consistency Model Demos Collection Latent Consistency Models for Stable Diffusion • 8 items • Updated Nov 12, 2023 • 24
In-Context Pretraining: Language Modeling Beyond Document Boundaries Paper • 2310.10638 • Published Oct 16, 2023 • 26
ProPainter: Improving Propagation and Transformer for Video Inpainting Paper • 2309.03897 • Published Sep 7, 2023 • 24
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models Paper • 2309.05793 • Published Sep 11, 2023 • 50
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks Paper • 2309.03895 • Published Sep 7, 2023 • 11
SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation Paper • 2308.16876 • Published Aug 31, 2023 • 6
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation Paper • 2309.00398 • Published Sep 1, 2023 • 18
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Paper • 2309.02591 • Published Sep 5, 2023 • 12
Platypus: Quick, Cheap, and Powerful Refinement of LLMs Paper • 2308.07317 • Published Aug 14, 2023 • 22
RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs Paper • 2308.07228 • Published Aug 14, 2023 • 8
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models Paper • 2308.06721 • Published Aug 13, 2023 • 24
Relightable and Animatable Neural Avatar from Sparse-View Video Paper • 2308.07903 • Published Aug 15, 2023 • 9
Dual-Stream Diffusion Net for Text-to-Video Generation Paper • 2308.08316 • Published Aug 16, 2023 • 23
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning Paper • 2308.00436 • Published Aug 1, 2023 • 20
Predicting masked tokens in stochastic locations improves masked image modeling Paper • 2308.00566 • Published Jul 31, 2023 • 14
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World Paper • 2308.01907 • Published Aug 3, 2023 • 10
HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions Paper • 2308.01477 • Published Aug 2, 2023 • 11
Multimodal Neurons in Pretrained Text-Only Transformers Paper • 2308.01544 • Published Aug 3, 2023 • 14
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies Paper • 2308.01546 • Published Aug 3, 2023 • 15
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing Paper • 2308.03280 • Published Aug 7, 2023 • 6
3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance Fields Paper • 2308.03757 • Published Aug 7, 2023 • 10
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining Paper • 2308.05734 • Published Aug 10, 2023 • 33
Guiding Image Captioning Models Toward More Specific Captions Paper • 2307.16686 • Published Jul 31, 2023 • 14