Revisiting Feature Prediction for Learning Visual Representations from Video Paper • 2404.08471 • Published Feb 15 • 1
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published 2 days ago • 17
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture Paper • 2301.08243 • Published Jan 19, 2023 • 6
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published 5 days ago • 7
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing Paper • 2305.14720 • Published May 24, 2023 • 2
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published 5 days ago • 44
Learning Transferable Visual Models From Natural Language Supervision Paper • 2103.00020 • Published Feb 26, 2021 • 8
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers Paper • 2106.10270 • Published Jun 18, 2021 • 2
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published about 1 month ago • 44
Training Stable Diffusion with Dreambooth using 🧨 Diffusers Article • Published Nov 7, 2022 • 4
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Paper • 2112.10741 • Published Dec 20, 2021 • 3
V3D: Video Diffusion Models are Effective 3D Generators Paper • 2403.06738 • Published Mar 11 • 28
Speculative Streaming: Fast LLM Inference without Auxiliary Models Paper • 2402.11131 • Published Feb 16 • 41