Applied Machine Learning Papers

VikramSingh178's Collections

updated about 19 hours ago

Reading List (Mainly Focused on VLMs and Diffusion Models)

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Paper • 2310.03502 • Published Oct 5, 2023 • 74
Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 8
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Paper • 2311.15127 • Published Nov 25, 2023 • 6
Learning Transferable Visual Models From Natural Language Supervision

Paper • 2103.00020 • Published Feb 26, 2021 • 8
U-Net: Convolutional Networks for Biomedical Image Segmentation

Paper • 1505.04597 • Published May 18, 2015 • 6
Denoising Diffusion Probabilistic Models

Paper • 2006.11239 • Published Jun 19, 2020 • 3
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Paper • 2112.10741 • Published Dec 20, 2021 • 3
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22 • 21
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Paper • 2307.01952 • Published Jul 4, 2023 • 74
Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 23
PonderNet: Learning to Ponder

Paper • 2107.05407 • Published Jul 12, 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

Paper • 2106.10270 • Published Jun 18, 2021 • 2
Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

Paper • 2403.07500 • Published Mar 12
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published 5 days ago • 63
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

Paper • 2305.14720 • Published May 24, 2023 • 2
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 73
Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 49
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Paper • 2405.17414 • Published 5 days ago • 7
Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published 2 days ago • 17
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Paper • 2301.08243 • Published Jan 19, 2023 • 6
Revisiting Feature Prediction for Learning Visual Representations from Video

Paper • 2404.08471 • Published Feb 15 • 1
