LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance Paper • 2307.00522 • Published Jul 2, 2023 • 31
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 27 days ago • 135
Article A Dive into Pretraining Strategies for Vision-Language Models Feb 3, 2023 • 37
Free Music Archive Collection ISMIR's 2017 FMA Dataset, Optimized for 🤗 Datasets / 🥐 Croissant, with Clear Licensing • 4 items • Updated Sep 13 • 3
Article wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? By catherinearnett • 18 days ago • 33
Adding Conditional Control to Text-to-Image Diffusion Models Paper • 2302.05543 • Published Feb 10, 2023 • 36
Article How to generate text: using different decoding methods for language generation with Transformers Mar 1, 2020 • 98
Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA May 24, 2023 • 84
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 45
Article Using 🤗 to Train a GPT-2 Model for Music Generation By juancopi81 • Oct 5, 2023 • 7
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325
SpeechVerse: A Large-scale Generalizable Audio Language Model Paper • 2405.08295 • Published May 14 • 14
Quantized-Mistral Collection Quantized Mistral models in 2-, 4-, and 8-bit versions • 4 items • Updated Aug 31 • 4
Article Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳 By merve • Aug 25, 2023 • 18
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation Paper • 2409.02245 • Published Sep 3 • 9
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7 • 37
Article Introducing AuraFace: Open-Source Face Recognition and Identity Preservation Models By isidentical • Aug 26 • 35
aaliyah Collection personal collection of convnet models and paper implementations for different applications. • 2 items • Updated Aug 25 • 1
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5 • 60
Article Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny Aug 1, 2023 • 2
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published May 2 • 16
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Paper • 2407.13759 • Published Jul 18 • 17
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity Paper • 2407.10387 • Published Jul 15 • 6
Article Train custom AI models with the trainer API and adapt them to 🤗 By not-lain • Jun 29 • 33
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers Paper • 2310.05400 • Published Oct 9, 2023 • 1