SANA-1.5 Collection SANA-1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer • 6 items • Updated 7 days ago • 2
Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • 15 days ago • 345
Article: Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models • Jun 24, 2024 • 190
Remote VAE Inference Endpoints Collection Models and handler code used in https://huggingface.co/blog/remote_vae • 5 items • Updated 16 days ago • 4
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 138
PaliGemma 2 Release Collection Vision-Language Models available in multiple 3B, 10B and 28B variants. • 32 items • Updated 14 days ago • 145
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion Paper • 2403.05121 • Published Mar 8, 2024 • 24
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 15