PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 15 days ago
PaliGemma 2 Release Collection: Vision-Language Models available in 3B, 10B and 28B variants. • 23 items • Updated 6 days ago
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 14 days ago
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published 22 days ago
microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned Zero-Shot Classification • Updated about 1 month ago
A Case Study of Web App Coding with OpenAI Reasoning Models Paper • 2409.13773 • Published Sep 19
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4