PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published 15 days ago • 117
PaliGemma 2 Release • Collection • Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated 6 days ago • 117
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion • Paper • 2412.04424 • Published 14 days ago • 54
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models • Paper • 2411.18613 • Published 22 days ago • 50
A Case Study of Web App Coding with OpenAI Reasoning Models • Paper • 2409.13773 • Published Sep 19 • 5
Adaptive Caching for Faster Video Generation with Diffusion Transformers • Paper • 2411.02397 • Published Nov 4 • 23
Loong: Generating Minute-level Long Videos with Autoregressive Language Models • Paper • 2410.02757 • Published Oct 3 • 36
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • Paper • 2410.02073 • Published Oct 2 • 40
Molmo • Collection • Artifacts for open multimodal language models. • 5 items • Updated 22 days ago • 288
Improve Mathematical Reasoning in Language Models by Automated Process Supervision • Paper • 2406.06592 • Published Jun 5 • 26
LLM Reasoning Papers • Collection • Papers to improve reasoning capabilities of LLMs • 16 items • Updated 8 days ago • 87
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models • Paper • 2408.08872 • Published Aug 16 • 98
SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization • Paper • 2407.14257 • Published Jul 19 • 5
Scaling Synthetic Data Creation with 1,000,000,000 Personas • Paper • 2406.20094 • Published Jun 28 • 95