C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 7 days ago • 61
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 3 items • Updated 12 days ago • 91
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 19 days ago • 128
Ovis2 Collection Our latest advancement in multimodal large language models (MLLMs) • 8 items • Updated 22 days ago • 55
Article KV Caching Explained: Optimizing Transformer Inference Efficiency By not-lain • Jan 30 • 36
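The KV-caching technique covered in the article above can be illustrated with a minimal sketch. This is a toy model (scalar "embeddings" and made-up projection weights `wk`, `wv`, `wq`, not a real transformer): the point is only that each decoding step projects and appends one new key/value pair and reuses the cached ones, instead of re-projecting the whole sequence.

```python
import math

def attend(q, keys, values):
    # Softmax over q·k scores, then a weighted sum of cached values.
    scores = [q * k for k in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(w / z * v for w, v in zip(exps, values))

class KVCache:
    """Toy KV cache for autoregressive decoding (hypothetical weights)."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x, wk=0.5, wv=2.0, wq=1.0):
        # Project only the NEW token and append; past projections are
        # reused, so each step is O(seq_len) rather than O(seq_len^2).
        self.keys.append(wk * x)
        self.values.append(wv * x)
        return attend(wq * x, self.keys, self.values)

cache = KVCache()
outs = [cache.step(float(t)) for t in range(1, 4)]
```

With only one cached token the softmax weight is 1, so the first output equals that token's value; later outputs are convex combinations of all cached values.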
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on-device! See https://github.com/kyutai-labs/hibiki. • 5 items • Updated Feb 6 • 50
Article Build a Qwen 2.5 VL API endpoint with Hugging Face Spaces and Docker! By ariG23498 • Jan 29 • 17
AceMath Collection We are releasing math instruction models, math reward models, general instruction models, all training datasets, and a math reward benchmark. • 11 items • Updated Jan 17 • 11
SmolVLM 256M & 500M Collection Collection of models & demos for the even smoller SmolVLM release • 12 items • Updated 19 days ago • 70
ViTPose Collection Collection of ViTPose models based on the transformers implementation. • 10 items • Updated Jan 12 • 13
Sa2VA Model Zoo Collection Hugging Face model zoo for Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos. By ByteDance Seed CV Research • 4 items • Updated about 1 month ago • 32
Maya: An Instruction Finetuned Multilingual Multimodal Model Paper • 2412.07112 • Published Dec 10, 2024 • 27
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 140