AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated 2 days ago • 45
LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP modal more SOTA ever. • 7 items • Updated 5 days ago • 37
Cosmos Tokenizer Collection A suite of image and video tokenizers • 10 items • Updated 18 days ago • 18
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 8 items • Updated about 23 hours ago • 72
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 10 items • Updated 3 days ago • 176
Sparsh Collection Models and datasets for Sparsh: Self-supervised touch representations for vision-based tactile sensing • 15 items • Updated Oct 24 • 11
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 8 items • Updated 17 days ago • 95
Oryx-1.5 Collection Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution • 2 items • Updated Oct 23 • 3
view article Article ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models By ahmed-masry • Oct 18 • 16
steiner-preview Collection Reasoning models trained on synthetic data using reinforcement learning. • 3 items • Updated Oct 20 • 24
Granite 3.0 Language Models Collection A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 20 days ago • 89
view article Article OCR Processing and Text in Image Analysis with Florence-2-base and Qwen2-VL-2B By PandorAI1995 • Oct 18 • 13
Gemma-APS Release Collection Gemma models for text-to-propositions segmentation. The models are distilled from fine-tuned Gemini Pro model applied to multi-domain synthetic data. • 3 items • Updated Oct 15 • 19
view article Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled • Oct 14 • 55
NVLM 1.0 Collection A family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks and text-only tasks. • 1 item • Updated Oct 1 • 48