Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 5 days ago • 24
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub 5 days ago • 42
Article From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages By Steveeeeeeen and 1 other • 5 days ago • 21
DepthPro Models Collection Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • 4 items • Updated 9 days ago • 7
Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) By ariG23498 • 28 days ago • 13
ViTPose Collection Collection for ViTPose models based on transformers implementation. • 10 items • Updated Jan 12 • 12
Segformer Collection Transformer-based semantic segmentation model by Nvidia • 15 items • Updated Jan 13 • 4
timm tiny test models Collection A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2, 2024 • 5
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5, 2024 • 200
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot Paper • 2402.14654 • Published Feb 22, 2024 • 2
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 569
Jamba-1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Aug 22, 2024 • 84