Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 128
Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25, 2024 • 17
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 28 items • Updated Feb 14 • 17
MobileNetV4 pretrained weights Collection Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22, 2024 • 18
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17, 2024 • 21
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 41
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024 • 12
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24, 2024 • 46
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated 5 days ago • 145