Another great week in open ML! Here's a small recap 🫰🏻
Model releases
⏯️ Video Language Models
AI at Meta released Vision-CAIR/LongVU_Qwen2_7B, a new state-of-the-art long-video LM based on DINOv2, SigLIP, Qwen2 and Llama 3.2
💬 Small language models
Hugging Face released HuggingFaceTB/SmolLM2-1.7B, part of a new family of smol language models under the Apache 2.0 license that comes in sizes 135M, 360M and 1.7B, along with datasets.
Meta released facebook/MobileLLM-1B, part of a new family of on-device LLMs in sizes 125M, 350M, 600M and 1B
Meta AI vision has been cooking @facebook. They shipped multiple models and demos for their papers at @ECCV 🤗
Here's a compilation of my top picks:
- Sapiens is a family of foundation models for human-centric depth estimation, segmentation and more. All models have open weights, demos and even TorchScript checkpoints 👏 A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is a state-of-the-art model for consistent 3D generation from images