Sa2VA Model Zoo

updated 11 days ago

Hugging Face model zoo for Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos, by ByteDance Seed CV Research

  • ByteDance/Sa2VA-4B

    Image-Text-to-Text • 4B • Updated Sep 8 • 176k • 86

  • ByteDance/Sa2VA-8B

    Image-Text-to-Text • 8B • Updated Sep 8 • 1.12k • 65

  • ByteDance/Sa2VA-1B

    Image-Text-to-Text • 1B • Updated Sep 8 • 610 • 27

  • ByteDance/Sa2VA-26B

    Image-Text-to-Text • 26B • Updated Sep 8 • 62 • 31

  • ByteDance/Sa2VA-InternVL3-2B

    Image-Text-to-Text • 2B • Updated 22 days ago • 251 • 1

  • ByteDance/Sa2VA-InternVL3-8B

    Image-Text-to-Text • 8B • Updated 22 days ago • 126 • 3

  • ByteDance/Sa2VA-InternVL3-14B

    Image-Text-to-Text • 15B • Updated 22 days ago • 222 • 9

  • ByteDance/Sa2VA-Qwen2_5-VL-3B

    Image-Text-to-Text • 4B • Updated 22 days ago • 486 • 1

  • ByteDance/Sa2VA-Qwen2_5-VL-7B

    Image-Text-to-Text • 9B • Updated 22 days ago • 288 • 3

  • ByteDance/Sa2VA-Qwen3-VL-4B

    Image-Text-to-Text • 5B • Updated 17 days ago • 1.05k • 8

  • Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

    Paper • 2501.04001 • Published Jan 7 • 47

    Note: Technical report for Sa2VA.
