marcusinthesky's Collections

VLM Benchmarks

updated Oct 15, 2024

  • MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

    Paper • 2410.10139 • Published Oct 14, 2024 • 53

  • MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

    Paper • 2410.10563 • Published Oct 14, 2024 • 39

  • LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content

    Paper • 2410.10783 • Published Oct 14, 2024 • 28

  • TVBench: Redesigning Video-Language Evaluation

    Paper • 2410.07752 • Published Oct 10, 2024 • 6