title: README
emoji: π
colorFrom: blue
colorTo: blue
sdk: static
pinned: false

On Path to Multimodal Generalist: Levels and Benchmarks
[Project] [Leaderboard] [Paper] [Dataset-HF] [Dataset-GitHub]
Does higher performance across tasks mean a more capable MLLM, one that is closer to AGI?
NO! Synergy does.
This project introduces:
General-Level: a five-level evaluation system that sets a new norm for assessing multimodal generalists (multimodal LLMs/agents). Its core idea is to use synergy as the evaluation criterion, grading capabilities by whether an MLLM preserves synergy across comprehension and generation, and across multimodal interactions.
General-Bench: a companion large-scale multimodal benchmark dataset covering a broad spectrum of skills, modalities, formats, and capabilities, with over 700 tasks and 325K instances.
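To make the idea of "synergy" more concrete, the sketch below uses one simple, assumed criterion: a generalist shows synergy on a task when it outperforms the best task-specific specialist on that task. This is an illustrative simplification, not the General-Level scoring defined in the paper. The class and function names (`TaskResult`, `synergy_tasks`, `synergy_rate`) are hypothetical, and the scores are toy numbers, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record for one task (illustration only).
    task: str                # task name
    generalist_score: float  # the multimodal generalist's score on this task
    specialist_score: float  # best single-task specialist's score (assumed known)


def synergy_tasks(results: list[TaskResult]) -> list[str]:
    """Tasks where the generalist beats the specialist (assumed synergy criterion)."""
    return [r.task for r in results if r.generalist_score > r.specialist_score]


def synergy_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks that show synergy under the assumed criterion."""
    if not results:
        return 0.0
    return len(synergy_tasks(results)) / len(results)


# Toy numbers for illustration, not results from General-Bench.
results = [
    TaskResult("image captioning", 0.82, 0.79),
    TaskResult("video QA", 0.61, 0.70),
    TaskResult("image generation", 0.55, 0.50),
    TaskResult("audio classification", 0.40, 0.65),
]

print(synergy_tasks(results))  # ['image captioning', 'image generation']
print(synergy_rate(results))   # 0.5
```

The point of the criterion is that a model can post decent scores on many tasks and still show little synergy: here it beats the specialists on only half of the tasks, even though it performs reasonably on all of them.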