[Paper] [GitHub]

Embodied Ability Evaluation: Performance in RoboVQA and OpenEQA

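| Benchmark | Metric | MLCD Embodied-7B | LLaVA OneVision-7B | GPT-4v | RoboMamba |
|---|---|---|---|---|---|
| RoboVQA | BLEU1 | 73.16 | 38.12 | - | 54.9 |
| RoboVQA | BLEU2 | 66.39 | 33.56 | - | 44.2 |
| RoboVQA | BLEU3 | 60.61 | 31.76 | - | 39.5 |
| RoboVQA | BLEU4 | 56.56 | 30.97 | - | 36.3 |
| OpenEQA | Object State Recognition | 71.83 | - | 63.2 | - |
| OpenEQA | Object Recognition | 49.46 | - | 43.4 | - |
| OpenEQA | Functional Reasoning | 54.38 | - | 57.4 | - |
| OpenEQA | Spatial Understanding | 48.64 | - | 33.6 | - |
| OpenEQA | Attribute Recognition | 67.08 | - | 57.2 | - |
| OpenEQA | World Knowledge | 53.87 | - | 50.7 | - |
| OpenEQA | Object Localization | 43.06 | - | 42.0 | - |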
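
For readers unfamiliar with the BLEU1 to BLEU4 rows, the following is a minimal sketch of how a sentence-level BLEU-n score is commonly computed (clipped n-gram precision, geometric mean, brevity penalty). It is an illustration only: the function names `ngrams` and `bleu` are hypothetical, the sketch uses a single reference, whitespace tokenization and no smoothing, and it is not the scoring code used to produce the numbers above, which may differ in all of these respects.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n):
    """Unsmoothed single-reference BLEU using n-gram orders 1..max_n (range 0..1)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Under this sketch, `bleu(pred, ref, 1)` corresponds to BLEU1 and `bleu(pred, ref, 4)` to BLEU4; multiplying by 100 gives values on the scale used in the table.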

General Ability Evaluation: Comparison with LLaVA OneVision-7B, GPT-4v and GPT-4o

| Dataset | Split | MLCD Embodied-7B | LLaVA OneVision-7B | GPT-4v | GPT-4o |
|---|---|---|---|---|---|
| AI2D | test | 79.9 | 81.4 | 78.2 | 94.2 |
| ChartQA | test | 83.0 | 80.0 | 78.5 | 85.7 |
| DocVQA | test | 91.6 | 87.5 | 88.4 | 92.8 |
| InfoVQA | val | 73.9 | 70.7 | - | - |
| InfoVQA | test | 70.0 | 68.8 | - | - |
| MMMU | val | 47.3 | 48.8 | 56.8 | 69.1 |
| MMStar | test | 58.5 | 61.7 | 57.1 | 63.9 |
| OCRBench | - | 749.0 | 697.0 | 656.0 | 805.0 |
| RealWorldQA | test | 68.9 | 66.3 | 61.4 | 58.6 |
| SeedBench | image | 74.9 | 75.4 | 49.9 | 76.2 |
| MMbench | en-dev | 81.1 | 83.2 | 81.3 | 83.4 |
| MMbench | en-test | 80.1 | 80.8 | 75.0 | - |
| MME | test | 578/1603 | 418/1580 | 517/1409 | - |

Usage

A. Installation

git clone https://github.com/deepglint/unicom
cd unicom

# Upgrade pip and install necessary dependencies
pip install --upgrade pip
pip install -e ".[train]"

B. Inference

CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir /path/to/your/model

# example:
# >> Enter 'exit' to end the conversation, 'reset' to clear the chat history.
# >> Enter image file paths (comma-separated): ./asserts/logo.png
# >> User: <image>What kind of animal is it in this picture?
# >> Assistant: The image features a stylized representation of a cat, characterized by its vibrant and abstract depiction.
# >> User: What color is this cat?
# >> Assistant: The cat in the image is primarily white with blue, orange and pink accents, creating a visually appealing and unique appearance.

C. Evaluation for Embodied Ability

Step 1

Download the raw data by following the instructions for OpenEQA and RoboVQA (val part only).

Step 2

Convert the raw data into the format required for model evaluation.

# convert OpenEQA benchmark. Note: replace the paths with your own.
python llava/benchmark/make_openeqa_bmk.py

# convert RoboVQA benchmark. Note: replace the paths with your own.
python llava/benchmark/make_robovqa_bmk.py

Step 3

Make sure your top-level directory structure looks like this:

|--/path/to/your/benchmarks
|  |--OpenEQA
|  |  |--openeqa_scannet.parquet
|  |  |--openeqa_hm3d.parquet
|  |--RoboVQA
|     |--robovqa.parquet
|--/path/to/your/images
   |--openeqa_val
   |  |--scannet-v0
   |  |  |--002-scannet-scene0709_00
   |  |  |--xxx-scannet-scenexxxx_xx
   |  |--hm3d-v0
   |     |--000-hm3d-BFRyYbPCCPE
   |     |--xxx-hm3d-xxxxxxxxxxx
   |--robovqa_val
      |--robovqa_221911
      |--robovqa_xxxxxx
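
Before running the evaluation, it can help to confirm that the layout above is in place. The following is a minimal sketch, assuming the two roots are passed in as `bmk_root` and `image_folder` (the names used in the evaluation script note); the helper `missing_paths` and the two constant lists are hypothetical and not part of the repository. Only the fixed file and folder names shown in the tree are checked, not the individual scene or episode folders.

```python
from pathlib import Path

# Fixed paths taken from the directory tree above, relative to each root.
EXPECTED_BENCHMARK_FILES = [
    "OpenEQA/openeqa_scannet.parquet",
    "OpenEQA/openeqa_hm3d.parquet",
    "RoboVQA/robovqa.parquet",
]
EXPECTED_IMAGE_DIRS = [
    "openeqa_val/scannet-v0",
    "openeqa_val/hm3d-v0",
    "robovqa_val",
]


def missing_paths(bmk_root, image_folder):
    """Return the expected files and folders that do not exist on disk."""
    bmk_root, image_folder = Path(bmk_root), Path(image_folder)
    missing = [bmk_root / rel for rel in EXPECTED_BENCHMARK_FILES
               if not (bmk_root / rel).is_file()]
    missing += [image_folder / rel for rel in EXPECTED_IMAGE_DIRS
                if not (image_folder / rel).is_dir()]
    return missing


# Example usage (replace the placeholder roots with your own):
# for path in missing_paths("/path/to/your/benchmarks", "/path/to/your/images"):
#     print(f"missing: {path}")
```

An empty result means every fixed path from the tree was found.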

Step 4

Run the evaluation script:

# Note: replace 'YOUR_API_KEY', 'YOUR_ENDPOINT', 'bmk_root', 'image_folder' with your own.
bash scripts/eval/eval_robo.sh /path/to/your/model

D. Evaluation for General Ability

Install the evaluation tool and execute the evaluation script:

pip install lmms-eval==0.2.0
bash eval.sh
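
Because the install command pins `lmms-eval` to 0.2.0, a quick check of the installed version can rule out one source of mismatched results before running `eval.sh`. This is a minimal sketch using only the Python standard library; the helper names `installed_version` and `lmms_eval_status` are hypothetical, and it assumes the package is registered under the distribution name `lmms-eval`, as in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "0.2.0"  # version pinned in the pip install command


def installed_version(package="lmms-eval"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def lmms_eval_status(installed, required=REQUIRED):
    """Classify the installed version as 'missing', 'ok' or 'mismatch'."""
    if installed is None:
        return "missing"
    return "ok" if installed == required else "mismatch"


print(lmms_eval_status(installed_version()))
```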

We would like to express our gratitude to Huajie Tan, Yumeng Wang, and Yin Xie for their significant contributions to the experimental validation in MLLMs.

Model size: 7.94B params (Safetensors)
Tensor type: BF16