Zhaorun committed on
Commit 51f3dd0 · verified · 1 Parent(s): 9e03df1

Update README.md

Files changed (1)
  1. README.md +22 -16
README.md CHANGED
@@ -9,32 +9,38 @@ pinned: false
 
# MJ-Bench Team: Align
 
-
## 😎 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
 
- We release MJ-Bench-Video, a comprehensive fine-grained video preference benchmark, and MJ-Video, a powerful MoE-based multi-dimensional video reward model!
+ We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!
 
- ![Dataset Overview](https://raw.githubusercontent.com/aiming-lab/MJ-Video/blob/main/asserts/overview.png)
+ <p align="center">
+   <img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
+ </p>
 
+ ---
 
## 👩‍⚖️ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)
 
- Project page: https://mj-bench.github.io/
- Code repository: https://github.com/MJ-Bench/MJ-Bench
-
- While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequently undergo inadequate evaluation of their capabilities and limitations, potentially leading to misalignment and unsafe fine-tuning outcomes.
+ - **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
+ - **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)
 
- To address this issue, we introduce MJ-Bench, a novel benchmark which incorporates a comprehensive preference dataset to evaluate multimodal judges in providing feedback for image generation models across four key perspectives: **alignment**, **safety**, **image quality**, and **bias**.
+ Text-to-image models like DALLE-3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it's crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.
 
- <!-- ![Dataset Overview](https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/dataset_overview.png) -->
+ However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:
 
- Specifically, we evaluate a large variety of multimodal judges including
+ 1. **Alignment**
+ 2. **Safety**
+ 3. **Image Quality**
+ 4. **Bias**
 
- - 6 smaller-sized CLIP-based scoring models
- - 11 open-source VLMs (e.g. LLaVA family)
- - 4 and close-source VLMs (e.g. GPT-4o, Claude 3)
- -
- <!-- ![Radar Plot](https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/radar_plot.png) -->
+ We evaluate a wide range of multimodal judges, including:
+ - 6 smaller-sized CLIP-based scoring models
+ - 11 open-source VLMs (e.g., the LLaVA family)
+ - 4 closed-source VLMs (e.g., GPT-4, Claude 3)
 
+ <p align="center">
+   <img src="https://github.com/MJ-Bench/MJ-Bench.github.io/blob/main/static/images/dataset_overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
+ </p>
 
- 🔥🔥We are actively updating the [leaderboard](https://mj-bench.github.io/) and you are welcome to submit the evaluation result of your multimodal judge on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to [huggingface leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
+ 🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**
+ You are welcome to submit your multimodal judge's evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
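For readers who want to produce such an evaluation result locally before submitting, the sketch below shows one possible way to score the MJ-Bench preference pairs with an off-the-shelf CLIP judge. It is a minimal, hedged example: it assumes the dataset loads as `MJ-Bench/MJ-Bench` with a config named `alignment`, a `test` split, and columns `caption`, `image0` (preferred), and `image1` (rejected); those names are illustrative assumptions, not confirmed by this commit, so check the dataset card for the actual schema.

```python
# Minimal sketch: evaluate a CLIP-style judge on MJ-Bench preference pairs.
# Assumptions (not confirmed by this commit): config "alignment", split "test",
# and columns "caption", "image0" (preferred), "image1" (rejected).
import torch
from datasets import load_dataset
from transformers import CLIPModel, CLIPProcessor

dataset = load_dataset("MJ-Bench/MJ-Bench", "alignment", split="test")

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

correct = 0
for example in dataset:
    inputs = processor(
        text=[example["caption"]],
        images=[example["image0"], example["image1"]],  # preferred vs. rejected
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        # logits_per_image: similarity of each image to the prompt, shape (2, 1)
        scores = model(**inputs).logits_per_image.squeeze(-1)
    # The judge "agrees" with the human preference if the chosen image scores higher.
    correct += int(scores[0] > scores[1])

print(f"Pairwise accuracy: {correct / len(dataset):.3f}")
```

A pairwise accuracy computed this way (per dimension) is the kind of number the leaderboard expects; stronger judges, such as the open- and closed-source VLMs listed above, would replace the CLIP scoring step with their own preference or rating prompt.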