# MJ-Bench Team: Align

## [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)

We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!

<p align="center">
  <img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
</p>

---

## [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)

- **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
- **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)

Text-to-image models like DALLE-3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it's crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.

However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions (a minimal scoring sketch follows the list):

1. **Alignment**
2. **Safety**
3. **Image Quality**
4. **Bias**
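
For intuition, here is a minimal sketch of how a scoring-based judge can be graded on such preference data: each instance pairs a prompt with a human-preferred and a rejected image, and the judge is credited when it scores the preferred image higher. This is an illustration only, not the official evaluation script, and the field names are hypothetical.

```python
# Illustrative sketch only: the field names ("prompt", "image_chosen", "image_rejected")
# are hypothetical; see the MJ-Bench code repository for the actual evaluation script.
from typing import Any, Callable, Iterable


def preference_accuracy(
    judge_score: Callable[[str, Any], float],
    pairs: Iterable[dict],
) -> float:
    """Fraction of preference pairs where the judge ranks the preferred image higher."""
    correct, total = 0, 0
    for example in pairs:
        chosen = judge_score(example["prompt"], example["image_chosen"])
        rejected = judge_score(example["prompt"], example["image_rejected"])
        correct += int(chosen > rejected)
        total += 1
    return correct / total if total else 0.0
```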

We evaluate a wide range of multimodal judges, including:

- 6 smaller-sized CLIP-based scoring models (see the sketch after this list)
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)
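
As a heavily simplified example of the first category, a CLIP-style scoring judge can be approximated with an off-the-shelf CLIP model from `transformers`. This is a generic sketch of the judge interface, not one of the six scoring models actually benchmarked.

```python
# Generic CLIP-style judge: scores text-image alignment via cosine similarity.
# Illustration only; not an MJ-Bench reward model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)


@torch.no_grad()
def clip_score(prompt: str, image: Image.Image) -> float:
    inputs = processor(
        text=[prompt], images=image, return_tensors="pt", padding=True, truncation=True
    )
    outputs = model(**inputs)
    # Normalize the projected embeddings and take their cosine similarity.
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return float((image_emb * text_emb).sum())
```

Any scoring function with this signature can be plugged into the `preference_accuracy` sketch above.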

<p align="center">
  <img src="https://github.com/MJ-Bench/MJ-Bench.github.io/blob/main/static/images/dataset_overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
</p>

**We are actively updating the [leaderboard](https://mj-bench.github.io/)!**
You are welcome to submit your multimodal judge's evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
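
If you want to produce numbers for your own judge before submitting, the preference data can be pulled straight from the Hugging Face Hub. A minimal sketch follows; the config and split names are assumptions, so check the dataset card for the actual subsets and column names.

```python
# Sketch: list the MJ-Bench subsets and load one for evaluation.
# The config name "alignment" and split "train" are assumptions; consult
# https://huggingface.co/datasets/MJ-Bench/MJ-Bench for the real schema.
from datasets import get_dataset_config_names, load_dataset

print(get_dataset_config_names("MJ-Bench/MJ-Bench"))  # available subsets
dataset = load_dataset("MJ-Bench/MJ-Bench", "alignment", split="train")
print(dataset.column_names)  # confirm the field names before mapping them to your judge
```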