Zhaorun committed on
Commit 51f3dd0 · verified · 1 Parent(s): 9e03df1

Update README.md

Files changed (1)
  1. README.md +22 -16
README.md CHANGED
@@ -9,32 +9,38 @@ pinned: false
 
# MJ-Bench Team: Align
 
-
## 😎 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
 
- We release MJ-Bench-Video, a comprehensive fine-grained video preference benchmark, and MJ-Video, a powerful MoE-based multi-dimensional video reward model!
+ We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!
 
- ![Dataset Overview](https://raw.githubusercontent.com/aiming-lab/MJ-Video/blob/main/asserts/overview.png)
+ <p align="center">
+   <img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
+ </p>
 
+ ---
 
## 👩‍⚖️ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)
 
- Project page: https://mj-bench.github.io/
- Code repository: https://github.com/MJ-Bench/MJ-Bench
-
- While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequently undergo inadequate evaluation of their capabilities and limitations, potentially leading to misalignment and unsafe fine-tuning outcomes.
+ - **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
+ - **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)
 
- To address this issue, we introduce MJ-Bench, a novel benchmark which incorporates a comprehensive preference dataset to evaluate multimodal judges in providing feedback for image generation models across four key perspectives: **alignment**, **safety**, **image quality**, and **bias**.
+ Text-to-image models like DALLE-3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it's crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.
 
- <!-- ![Dataset Overview](https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/dataset_overview.png) -->
+ However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:
 
- Specifically, we evaluate a large variety of multimodal judges including
+ 1. **Alignment**
+ 2. **Safety**
+ 3. **Image Quality**
+ 4. **Bias**
 
- - 6 smaller-sized CLIP-based scoring models
- - 11 open-source VLMs (e.g. LLaVA family)
- - 4 and close-source VLMs (e.g. GPT-4o, Claude 3)
- -
- <!-- ![Radar Plot](https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/radar_plot.png) -->
+ We evaluate a wide range of multimodal judges, including:
+ - 6 smaller-sized CLIP-based scoring models
+ - 11 open-source VLMs (e.g., the LLaVA family)
+ - 4 closed-source VLMs (e.g., GPT-4, Claude 3)
 
+ <p align="center">
+   <img src="https://github.com/MJ-Bench/MJ-Bench.github.io/blob/main/static/images/dataset_overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
+ </p>
 
- 🔥🔥We are actively updating the [leaderboard](https://mj-bench.github.io/) and you are welcome to submit the evaluation result of your multimodal judge on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to [huggingface leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
+ 🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**
+ You are welcome to submit your multimodal judge's evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
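For readers who want to produce such an evaluation result locally before submitting, the sketch below shows one possible way to score the MJ-Bench preference pairs with an off-the-shelf CLIP judge. It is a minimal, hedged example: it assumes the dataset loads as `MJ-Bench/MJ-Bench` with a config named `alignment`, a `test` split, and columns `caption`, `image0` (preferred), and `image1` (rejected); those names are illustrative assumptions, not confirmed by this commit, so check the dataset card for the actual schema.

```python
# Minimal sketch: evaluate a CLIP-style judge on MJ-Bench preference pairs.
# Assumptions (not confirmed by this commit): config "alignment", split "test",
# and columns "caption", "image0" (preferred), "image1" (rejected).
import torch
from datasets import load_dataset
from transformers import CLIPModel, CLIPProcessor

dataset = load_dataset("MJ-Bench/MJ-Bench", "alignment", split="test")

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

correct = 0
for example in dataset:
    inputs = processor(
        text=[example["caption"]],
        images=[example["image0"], example["image1"]],  # preferred vs. rejected
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        # logits_per_image: similarity of each image to the prompt, shape (2, 1)
        scores = model(**inputs).logits_per_image.squeeze(-1)
    # The judge "agrees" with the human preference if the chosen image scores higher.
    correct += int(scores[0] > scores[1])

print(f"Pairwise accuracy: {correct / len(dataset):.3f}")
```

A pairwise accuracy computed this way (per dimension) is the kind of number the leaderboard expects; stronger judges, such as the open- and closed-source VLMs listed above, would replace the CLIP scoring step with their own preference or rating prompt.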