Organization Card

MJ-Bench Team

MJ-Bench-Team is co-founded by Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models with multimodal judges to enhance reliability, safety, and performance.

Recent News

🔥 We have released MJ-Video. All datasets and model checkpoints are available here!
🎉 MJ-PreferGen has been accepted to ICLR 2025! Check out the paper: MJ-PreferGen: An Automatic Framework for Preference Data Synthesis.

😎 MJ-Video: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

Project page: https://aiming-lab.github.io/MJ-VIDEO.github.io/
Code repository: https://github.com/aiming-lab/MJ-Video

We release MJ-Bench-Video, a comprehensive fine-grained video preference benchmark, and MJ-Video, a powerful MoE-based multi-dimensional video reward model!

👩‍⚖️ MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Project page: https://mj-bench.github.io/
Code repository: https://github.com/MJ-Bench/MJ-Bench

Text-to-image models such as DALL-E 3 and Stable Diffusion are proliferating rapidly, but they often suffer from hallucination, bias, and unsafe or low-quality output. To address these issues effectively, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge.

However, current multimodal judges are often under-evaluated, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce MJ-Bench, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

Alignment
Safety
Image Quality
Bias
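
As a rough illustration of what evaluating a judge on a preference dataset can look like, the sketch below computes, per dimension, how often a judge scores the preferred image above the rejected one. This is a minimal sketch under stated assumptions, not the MJ-Bench evaluation code: the `PreferencePair` record, its field names, the `judge_accuracy` helper, and the toy judge are all hypothetical and are not taken from the benchmark or its repository.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class PreferencePair(NamedTuple):
    """Hypothetical record: a prompt with a preferred and a rejected image."""
    dimension: str   # e.g. "alignment", "safety", "quality", "bias"
    prompt: str
    chosen: str      # identifier of the preferred image
    rejected: str    # identifier of the rejected image


def judge_accuracy(
    pairs: Iterable[PreferencePair],
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Per dimension, return the fraction of pairs where the judge scores
    the chosen image strictly higher than the rejected one."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair in pairs:
        total[pair.dimension] += 1
        if score(pair.prompt, pair.chosen) > score(pair.prompt, pair.rejected):
            correct[pair.dimension] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


# Toy data and a toy judge, purely for illustration (not real benchmark data).
toy_pairs = [
    PreferencePair("alignment", "a red cube", "img_a", "img_b"),
    PreferencePair("alignment", "a blue ball", "img_c", "img_d"),
    PreferencePair("safety", "a quiet street", "img_e", "img_f"),
]
toy_scores = {
    "img_a": 0.9, "img_b": 0.2,
    "img_c": 0.1, "img_d": 0.6,
    "img_e": 0.8, "img_f": 0.3,
}


def toy_judge(prompt: str, image: str) -> float:
    """Stand-in for a real multimodal judge; looks up a fixed score."""
    return toy_scores[image]


results = judge_accuracy(toy_pairs, toy_judge)
print(results)
```

A real judge would replace `toy_judge` with a model call that takes the prompt and the image itself. Ties are counted as errors here, which is one possible design choice among several.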

We evaluate a wide range of multimodal judges, including:

6 smaller-sized CLIP-based scoring models
11 open-source VLMs (e.g., the LLaVA family)
4 closed-source VLMs (e.g., GPT-4, Claude 3)

🔥 We are actively updating the leaderboard!
You are welcome to submit your multimodal judge’s evaluation results on our dataset to the Hugging Face leaderboard.

Spaces

MJ Bench Leaderboard: Display and filter multimodal model leaderboard results

Datasets

MJ-Bench/MJ-BENCH-VIDEO (updated Feb 14)

MJ-Bench/MJ-Bench (updated Jul 10, 2024)

MJ-Bench/MJ-Bench-Results (updated Jul 9, 2024)

Collections

yichaodu/DiffusionDPO-bias-hps-2.1

yichaodu/DiffusionDPO-bias-gemini-1.5

yichaodu/DiffusionDPO-bias-claude3-opus

yichaodu/DiffusionDPO-alignment-hps-2.1

MJ-Bench/DDPO-alignment-gpt-4o

MJ-Bench/DDPO-alignment-gpt-4v

MJ-Bench/DDPO-alignment-claude3-opus

Models

MJ-Bench/MJ-VIDEO-2B

MJ-Bench/DDPO-alignment-gpt-4v

MJ-Bench/DDPO-alignment-gpt-4o

MJ-Bench/DDPO-alignment-claude3-opus

MJ-Bench/DiffusionDPO-alignment-claude3-opus

MJ-Bench/DiffusionDPO-alignment-gemini-1.5
