WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published Jun 16 • 13
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? Paper • 2406.07546 • Published Jun 11 • 8