Spaces:

nlphuji
/

WHOOPS-Leaderboard

Runtime error

WHOOPS-Leaderboard / WHOOPS_Explanation_of_Violation_Leaderboard.tsv

nitzanguetta

results updated

946cdc1 over 1 year ago

577 Bytes

	Model Human Metric Auto Metric Identify (Binary Accuracy)
	Humans 95 92
	Ground-truth Caption → Llama-2-7b (Oracle) 71
	Ground-truth Caption → Llama-2-13b (Oracle) 70
	Ground-truth Caption → GPT4 (Oracle) 69
	Ground-truth Caption → GPT3 (Oracle) 68 62 74
	BLIP2 FlanT5-XXL (Fine-tuned) 27 24 73
	BLIP2 FlanT5-XL (Fine-tuned) 15 18 60
	Predicted Caption → Llama-2-7b 36
	Predicted Caption → Llama-2-13b 36
	Predicted Caption → GPT4 36
	Predicted Caption → GPT3 33 42 59
	InstructBLIP 31
	LLaVA 31
	mPLUG-Owl 24
	BLIP2 FlanT5-XXL (Zero-shot) 0 0 50