What is Average H4 Score?

#2
by adnan-ahmad-tub - opened

What does the term H4 stand for?

I believe it's the average of four Hugging Face automatic evaluation metrics, namely:

  • ARC (25-s)
  • HellaSwag (10-s)
  • MMLU (5-s)
  • TruthfulQA (MC) (0-s)

But I'm not sure. Can someone please confirm?

Thanks!

Hugging Face Optimum org
edited Jul 6, 2023

Yes, that's exactly what it is. The llm-perf leaderboard currently reports the average score from the Open LLM Leaderboard for every hardware+backend configuration of a given model. This assumes that no configuration degrades quality, which is not always true (e.g. a model whose weights are originally in float32 might score lower, or even behave unexpectedly, when loaded in float16).
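To make the arithmetic concrete, here is a minimal sketch of the average over the four benchmarks listed in the question. It assumes an unweighted mean. The function name `average_h4_score` and the scores are hypothetical placeholders, not leaderboard code or real results.

```python
from statistics import mean

# Benchmark names and few-shot settings as listed in the question above.
H4_BENCHMARKS = (
    "ARC (25-s)",
    "HellaSwag (10-s)",
    "MMLU (5-s)",
    "TruthfulQA (MC) (0-s)",
)


def average_h4_score(scores):
    """Return the unweighted mean of the four benchmark scores.

    Assumption: each benchmark counts equally toward the average.
    """
    missing = [name for name in H4_BENCHMARKS if name not in scores]
    if missing:
        raise KeyError(f"missing benchmark scores: {missing}")
    return mean(scores[name] for name in H4_BENCHMARKS)


# Placeholder numbers for illustration only, not real results for any model.
example_scores = {
    "ARC (25-s)": 50.0,
    "HellaSwag (10-s)": 70.0,
    "MMLU (5-s)": 40.0,
    "TruthfulQA (MC) (0-s)": 40.0,
}

print(average_h4_score(example_scores))  # 50.0
```

Since the same average is reported for every hardware+backend configuration of a model, a score computed this way would not reflect any degradation from, say, loading the weights in float16.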

IlyasMoutawwakil changed discussion status to closed
