New Leader!

#2
by DKRacingFan - opened

Congratulations on reaching a 78.55 average on the Hugging Face leaderboard! Now the big question: will we reach an 80% average score before February?

...I'm not so sure...
I ran my own private tests of this LLM's understanding, reasoning, and common sense, and it feels like talking to a fine-tuned, very old LLaMA 65B. Poor results.
For instance, Mistral Instruct 0.2 seems much more advanced in understanding, reasoning, and common sense, and that's without even mentioning Mixtral 8x7B, which is on a totally different level... leaps ahead.

I suspect this model is contaminated, and that is why it ranks so high on the leaderboard.

Moreh, Inc. org
β€’
edited Jan 23

Hi, we haven't trained our model on any datasets other than the three mentioned in our model card:

  1. Open-Orca/SlimOrca
  2. jondurbin/truthy-dpo-v0.1
  3. Intel/orca_dpo_pairs

and, to the best of our knowledge, these three datasets are not contaminated.
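For anyone who wants to inspect those datasets directly, here is a minimal sketch of loading them with the Hugging Face `datasets` library (the `train` split names are assumptions, not something stated in this thread):

```python
# Minimal sketch: pull the three training datasets named above for inspection.
# Assumes each repo exposes a "train" split; adjust if the hub page differs.
from datasets import load_dataset

slim_orca = load_dataset("Open-Orca/SlimOrca", split="train")
truthy_dpo = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")
orca_dpo = load_dataset("Intel/orca_dpo_pairs", split="train")

print(len(slim_orca), len(truthy_dpo), len(orca_dpo))
```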

In addition, we have tested for contamination (see https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/472):
gsm8k: result < 0.1, %: 0.47
truthfulqa: result < 0.1, %: 0.44

Contamination test results for the other tasks will be posted soon.
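For context, the check in the linked discussion is, as far as we understand, a Min-K% Prob style membership test: score each benchmark example by the average log-probability of its least likely tokens, where unusually high scores suggest the example was seen during training. Below is a minimal sketch of that idea; the `gpt2` placeholder and `k=0.2` are illustrative assumptions, not the exact setup behind the numbers above.

```python
# Minimal sketch of a Min-K% Prob membership test for one text sample.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_percent_prob(text, model, tokenizer, k=0.2):
    """Average log-prob of the k% least likely tokens in `text`.

    Scores closer to 0 (higher) suggest the model may have seen the
    text during training; very negative scores suggest it has not.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability the model assigned to each actual next token.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_log_probs = log_probs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the k% lowest-probability tokens and average them.
    n = max(1, int(len(token_log_probs) * k))
    lowest = torch.topk(token_log_probs, n, largest=False).values
    return lowest.mean().item()

model_name = "gpt2"  # placeholder; substitute the model under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
score = min_k_percent_prob("Natalia sold clips to 48 of her friends...", model, tokenizer)
print(f"Min-K% log-prob: {score:.3f}")
```

A full contamination test would aggregate this per-example score over an entire benchmark and compare it against a reference, which is roughly what the aggregate numbers above summarize.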


The data contamination check result in the model card is still TBU, which is different from the results mentioned above.

Moreh, Inc. org

We will update the README too! Thanks @TomGrc

