Reproducibility issue
#2, opened by mlabonne
Hi @zyh3826, I'm playing with mergekit and wanted to reproduce your results with this model. Unfortunately, I only got an average score of 48.54 (vs. your 73.3) on the Open LLM Leaderboard.
Did you take any extra steps, or is there something I might have missed? Thank you.
I'm trying the same layer combination with another model and getting complete gibberish. It's amazing this works at all.
It's weird, because their model does perform very well on the Nous benchmark suite: https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard