91.9 HellaSwag, 79.2 TruthfulQA... And It Sucks. Why do this?

#5
by Phil337 - opened

No info on the datasets or methods used, absurdly high scores, horrible performance, compulsive lies, extreme stubbornness, inability to solve simple logic puzzles... Why? Why are people downloading this, let alone upvoting this? Why is HF not automatically testing LLMs that get absurdly high scores for a given model and parameter count? I'm so tired of this nonsense.

The Open LLM Leaderboard is a joke for sure.

Hearsay. HF needs reason and evidence to change, more than testimony.

@LordTwave You're right, "HF needs reason and evidence", which is why despite a sea of cheaters almost nobody bothers reporting anything anymore.

That's why I left a comment. It's my way of saving potential downloaders time and effort. Anybody using this LLM for more than a second knows with >99.999% certainty that a HellaSwag score of 91.9 is utter nonsense. LLMs with far greater language skills don't even have that score. It's not hearsay. This LLM's scores are a load of crap.

This is a garbage model and was probably uploaded as a scam. Whoever made this garbage uploaded it to Hugging Face, then claimed it was top tier on the "Open LLM leaderboard" to trick investors.

Their stock price went up 30% after the announcement of this model. lol

Doesn't work on Ollama. Fails.

@Phil337 https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1
I tried to make this better for RoleNSFW, let's see whether I was able to solve a few of the problems here...
@bkieser Same here, it doesn't work with llama.cpp, some issue with the vocab.

Yes, we're running into this more and more. The Llama 3 8B PTH version also breaks llama.cpp with a "sentence" vocab.
