91.9 HellaSwag, 79.2 TruthfulQA... And It Sucks. Why do this?

by Phil337 - opened Mar 17

Mar 17

No info on the datasets or methods used, absurdly high scores, horrible performance, compulsive lies, extreme stubbornness, inability to solve simple logic puzzles... Why? Why are people downloading this, let alone upvoting it? Why is HF not automatically testing LLMs that get absurdly high scores for a given model and parameter count? I'm so tired of this nonsense.

akhil3417

Mar 17

open llm leaderboard is a joke for sure

LordTwave

Mar 18

Hearsay. HF needs reason and evidence to change, more than testimony.

Phil337

Mar 18

@LordTwave You're right, "HF needs reason and evidence", which is why despite a sea of cheaters almost nobody bothers reporting anything anymore.

That's why I left a comment. It's my way of saving potential downloaders time and effort. Anybody using this LLM for more than a second knows with >99.999% certainty that a HellaSwag score of 91.9 is utter nonsense. LLMs with far greater language skills don't even have that score. It's not hearsay. This LLM's scores are a load of crap.

duypro247

Mar 20

This is a garbage model and was probably uploaded as a scam. Whoever made this garbage uploaded it to Hugging Face, then claimed it was top tier on the "Open LLM leaderboard" to trick investors.

maywell

Mar 21

This is a garbage model and was probably uploaded as a scam. Whoever made this garbage uploaded it to Hugging Face, then claimed it was top tier on the "Open LLM leaderboard" to trick investors.

Their stock price went up 30% after the announcement of this model. lol

bkieser

Mar 29

Doesn't work on Ollama. Fails.

fblgit

7 days ago

@Phil337 https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1
I tried to make this better for RoleNSFW; let's see whether I was able to solve a few of the problems here...
@bkieser Same here, it doesn't work with llama.cpp; there's some issue with the vocab.

bkieser

7 days ago

@Phil337 https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1
I tried to make this better for RoleNSFW; let's see whether I was able to solve a few of the problems here...
@bkieser Same here, it doesn't work with llama.cpp; there's some issue with the vocab.

Yes, we're running into this more and more. The Llama 3 8B PTH version also breaks llama.cpp with "sentence" vocab.
