2.75 bpw high EQ bench

#1
by koesn - opened

How does the 2.75 bpw quant get such a high EQ Bench score, even though its ppl is still lower?

I'd probably put that down to a level of error in the benchmark itself. EQ Bench works by asking the LLM to score certain emotions from 1-10 in response to a brief conversation. I don't know if LLMs are good at "score this between 1 and 10" tasks in testing. I did find it useful when running a lot of models and seeing patterns across prompt types: Midnight Miqu being good at a lot of different prompts, Cohere Command models working better with Command-R prompts. But I'd probably trust perplexity over EQ Bench.
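
To make the comparison a bit more concrete, here's a rough sketch of the two kinds of numbers. This is not the actual EQ Bench scoring code: `rating_error`, the emotion names, and all the ratings are made up for illustration. The perplexity part is just the standard definition (exp of the mean negative log-likelihood per token).

```python
import math


def rating_error(predicted, reference):
    """Mean absolute difference between predicted and reference ratings.

    Hypothetical stand-in for a rating-based score, NOT EQ Bench's real formula.
    """
    shared = predicted.keys() & reference.keys()
    if not shared:
        raise ValueError("no emotions in common")
    return sum(abs(predicted[e] - reference[e]) for e in shared) / len(shared)


def perplexity(token_logprobs):
    """Standard perplexity: exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Made-up ratings for one conversation (illustrative only).
reference = {"anger": 2, "relief": 8, "guilt": 5, "pride": 1}
model_a = {"anger": 3, "relief": 7, "guilt": 5, "pride": 1}

# Two ratings off by a single point already move the per-item error to 0.5.
error_a = rating_error(model_a, reference)

# A model that gives every token probability 0.5 has perplexity of about 2.
ppl = perplexity([math.log(0.5)] * 4)
```

With only a handful of integer ratings per question, a one-point wobble on a couple of them shifts the score a fair amount, whereas perplexity is averaged over every token in the test text, so it tends to be the steadier number.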
