Sentdex 
posted an update 12 days ago
Benchmarks!

I have lately been diving deep into the main benchmarks we all use to evaluate and compare models.

If you've never actually looked under the hood for how benchmarks work, check out the LM eval harness from EleutherAI: https://github.com/EleutherAI/lm-evaluation-harness

Also check out the benchmark datasets themselves. You can find the ones used for the LLM leaderboard on the About tab here: HuggingFaceH4/open_llm_leaderboard. Click through to a dataset and actually peek at the data that makes up these benchmarks.

It feels to me like benchmarks only represent a tiny portion of what we actually use and want LLMs for, and I doubt I'm alone in that sentiment.

Beyond this, the actual scoring of model responses is extremely strict and often relies on rudimentary NLP techniques when, at this point, we have LLMs themselves that are more than capable of evaluating and scoring responses.
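To make the strictness point concrete, here is a minimal sketch of how string-matching scorers can disagree on the same responses. The three metrics and the example pairs are made up for illustration; this is not the harness's actual implementation, just the general idea behind exact-match style scoring.

```python
import re
import string


def exact_match(prediction: str, reference: str) -> bool:
    # Strictest option: the strings must be identical, character for character.
    return prediction == reference


def normalize(text: str) -> str:
    # Lowercase, strip punctuation, drop English articles, collapse whitespace.
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def normalized_match(prediction: str, reference: str) -> bool:
    # Slightly more forgiving: compare after normalization.
    return normalize(prediction) == normalize(reference)


def contains_match(prediction: str, reference: str) -> bool:
    # Hypothetical lenient metric: the normalized reference must appear
    # as whole words somewhere inside the normalized prediction.
    return f" {normalize(reference)} " in f" {normalize(prediction)} "


def score(metric, pairs) -> float:
    # Fraction of (prediction, reference) pairs the metric accepts.
    return sum(metric(p, r) for p, r in pairs) / len(pairs)


# Made-up (prediction, reference) pairs; every prediction is arguably correct.
examples = [
    ("Paris", "Paris"),
    ("paris.", "Paris"),
    ("The answer is Paris.", "Paris"),
]

for metric in (exact_match, normalized_match, contains_match):
    print(f"{metric.__name__}: {score(metric, examples):.2f}")
```

All three responses name the right city, yet only the most lenient metric gives full credit. The lenient metric has its own problem, though: it would also accept "It is not Paris", which is part of why using an LLM as the judge is appealing.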

It feels like we've made great strides in the quality of LLMs themselves, but almost no progress in the quality of how we benchmark them.

If you have any ideas for how benchmarks could be a better assessment of an LLM, or know of good research papers that tackle this challenge, please share!

I find this paper interesting, and it's also fun to read. I like the part where they discuss the dimensions of evaluation and where current benchmarks are now.

Check it out: https://arxiv.org/abs/2212.09746

You may also find this article about the MMLU incident interesting: https://huggingface.co/blog/open-llm-leaderboard-mmlu

Recently, I volunteered to help a local non-profit organization strengthen its branding. I needed to understand what a brand narrative is and how to convey it effectively. I found a fantastic article that explained the power of storytelling in branding. It highlighted the essential elements of crafting a compelling brand story and how to connect emotionally with the audience. The article was incredibly detailed, offering step-by-step guidance on developing a narrative that resonates. I appreciated how it emphasized authenticity and emotional engagement, which were crucial for the non-profit’s mission.