davanstrien posted an update Mar 13

Can we improve the quality of open LLMs for more languages?

Step 1: Evaluate current SOTA.

The Data Is Better Together community has rated more than 10K prompts for quality. We now want to translate a subset of these to help address the language gap in model evals.

The plan is roughly this:

- We started with DIBT/10k_prompts_ranked and took a subset of 500 high-quality prompts (a rough sketch of this step is shown after this list)
- We're asking the community to translate these prompts into different languages
- We'll evaluate the extent to which we can use AlpacaEval and similar approaches to rate the outputs of models across these different languages
- If this works well, we can more easily evaluate open LLMs across different languages by using a judge LLM to rate the quality of outputs from different models (see the judge sketch at the end of this post)
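
For reference, here's a minimal sketch of the subset step using the `datasets` library. It assumes the dataset exposes a `prompt` column and a numeric `avg_rating` column; check the dataset card for the exact schema, and the GitHub repo linked below for full details.

```python
# Minimal sketch of the subset step (assumes `prompt` and `avg_rating`
# columns; check the dataset card for the exact schema).
from datasets import load_dataset

ds = load_dataset("DIBT/10k_prompts_ranked", split="train")

# Sort by average community rating and keep the 500 highest-rated prompts.
top_500 = ds.sort("avg_rating", reverse=True).select(range(500))

print(top_500[0]["prompt"])
```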

You can find more details in our new GitHub repo: https://github.com/huggingface/data-is-better-together (don't forget to give it a ⭐!)
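
To make the judging step concrete, here's a rough, hypothetical sketch of an LLM-as-judge pairwise comparison. This is not AlpacaEval's actual API; `call_judge` is a placeholder for whichever judge model you plug in.

```python
# Hypothetical LLM-as-judge sketch: ask a judge model which of two model
# outputs better answers a (translated) prompt. `call_judge` is a placeholder
# for an actual API or local model call.

JUDGE_TEMPLATE = """You are judging two responses to the same prompt.

Prompt:
{prompt}

Response A:
{output_a}

Response B:
{output_b}

Answer with exactly one letter, A or B, for the better response."""


def call_judge(judge_prompt: str) -> str:
    """Placeholder: replace with a call to your judge LLM of choice."""
    raise NotImplementedError


def compare(prompt: str, output_a: str, output_b: str) -> str:
    """Return 'A' or 'B' according to the judge model's verdict."""
    verdict = call_judge(
        JUDGE_TEMPLATE.format(prompt=prompt, output_a=output_a, output_b=output_b)
    )
    return "A" if verdict.strip().upper().startswith("A") else "B"
```

AlpacaEval packages this kind of pairwise judging with a fixed baseline model and standard annotator configs; the open question in the plan above is how well that transfers to translated, non-English prompts.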