
Rohit Khatri

Rohitkhatri75436

AI & ML interests

None yet

Recent Activity

liked a Space about 23 hours ago
enzostvs/deepsite
reacted to fdaudens's post with 🔥 10 months ago
Do you want to improve AI in your language? Here's how you can help.

I'm exploring different AI techniques for an upcoming project in journalism, and I wanted to test a cool idea by @davanstrien, Data is better together, which aims to foster a community of people to create DPO datasets in different languages.

This project gives the opportunity to explore various concepts:
- Direct Preference Optimization (DPO)
- Synthetic data
- Data annotation
- LLM as a judge

1️⃣ Take the Aya dataset of human-annotated prompt-completion pairs across 71 languages and filter it to include only those in the language you're interested in.

2️⃣ Use distilabel from Argilla to generate a second response for each prompt and evaluate which response is best. Basically, DPO datasets contain a chosen and a rejected response to each question, which helps align models on specific tasks. To quote Daniel: "Currently, there are only a few DPO datasets available for a limited number of languages. By generating more DPO datasets for different languages, we can help to improve the quality of generative models in a wider range of languages."

3️⃣ Send this dataset and the evaluations to the easy-to-use interface to review the evaluations. This is where you can help. :) You can rate the LLM's evaluation of the prompt-response pairs.

For my example, I built a dataset in French. And without wanting to start a debate about homeopathy, the second result is clearly better in the example below! https://huggingface.co/spaces/fdaudens/demo-aya-dpo-french

The final dataset can be found here: https://huggingface.co/datasets/fdaudens/aya_french_dpo

To contribute to other languages and learn more about synthetic data, you can also produce datasets in the language of your choice!

Read more about the project: https://github.com/huggingface/data-is-better-together/blob/main/dpo/README.md
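For readers who want to try steps 1️⃣ and 2️⃣ above, here is a minimal Python sketch. It assumes the Hugging Face datasets library, the "CohereForAI/aya_dataset" dataset id with inputs/targets/language columns (which may differ from the exact setup used in the post), and a hypothetical generate_second_response helper standing in for the distilabel text-generation and LLM-as-judge pipeline the post describes.

```python
# Sketch of the DPO-dataset workflow, under the assumptions stated above.
from datasets import load_dataset


def generate_second_response(prompt: str) -> str:
    # Hypothetical stand-in: in the actual project this would be a distilabel
    # text-generation step backed by an LLM.
    return "<model response for: " + prompt[:40] + ">"


# 1) Load the Aya prompt-completion pairs and keep only one language (French here).
aya = load_dataset("CohereForAI/aya_dataset", split="train")
aya_fr = aya.filter(lambda row: row["language"] == "French")


# 2) Generate a second response per prompt; an LLM judge would then decide
#    which of the two is "chosen" and which is "rejected" to form DPO pairs.
def add_candidate(row):
    row["response_a"] = row["targets"]                            # human-written completion
    row["response_b"] = generate_second_response(row["inputs"])   # synthetic completion
    return row


dpo_candidates = aya_fr.map(add_candidate)
```

The rating interface mentioned in step 3️⃣ is where human contributors then review the judge's decisions, so no code is needed for that part beyond opening the linked Space.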

Organizations

None yet

models

None public yet

datasets

None public yet