
Rohit Khatri

Rohitkhatri75436

AI & ML interests

None yet

Recent Activity

liked a Space about 23 hours ago
enzostvs/deepsite
reacted to fdaudens's post with 🔥 10 months ago
Do you want to improve AI in your language? Here's how you can help.

I'm exploring different AI techniques for an upcoming project in journalism, and I wanted to test a cool idea by @davanstrien, Data is better together, which aims to foster a community of people to create DPO datasets in different languages.

This project gives the opportunity to explore various concepts:
- Direct Preference Optimization (DPO)
- Synthetic data
- Data annotation
- LLM as a judge

1️⃣ Take the Aya dataset of human-annotated prompt-completion pairs across 71 languages and filter it to include only those in the language you're interested in.

2️⃣ Use distilabel from Argilla to generate a second response for each prompt and evaluate which response is best. Basically, DPO datasets contain a chosen and a rejected response to each question, which helps align models on specific tasks. To quote Daniel: "Currently, there are only a few DPO datasets available for a limited number of languages. By generating more DPO datasets for different languages, we can help to improve the quality of generative models in a wider range of languages."

3️⃣ Send this dataset and the evaluations to the easy-to-use interface to review the evaluations. This is where you can help. :) You can rate the LLM's evaluation of the prompt-response pairs.

For my example, I built a dataset in French. And without wanting to start a debate about homeopathy, the second result is clearly better in the example below! https://huggingface.co/spaces/fdaudens/demo-aya-dpo-french

The final dataset can be found here: https://huggingface.co/datasets/fdaudens/aya_french_dpo

To contribute to other languages and learn more about synthetic data, you can also produce datasets in the language of your choice!

Read more about the project: https://github.com/huggingface/data-is-better-together/blob/main/dpo/README.md
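For readers who want to try steps 1️⃣ and 2️⃣ above, here is a minimal Python sketch. It assumes the Hugging Face datasets library, the "CohereForAI/aya_dataset" dataset id with inputs/targets/language columns (which may differ from the exact setup used in the post), and a hypothetical generate_second_response helper standing in for the distilabel text-generation and LLM-as-judge pipeline the post describes.

```python
# Sketch of the DPO-dataset workflow, under the assumptions stated above.
from datasets import load_dataset


def generate_second_response(prompt: str) -> str:
    # Hypothetical stand-in: in the actual project this would be a distilabel
    # text-generation step backed by an LLM.
    return "<model response for: " + prompt[:40] + ">"


# 1) Load the Aya prompt-completion pairs and keep only one language (French here).
aya = load_dataset("CohereForAI/aya_dataset", split="train")
aya_fr = aya.filter(lambda row: row["language"] == "French")


# 2) Generate a second response per prompt; an LLM judge would then decide
#    which of the two is "chosen" and which is "rejected" to form DPO pairs.
def add_candidate(row):
    row["response_a"] = row["targets"]                            # human-written completion
    row["response_b"] = generate_second_response(row["inputs"])   # synthetic completion
    return row


dpo_candidates = aya_fr.map(add_candidate)
```

The rating interface mentioned in step 3️⃣ is where human contributors then review the judge's decisions, so no code is needed for that part beyond opening the linked Space.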

Organizations

None yet

models

None public yet

datasets

None public yet