Rohit Khatri (Rohitkhatri75436)
0 followers · 16 following
rohitkh24015708 · rohitkhatri75436
AI & ML interests: None yet
Recent Activity

liked a Space 3 days ago: enzostvs/deepsite

reacted to morgan's post with 🔥 8 months ago:
Llama 3.1 405B Instruct beats GPT-4o on MixEval-Hard

Just ran MixEval for 405B, Sonnet-3.5, and 4o, with 405B landing right between the other two at 66.19. The GPT-4o result of 64.7 replicated locally, but Sonnet-3.5 actually scored 70.25/69.45 in my replications 🤔 Still well ahead of the other two, though.

Sample of one of the eval calls here: https://wandb.ai/morgan/MixEval/weave/calls/07b05ae2-2ef5-4525-98a6-c59963b76fe1

Quick auto-logging tracing for openai-compatible clients and many more here: https://wandb.github.io/weave/quickstart/
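The "landing right between the other two" claim can be restated as a simple ranking of the reported scores; a trivial sketch (model labels and the Sonnet-3.5 figure are taken from the post, using the first of the two replicated Sonnet scores):

```python
# MixEval-Hard scores as reported in the post above.
scores = {
    "Sonnet-3.5": 70.25,  # first replication; the second was 69.45
    "Llama-3.1-405B-Instruct": 66.19,
    "GPT-4o": 64.7,
}

# Rank models from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['Sonnet-3.5', 'Llama-3.1-405B-Instruct', 'GPT-4o']
```

405B indeed sits in the middle of the ranking, ahead of GPT-4o but behind Sonnet-3.5.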
reacted to fdaudens's post with 🔥 11 months ago:
Do you want to improve AI in your language? Here's how you can help.

I'm exploring different AI techniques for an upcoming project in journalism, and I wanted to test a cool idea by @davanstrien, Data is Better Together, which aims to foster a community of people creating DPO datasets in different languages.

This project gives the opportunity to explore various concepts:
- Direct Preference Optimization (DPO)
- Synthetic data
- Data annotation
- LLM as a judge

1️⃣ Take the Aya dataset of human-annotated prompt-completion pairs across 71 languages and filter it to include only those in the language you're interested in.

2️⃣ Use distilabel from Argilla to generate a second response for each prompt and evaluate which response is best. Basically, DPO datasets contain a chosen and a rejected response to each question, which helps align models on specific tasks. To quote Daniel: "Currently, there are only a few DPO datasets available for a limited number of languages. By generating more DPO datasets for different languages, we can help to improve the quality of generative models in a wider range of languages."

3️⃣ Send this dataset and its evaluations to the easy-to-use interface to evaluate the evaluations. This is where you can help. :) You can rate the LLM's evaluation of the prompt-response pairs.

For my example, I built a dataset in French. And without wanting to start a debate about homeopathy, the second result is clearly better in the example below! https://huggingface.co/spaces/fdaudens/demo-aya-dpo-french

The final dataset can be found here: https://huggingface.co/datasets/fdaudens/aya_french_dpo

To contribute to other languages and learn more about synthetic data, you can also produce datasets in the language of your choice! Read more about the project: https://github.com/huggingface/data-is-better-together/blob/main/dpo/README.md
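Step 2️⃣ hinges on the chosen/rejected structure of a DPO record. A minimal sketch of that shape, assuming the common Hugging Face prompt/chosen/rejected field convention (the exact schema of the project's datasets may differ, and the field values here are hypothetical):

```python
# One DPO record: a prompt plus a preferred and a rejected response.
# Field names follow the common HF convention; values are made up.
dpo_record = {
    "prompt": "Qu'est-ce que l'homéopathie ?",
    "chosen": "Response the LLM judge rated as better.",
    "rejected": "Response the LLM judge rated as worse.",
}

def is_valid_dpo_record(record: dict) -> bool:
    """A record is usable for DPO only if all three fields are present,
    non-empty, and the two responses actually differ."""
    required = ("prompt", "chosen", "rejected")
    return (
        all(record.get(key) for key in required)
        and record["chosen"] != record["rejected"]
    )

print(is_valid_dpo_record(dpo_record))  # True
```

The human rating step in 3️⃣ is what decides which generated response ends up in `chosen` and which in `rejected`.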
Organizations: None yet
Rohitkhatri75436's activity
liked a Space 3 days ago:
DeepSite 🐳 (Running, 1.19k likes): Generate any application with DeepSeek