
Raghav Prabhakar

raghavprabhakar

AI & ML interests

Computer Vision, Deep Learning, Robotics

Recent Activity

liked a dataset 2 months ago
raghavprabhakar/commonsense-embodied-ai

Organizations

ONNXConfig for all, Kornia AI, Social Post Explorers, Hugging Face Discord Community

raghavprabhakar's activity

reacted to thomwolf's post with 🔥🚀 24 days ago
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions in our discussion space: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi
reacted to merve's post with ❤️ 9 months ago
I see you all send your documents to closed-source APIs; this is not ok 👎 it breaks my heart 💔
I have seen many open-source document models, and I am amazed by what IDEFICS2 has done with document understanding 🤯🤩 it's not something you've ever seen before! HuggingFaceM4/idefics-8b

Please use it! It has an Apache 2.0 license ❤️
reacted to akhaliq's post with ❤️ 11 months ago
Aya Dataset

An Open-Access Collection for Multilingual Instruction Tuning

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning (2402.06619)

Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets. However, existing datasets are almost all in the English language. In this work, our primary goal is to bridge the language gap by building a human-curated instruction-following dataset spanning 65 languages. We worked with fluent speakers of languages from around the world to collect natural instances of instructions and completions. Furthermore, we create the most extensive multilingual collection to date, comprising 513 million instances through templating and translating existing datasets across 114 languages. In total, we contribute four key resources: we develop and open-source the Aya Annotation Platform, the Aya Dataset, the Aya Collection, and the Aya Evaluation Suite. The Aya initiative also serves as a valuable case study in participatory research, involving collaborators from 119 countries. We see this as a valuable framework for future research collaborations that aim to bridge gaps in resources.
reacted to Tonic's post with ❤️ 11 months ago
🙋🏻‍♂️ Hey there folks,

🤗 Aya has been released! It's an absolutely massive undertaking to create a huge multilingual dataset and multilingual model of very high quality.

Papers:
https://cohere.com/research/papers/aya-dataset-paper-2024-02-13
https://cohere.com/research/papers/aya-model-paper-2024-02-13

Model: CohereForAI/aya-101
Dataset: CohereForAI/aya_dataset


I am proud to be one of 3,000 humans who built Aya - a new massively multilingual, generative LLM that outperforms existing open-source models and covers 101 different languages. Together, we are accelerating multilingual AI. 🤗
liked a Space over 1 year ago
New activity in openai/whisper about 2 years ago
liked a Space over 2 years ago
liked a Space almost 3 years ago