Davipar

Davipar

AI & ML interests

None yet

Recent Activity

liked a Space 26 days ago

PR-Puppets/PR-Puppet-Sora

liked a model 30 days ago

mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis

liked a model about 1 month ago

si-pbc/hertz-dev

View all activity

Organizations

Davipar's activity

liked a Space 26 days ago

Running

638

👁

mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis

Text Classification • Updated Jan 21 • 905k • • 355

liked a model about 1 month ago

si-pbc/hertz-dev

Audio-to-Audio • Updated Nov 14 • 208

liked a model 3 months ago

MAISAAI/gemma-2b-coder

Text Generation • Updated May 8 • 476 • 5

liked a Space 3 months ago

Running on T4

1.01k

🎙️

Open NotebookLM

Personalised Podcasts For All - Available in 13 Languages

liked a dataset 3 months ago

TIGER-Lab/MMLU-Pro

Viewer • Updated 25 days ago • 12.1k • 38.9k • 302

liked a model 4 months ago

mattshumer/Reflection-Llama-3.1-70B

Text Generation • Updated Sep 24 • 879 • 1.71k

liked a Space 4 months ago

Running on Zero

🚀

Eagle X5 13B Chat

reacted to mrm8488's post with ❤️ 6 months ago

Post

4647

🚨Exciting news for the Multilingual Synthetic Data Community!🚨

I’ve taken inspiration from the MAGPIE paper on Llama-3-8B-instruct and extended its capabilities. Here’s what’s new!

🗞 The MAGPIE paper showcased that if you use the instruction-tuned version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset, you can improve even the it-tuned version

🤔 While reading a script by Sebastian Raschka, PhD, I wondered: Could these advancements be replicated in other languages? Specifically, could they benefit non-English datasets?

🎉 And the answer is YES! At least for Spanish. I've successfully adapted the techniques for Spanish, proving the model's flexibility and multilingual capabilities.

👩‍💻 To make this accessible, I created a basic script (heavily inspired by the Sebastian Raschka one) that allows you to generate similar datasets using ollama models (initially phi and llama3) automatically and upload it to the Hugging Face Hub!
[Script](https://gist.github.com/mrm8488/4650a5e3cc45523798a527a3446eb312)

🔍 Explore the datasets 📚 generated using our new script!

- [Llama-3-8B](https://huggingface.co/datasets/mrm8488/dataset_llama3_5000_samples_es_4231_filtered)
- [Phi-3-medium](https://huggingface.co/datasets/mrm8488/dataset_phi3-medium_5000_samples_es_3906_filtered)
- [Phi-3-mini](https://huggingface.co/datasets/mrm8488/dataset_phi3_5000_samples_es_3282_filtered)

Note: These datasets have basic filtering. Apply additional quality filters before using them to fine-tune large language models.

Inspiration and base script:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/05_dataset-generation/llama3-ollama.ipynb
https://www.linkedin.com/feed/update/urn:li:activity:7210982019751661568/