Daniel Vila

dvilasuero

AI & ML interests

RLHF, RLAIF, DPO, data, data, data

dvilasuero's activity

reacted to albertvillanova's post with πŸ‘ 17 days ago
🚨 We’ve just released a new tool to compare the performance of models in the πŸ€— Open LLM Leaderboard: the Comparator πŸŽ‰
open-llm-leaderboard/comparator

Want to see how two different versions of LLaMA stack up? Let’s walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2. πŸ¦™πŸ§΅πŸ‘‡

1/ Load the Models' Results
- Go to the πŸ€— Open LLM Leaderboard Comparator: open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!

2/ Compare Metric Results in the Results Tab πŸ“Š
- Head over to the Results tab.
- Here, you’ll see the performance metrics for each model, beautifully color-coded using a gradient to highlight performance differences: greener is better! 🌟
- Want to focus on a specific task? Use the Task filter to home in on comparisons for tasks like BBH or MMLU-Pro.

3/ Check Config Alignment in the Configs Tab βš™οΈ
- To ensure you’re comparing apples to apples, head to the Configs tab.
- Review both models’ evaluation configurations, such as metrics, datasets, prompts, few-shot configs...
- If something looks off, it’s good to know before drawing conclusions! βœ…

4/ Compare Predictions by Sample in the Details Tab πŸ”
- Curious about how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR), then a Subtask (e.g., Murder Mystery), and press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each model’s outputs.

5/ With this tool, it’s never been easier to explore how small changes between model versions affect performance on a wide range of tasks. Whether you’re a researcher or enthusiast, you can instantly visualize improvements and dive into detailed comparisons.

πŸš€ Try the πŸ€— Open LLM Leaderboard Comparator now and take your model evaluations to the next level!
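
Prefer scripting to clicking? The leaderboard's evaluation results also live as datasets on the Hub, so you can discover a model's result repos programmatically. A minimal sketch, assuming the per-model repos sit under the open-llm-leaderboard org (the exact repo layout is an assumption worth verifying):

from huggingface_hub import HfApi

api = HfApi()
# List leaderboard datasets that mention the model family you care about:
for ds in api.list_datasets(author="open-llm-leaderboard", search="Llama-3.1"):
    print(ds.id)
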
reacted to m-ric's post with πŸ‘€ 17 days ago
By far the coolest release of the day!
> The Open LLM Leaderboard, the most comprehensive suite for comparing open LLMs on many benchmarks, just released a comparator tool that lets you dig into the details of differences between any models.

Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much about compares to the original Llama-3.1-70B. πŸ€”πŸ”Ž

Try it out here πŸ‘‰ open-llm-leaderboard/comparator
reacted to davidberenstein1957's post with ❀️ 17 days ago
You can now build a custom text classifier without days of human labeling!

πŸ‘ LLMs work reasonably well as text classifiers.
πŸ‘Ž They are expensive to run at scale and their performance drops in specialized domains.

πŸ‘ Purpose-built classifiers have low latency and can potentially run on CPU.
πŸ‘Ž They require labeled training data.

Combine the best of both worlds: the automatic labeling capabilities of LLMs and the high-quality annotations from human experts to train and deploy a specialized model.

Blog: https://huggingface.co/blog/sdiazlor/custom-text-classifier-ai-human-feedback
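
To make the loop concrete, here is a minimal sketch of the idea: seed labels with an LLM, let a human correct a few, then train a cheap purpose-built classifier. The llm_label() stub and label names are hypothetical; the blog post itself uses Argilla and distilabel for this.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def llm_label(text: str) -> str:
    # Stand-in for an LLM call that returns a label for the text.
    return "positive" if "great" in text else "negative"

texts = ["great tool", "terrible latency", "great docs", "not great at all"]
labels = [llm_label(t) for t in texts]  # 1) automatic LLM labels
labels[3] = "negative"                  # 2) a human expert corrects the mistake

# 3) train the low-latency, CPU-friendly classifier on the merged labels
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["great release"]))
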
posted an update 17 days ago
Big news! You can now build strong ML models without days of human labeling.

You simply:
- Define your dataset, including annotation guidelines, labels, and fields.
- Optionally label some records manually.
- Use an LLM to auto-label your data with a human (you? your team?) in the loop!

Get started with this blog post:
https://huggingface.co/blog/sdiazlor/custom-text-classifier-ai-human-feedback
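
As a rough sketch of that first step with the Argilla 2.x SDK (the guidelines, field, and label names below are illustrative, and api_url/api_key depend on your deployment):

import argilla as rg

client = rg.Argilla(api_url="https://<your-argilla-space>.hf.space", api_key="<api-key>")

settings = rg.Settings(
    guidelines="Classify each ticket as 'bug' or 'feature-request'.",  # annotation guidelines
    fields=[rg.TextField(name="text")],                                # what annotators see
    questions=[rg.LabelQuestion(name="label", labels=["bug", "feature-request"])],
)
dataset = rg.Dataset(name="support-tickets", settings=settings, client=client)
dataset.create()

# LLM predictions can then be logged as suggestions for humans to review
# (keys matching question names become suggestions in Argilla 2.x):
dataset.records.log([{"text": "App crashes on launch", "label": "bug"}])
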
reacted to clem's post with πŸ‘ πŸš€ 😎 ❀️ 26 days ago
Open-source AI creates healthy competition in a field where natural tendencies lead to extreme concentration of power. Imagine a world where only one or two companies could build software. This is the biggest risk and ethical challenge of them all IMO. Let's fight this!
posted an update about 1 month ago
Explore FinePersonas visually with Argilla and black-forest-labs/FLUX.1-schnell


Excited to share this space where the community can explore a tiny subset of FinePersonas.

argilla/finepersonas


Dataset built with distilabel and free serverless endpoints.

This is just a first step towards more interesting experiments with FinePersonas. For example, can we use it to assess biases in text2image models?

If you have ideas I'd love to hear them in the comments!
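
For anyone who wants to reproduce the persona-to-image step, here is a small sketch using huggingface_hub's InferenceClient against FLUX.1-schnell (the persona string is illustrative, and the serverless API needs a token and is rate-limited):

from huggingface_hub import InferenceClient

client = InferenceClient("black-forest-labs/FLUX.1-schnell")
persona = "A marine biologist who restores coral reefs and teaches diving."
image = client.text_to_image(f"A portrait of this persona: {persona}")  # returns a PIL image
image.save("persona.png")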

reacted to davidberenstein1957's post with πŸš€ 😎 about 2 months ago
🌟 Argilla v2.1.0 goes multi-modal: Image Field, Dark Mode, Enhanced Hugging Face Hub imports and more!

πŸ–Ό Image Field: Seamlessly work with multimodal datasets
πŸŒ“ Dark Mode: Reduce eye strain with our sleek new look
πŸ€— Enhanced Hugging Face Hub import with the SDK
πŸ‡ͺπŸ‡Έ Spanish UI: Breaking language barriers

Plus more improvements to supercharge your model curation workflow!

Check out the full announcement for details and code examples: https://github.com/argilla-io/argilla/compare/v2.0.1...v2.1.0
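
As a hedged sketch of what the new Image Field looks like in a dataset definition (field and question names are illustrative; see the release notes for the exact API):

import argilla as rg

settings = rg.Settings(
    fields=[
        rg.ImageField(name="image"),  # new in v2.1.0: render images in the UI
        rg.TextField(name="caption"),
    ],
    questions=[rg.LabelQuestion(name="quality", labels=["good", "bad"])],
)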
reacted to davidberenstein1957's post with πŸ€— 3 months ago
βš—οΈ Find reusable synthetic data pipeline code and corresponding datasets on the @huggingface Hub.

Find your pipeline and use $ distilabel pipeline run --config "hugging_face_dataset_url/pipeline.yaml"

Some components I used:
- Embedded dataset viewer https://huggingface.co/docs/hub/main/en/datasets-viewer-embed
- Hugging Face fsspec https://huggingface.co/docs/huggingface_hub/main/en/guides/hf_file_system
- distilabel https://distilabel.argilla.io/latest/
- Gradio leaderboard by Freddy Boulton freddyaboulton/gradio_leaderboard
- Gradio modal by Ali Abid

Space: davidberenstein1957/distilabel-synthetic-data-pipeline-explorer
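
The fsspec integration is what makes reading a pipeline.yaml straight off the Hub possible. A small sketch, with a placeholder repo id you'd swap for a real distilabel dataset (the root-level pipeline.yaml layout is what the CLI command above implies):

from huggingface_hub import HfFileSystem

fs = HfFileSystem()
# Placeholder repo id; point it at a real distilabel dataset on the Hub.
path = "datasets/<namespace>/<dataset-name>/pipeline.yaml"
with fs.open(path) as f:  # Hub files open like local ones via fsspec
    print(f.read().decode()[:300])
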
reacted to reach-vb's post with πŸš€ 🀝 πŸ”₯ 4 months ago
What an eventful day in Open Source LLMs today:

Mistral released Codestral Mamba 🐍
> Beats DeepSeek QwenCode, best model < 10B, competitive with Codestral 22B
> Mamba 2 architecture - supports up to 256K context
> Apache 2.0 licensed, perfect for a local code assistant
> Transformers & llama.cpp integration upcoming!

Model checkpoint: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1

Hugging Face dropped SmolLM 🀏
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B and more!
> 135M, 360M, and 1.7B param model checkpoints
> Trained on 600B high-quality synthetic + FineWeb Edu tokens
> Architecture: Llama + GQA + 2048 ctx length
> Ripe for fine-tuning and on-device deployments.
> Works out of the box with Transformers!

Model checkpoints: HuggingFaceTB/smollm-6695016cad7167254ce15966

Mistral released Mathstral 7B βˆ‘
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under Apache 2.0 license

Model checkpoint: https://huggingface.co/mistralai/mathstral-7B-v0.1

Pretty dope day for open source ML. Can't wait to see what the community builds with it and to support them further! πŸ€—

What's your favourite from the release today?
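
Since SmolLM works out of the box with Transformers, here is a quick sketch of trying the smallest checkpoint (repo id taken from the collection; adjust for the 360M or 1.7B sizes):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

inputs = tok("Open source ML is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))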
reacted to davidberenstein1957's post with 🧠 πŸš€ πŸ”₯ 5 months ago
βš—οΈ Looking to get started with Synthetic data and AI Feedback?

I created this cool notebook for a workshop @davanstrien and I gave a couple of weeks back. It uses https://distilabel.argilla.io/dev/ and I think it is a good entry point for anyone with a practical interest in the topic.

https://colab.research.google.com/github/davanstrien/data-for-fine-tuning-llms/blob/main/03-synthetic-data-generation.ipynb
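
If you want the gist before opening the notebook, a minimal distilabel pipeline looks roughly like this (assuming distilabel 1.x and a Hugging Face token for the serverless Inference API; the model id is just an example):

from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="synthetic-demo") as pipeline:
    load = LoadDataFromDicts(data=[{"instruction": "Explain AI feedback in one line."}])
    generate = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
    )
    load >> generate  # connect the steps into a pipeline

distiset = pipeline.run(use_cache=False)  # returns a Distiset of generated rows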
replied to their post 5 months ago

We'll keep you posted in both channels, but I think it will make sense to shift our efforts towards the HF Discord very soon.