Article 🧑‍⚖️ "Replacing Judges with Juries" using distilabel By alvarobartt • about 10 hours ago • 8
Article ⚗️ 🧑🏼‍🌾 Let's grow some Domain Specific Datasets together By burtenshaw • 5 days ago • 23
Zephyr ORPO Collection Models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO). Recipes here: https://github.com/huggingface/alignment-handbook • 3 items • Updated 22 days ago • 13
Creación de corpus en comunidad Collection A collection of collaborative efforts to build high-quality Spanish-language corpora. Any Spanish speaker can contribute :) • 7 items • Updated 10 days ago • 5
DIBT Prompt Collective SPIN Collection This collection contains resources related to the replication of SPIN with the DIBT Prompt Collective dataset • 8 items • Updated Mar 12 • 7
Apple MLX-compatible 7B LLMs on the 🤗 Hub Collection This collection contains the model weights for 7B LLMs for Apple's MLX framework. Find more information at https://github.com/ml-explore/mlx • 8 items • Updated about 9 hours ago • 9
Datasets built with ⚗️ distilabel Collection This collection contains some datasets generated and/or labelled using https://github.com/argilla-io/distilabel • 5 items • Updated 29 days ago • 5
Preference Datasets for DPO Collection This collection contains a list of curated preference datasets for DPO fine-tuning for intent alignment of LLMs • 7 items • Updated 29 days ago • 18
Comparing DPO with IPO and KTO Collection A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9 • 31
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 Paper • 2312.16171 • Published Dec 26, 2023 • 30
Other Notus experiments Collection Contains some more experiments with Notus using different variants, different datasets, etc. • 2 items • Updated Dec 28, 2023 • 3
Datasets based on UltraFeedback Collection This collection contains some datasets created on top of UltraFeedback using Argilla for the dataset exploration and curation, sorted by release date. • 6 items • Updated Mar 19 • 10
Capybara Collection Unaligned model for general use, leveraging Amplify-Instruct and novel quality-curation techniques, built with a dataset of fewer than 20K examples. • 8 items • Updated Dec 3, 2023 • 22
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated 22 days ago • 52
Notus 7B v1 Collection Notus 7B v1 models (DPO fine-tune of Zephyr SFT) and datasets used. More information at https://github.com/argilla-io/notus • 11 items • Updated Dec 28, 2023 • 17
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023 • 48
OpenChat Collection OpenChat: Advancing Open-source Language Models with Mixed-Quality Data • 7 items • Updated Jan 10 • 31
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 68