Daniel van Strien PRO

davanstrien

AI & ML interests

Machine Learning Librarian

davanstrien's activity

upvoted an article 1 day ago

Training and Finetuning Embedding Models with Sentence Transformers v3

59
upvoted an article 2 days ago

⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2

20
upvoted an article 10 days ago

Synthetic data: save money, time and carbon with open source

28
upvoted an article 18 days ago

Introducing the Open Arabic LLM Leaderboard

47
upvoted an article 29 days ago

🧑‍⚖️ "Replacing Judges with Juries" using distilabel

14
upvoted 2 articles about 1 month ago

Jupyter X Hugging Face

2

⚗️ 🧑🏼‍🌾 Let's grow some Domain Specific Datasets together

27
upvoted 3 articles about 1 month ago

Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM

10

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

55

The Hugging Face Hub for Galleries, Libraries, Archives and Museums

1
upvoted an article about 1 month ago

Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data

20
upvoted 4 articles about 2 months ago

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

20

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

5

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

24