Daniel van Strien PRO

davanstrien

AI & ML interests

Machine Learning Librarian

davanstrien's activity

upvoted an article 2 days ago

Training and Finetuning Embedding Models with Sentence Transformers v3

63
upvoted an article 4 days ago

⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2

20
upvoted an article 12 days ago

Synthetic data: save money, time and carbon with open source

29
upvoted an article 20 days ago

Introducing the Open Arabic LLM Leaderboard

47
upvoted an article about 1 month ago

🧑‍⚖️ "Replacing Judges with Juries" using distilabel

14
upvoted 2 articles about 1 month ago

Jupyter X Hugging Face

2

⚗️ 🧑🏼‍🌾 Let's grow some Domain Specific Datasets together

27
upvoted 3 articles about 1 month ago

Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM

10

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

55

The Hugging Face Hub for Galleries, Libraries, Archives and Museums

1
upvoted an article about 2 months ago

Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data

20
upvoted 4 articles about 2 months ago

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

20

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

5

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

24