Daniel van Strien PRO

davanstrien

https://danielvanstrien.xyz/

AI & ML interests

Machine Learning Librarian

Recent Activity

liked a dataset about 2 hours ago

CATMuS/medieval-segmentation

liked a dataset about 2 hours ago

nomic-ai/cornstack-python-v1

liked a model about 2 hours ago

biglam/medieval-manuscript-yolov11

davanstrien's activity

upvoted a paper about 10 hours ago

The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs

Paper • 2503.20000 • Published 2 days ago • 1

upvoted a paper 1 day ago

BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction

Paper • 2503.19658 • Published 2 days ago • 1

upvoted a paper 2 days ago

REALM: A Dataset of Real-World LLM Use Cases

Paper • 2503.18792 • Published 3 days ago • 1

upvoted a paper 3 days ago

SkyLadder: Better and Faster Pretraining via Context Window Scheduling

Paper • 2503.15450 • Published 8 days ago • 11

upvoted an article 6 days ago

The New and Fresh analytics in Inference Endpoints

Article • 7 days ago • 17

upvoted a paper 6 days ago

InsectSet459: an open dataset of insect sounds for bioacoustic machine learning

Paper • 2503.15074 • Published 8 days ago • 1

upvoted a collection 7 days ago

Brazilian legal datasets ⚖️

A collection of data extracted from the courts of Brazil (and other legal websites) • 31 items • Updated 8 days ago • 2

upvoted a collection 14 days ago

Open-Sora 2.0

3 items • Updated 16 days ago • 10

upvoted 3 papers 15 days ago

Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru

Paper • 2503.07587 • Published 17 days ago • 10

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published 17 days ago • 95

JurisTCU: A Brazilian Portuguese Information Retrieval Dataset with Query Relevance Judgments

Paper • 2503.08379 • Published 16 days ago • 2

upvoted a paper 17 days ago

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published 20 days ago • 75

upvoted an article 23 days ago

HuggingFace, IISc partner to supercharge model building on India's diverse languages

Article • 29 days ago • 17

upvoted 2 collections 28 days ago

rank1

rank1 is the first test-time compute reasoning model in IR • 15 items • Updated 28 days ago • 3

OWLS: Scaling Laws for Speech Recognition and Translation

🦉 A suite of Whisper-style models from 250M to 18B parameters. Trained on up to 360K hours of data. 16k sampling rate. • 7 items • Updated 17 days ago • 4

upvoted a collection 29 days ago

Granite 3.2 Language Models

3 items • Updated 29 days ago • 19

upvoted 2 papers about 1 month ago

Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models

Paper • 2502.15964 • Published Feb 21 • 1

"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts

Paper • 2502.16839 • Published Feb 24 • 1