Dan Saattrup Nielsen

saattrupdan

https://saattrupdan.com

AI & ML interests

NLP for low-resource languages.

Recent Activity

new activity 5 days ago

alexandrainst/coral:Conversational part of the dataset?

liked a model 6 days ago

jinaai/jina-reranker-v2-base-multilingual

liked a model 6 days ago

jinaai/jina-reranker-m0

saattrupdan's activity

upvoted 2 collections 9 days ago

SmolLM CPT LoRA

16 items • Updated 5 days ago • 1

SmolLM baselines trained from scratch

2 items • Updated Jan 30 • 1

upvoted a collection 11 days ago

Llama 4

Llama 4 release • 10 items • Updated 11 days ago • 430

upvoted a collection about 2 months ago

Ahma models

12 items • Updated Feb 3 • 3

upvoted a paper 2 months ago

FoQA: A Faroese Question-Answering Dataset

Paper • 2502.07642 • Published Feb 11 • 2

upvoted 2 collections 3 months ago

🧚🏼‍♀️Lucie LLM

Open source LLM for French, English, German, Spanish and Italian • 8 items • Updated 29 days ago • 21

FrenchBench Evaluation datasets

These datasets are used to evaluate models on French performance using: https://github.com/EleutherAI/lm-evaluation-harness (from CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 7

upvoted a collection 4 months ago

Multilingual LLM Evaluation

Multilingual Evaluation Benchmarks • 8 items • Updated Mar 3 • 25

upvoted 3 collections 5 months ago

Common Corpus

Largest multilingual pretraining data. • 1 item • Updated Nov 13, 2024 • 10

OpenCoder

OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 8 items • Updated Nov 23, 2024 • 81

POTION

These are the flagship POTION models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 5 items • Updated Feb 3 • 10

upvoted a paper 6 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 69

upvoted a collection 6 months ago

VPTQ Llama 3.1 70B Instruct without finetune

arxiv.org/abs/2409.17066, VPTQ Llama 3.1 70B without finetune • 9 items • Updated Oct 18, 2024 • 1

upvoted a collection 10 months ago

Nemotron 4 340B

Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 1 day ago • 162

upvoted 2 collections 12 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 740

Czech evaluation datasets

This collection should contain Czech evaluation datasets • 8 items • Updated Jan 14, 2024 • 3

upvoted a collection about 1 year ago

OpenCulture

A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 125