Nick Doiron

monsoon-nlp

https://mapmeld.com/plant-based-llms/

AI & ML interests

biology and multilingual models

Recent Activity

updated a dataset about 16 hours ago

monsoon-nlp/genetic-counselor-multiple-choice

updated a dataset about 18 hours ago

monsoon-nlp/genetic-counselor-mini-demo

monsoon-nlp's activity

upvoted a paper about 15 hours ago

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published 2 days ago • 46

upvoted a paper about 23 hours ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published 4 days ago • 89

upvoted a collection 1 day ago

BD3-LMs

https://m-arriola.com/bd3lms/ • 4 items • Updated 1 day ago • 12

upvoted a paper 6 days ago

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Paper • 2502.17424 • Published 18 days ago • 2

upvoted a paper 12 days ago

NeoBERT: A Next-Generation BERT

Paper • 2502.19587 • Published 16 days ago • 38

upvoted 2 papers 19 days ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 94

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models

Paper • 2502.14301 • Published 23 days ago • 1

upvoted a paper 25 days ago

NoLiMa: Long-Context Evaluation Beyond Literal Matching

Paper • 2502.05167 • Published Feb 7 • 15

upvoted 2 papers about 2 months ago

Titans: Learning to Memorize at Test Time

Paper • 2501.00663 • Published Dec 31, 2024 • 21

AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

Paper • 2501.08284 • Published Jan 14 • 6

upvoted a paper 2 months ago

Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models

Paper • 2501.04828 • Published Jan 8 • 11

upvoted an article 3 months ago

They Said It Couldn’t Be Done

Article • Dec 5, 2024 • 83

upvoted a collection 3 months ago

U-MATH and μ-MATH - University-level math evaluation

Paper: A UNIVERSITY-LEVEL BENCHMARK FOR EVALUATING MATHEMATICAL SKILLS IN LLMS • 4 items • Updated Jan 14 • 15

upvoted 2 papers 3 months ago

Monet: Mixture of Monosemantic Experts for Transformers

Paper • 2412.04139 • Published Dec 5, 2024 • 13

LoFiT: Localized Fine-tuning on LLM Representations

Paper • 2406.01563 • Published Jun 3, 2024 • 1

upvoted a paper 4 months ago

Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers

Paper • 2409.08916 • Published Sep 13, 2024 • 4

upvoted 2 collections 4 months ago

Plant foundation models

A collection of pre-trained DNA models for plant genomes. • 19 items • Updated Oct 23, 2024 • 5

Malaysian synthetic dataset

Use LLM to generate Malaysian context synthetic dataset. • 33 items • Updated Dec 23, 2024 • 1

upvoted a paper 4 months ago

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Paper • 2411.07781 • Published Nov 12, 2024 • 1

upvoted a paper 5 months ago

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Paper • 2410.20771 • Published Oct 28, 2024 • 3