fahrizalfarid

akahana

AI & ML interests

NLP

Recent Activity

updated a dataset about 2 hours ago
akahana/vlm
published a dataset about 2 hours ago
akahana/vlm
updated a model 1 day ago
akahana/llm-models

Organizations

None yet

akahana's activity

upvoted an article 2 days ago
reacted to lewtun's post with πŸ”₯ 2 days ago
Introducing OlympicCoder: a series of open reasoning models that can solve olympiad-level programming problems πŸ§‘β€πŸ’»

- 7B open-r1/OlympicCoder-7B
- 32B open-r1/OlympicCoder-32B

We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as models over 100x larger 💪

Together with the models, we are releasing:

πŸ“ŠCodeForces-CoTs: new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python open-r1/codeforces-cots

πŸ† IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance open-r1/ioi

For links to the models and datasets, check out our latest progress report from Open R1: https://huggingface.co/blog/open-r1/update-3
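The OlympicCoder checkpoints are standard causal LMs on the Hub, so querying one follows the usual chat-message pattern. A minimal sketch, assuming a generic system prompt (the wording below is an illustration, not from the Open R1 release):

```python
def build_messages(problem_statement: str) -> list[dict]:
    """Wrap an olympiad problem in the chat format instruct models expect.
    The system-prompt wording here is an assumption, not from the release."""
    return [
        {"role": "system", "content": "Write a correct, efficient C++ solution."},
        {"role": "user", "content": problem_statement},
    ]

messages = build_messages("Given n integers, print the maximum subarray sum.")
print(messages[1]["content"])

# Actually generating a solution needs transformers and substantial hardware
# (sketch only, untested here):
#   from transformers import pipeline
#   generator = pipeline("text-generation", model="open-r1/OlympicCoder-7B")
#   print(generator(messages, max_new_tokens=1024)[0]["generated_text"])
```

The 7B variant is the more practical starting point for local experiments; the 32B model follows the same interface.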
reacted to prithivMLmods's post with πŸ€— 2 days ago
reacted to tomaarsen's post with ❀️ 3 days ago
An assembly of 18 European companies, labs, and universities has banded together to launch 🇪🇺 EuroBERT! It's a state-of-the-art multilingual encoder for 15 languages, designed to be finetuned for retrieval, classification, and more.

πŸ‡ͺπŸ‡Ί 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very useful sizes in my opinion
➑️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
βš™οΈ Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
πŸ”₯ A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
πŸ“Š Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
πŸ“ Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.

Check out the release blog post here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B

The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
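Since EuroBERT is an encoder rather than a generator, a typical use is turning its per-token hidden states into one vector per sentence via mean pooling. A minimal sketch of that pooling step, with the model load shown as a hedged comment (trust_remote_code for the custom Llama-based architecture is an assumption):

```python
def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token embeddings over the positions the mask keeps (value 1),
    ignoring padding, a common way to get one vector per sentence."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    kept = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            kept += 1
            for i, v in enumerate(vec):
                totals[i] += v
    return [t / max(kept, 1) for t in totals]

# Toy demo: two real tokens plus one padding position.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]))  # [2.0, 3.0]

# With the real model (needs transformers + torch; sketch only, untested here):
#   from transformers import AutoTokenizer, AutoModel
#   tok = AutoTokenizer.from_pretrained("EuroBERT/EuroBERT-210m")
#   model = AutoModel.from_pretrained("EuroBERT/EuroBERT-210m", trust_remote_code=True)
#   enc = tok("EuroBERT covers 15 languages.", return_tensors="pt")
#   hidden = model(**enc).last_hidden_state[0].tolist()
#   sentence_vec = mean_pool(hidden, enc["attention_mask"][0].tolist())
```

For production retrieval use, a finetuned head on top of these pooled vectors (as the post anticipates) would replace this raw pooling.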
published a model 5 days ago