8 65 106

Aurélien-Morgan CLAUDON

Aurelien-Morgan

https://huggingface.co/retrain-pipelines

AI & ML interests

None yet

Recent Activity

reacted to AdinaY's post with 🔥 about 3 hours ago

The Chinese community is shipping 🚢 DeepSeek V3 (685 B MoE) has quietly released on the hub! Base: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base Instruct: https://huggingface.co/deepseek-ai/DeepSeek-V3 Can’t wait to see what’s next!

new activity about 5 hours ago

retrain-pipelines/func_calls_wip:[bot] Conversion to Parquet

new activity 1 day ago

deepseek-ai/DeepSeek-V3-Base:我嘞个dou，这么大

View all activity

Articles

Fancy Stateful Metaflow Service + UI on Google Colab ?

Oct 14

• 4

Organizations

Aurelien-Morgan's activity

reacted to AdinaY's post with 🔥 about 3 hours ago

Post

429

The Chinese community is shipping 🚢

DeepSeek V3 (685 B MoE) has quietly released on the hub!
Base: deepseek-ai/DeepSeek-V3-Base
Instruct: deepseek-ai/DeepSeek-V3

Can’t wait to see what’s next!

New activity in retrain-pipelines/func_calls_wip about 5 hours ago

[bot] Conversion to Parquet

#1 opened about 17 hours ago by

parquet-converter

New activity in deepseek-ai/DeepSeek-V3-Base 1 day ago

我嘞个dou，这么大

#1 opened 1 day ago by

mrwkd123

liked a model 1 day ago

deepseek-ai/DeepSeek-V3-Base

Updated about 6 hours ago • 655 • 480

updated a dataset 1 day ago

retrain-pipelines/func_calls_wip

Viewer • Updated 1 day ago • 72.9k • 11

liked a Space 2 days ago

Running

221

🌍

QVQ 72B Preview

reacted to AdinaY's post with 👀 2 days ago

Post

2195

QvQ-72B-Preview🎄 an open weight model for visual reasoning just released by Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving

upvoted a paper 4 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 7 days ago • 328

replied to FranckAbgrall's post 5 days ago

That's cool. A little subtle, though. Would you consider a different color for the "dialog bubble" icon too? Making it for instance (dark) golden yellow, plus the mouseover text ?

liked a Space 6 days ago

Running on CPU Upgrade

951

🏢

Anychat

replied to clem's post 7 days ago

Everyone got off the waitlist. So cool. So, you managed to privatize the street for many robots to greet us ?

reacted to m-ric's post with 👍 7 days ago

Post

2041

𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐏𝐢𝐜𝐨𝐭𝐫𝐨𝐧, 𝐚 𝐦𝐢𝐜𝐫𝐨𝐬𝐜𝐨𝐩𝐢𝐜 𝐥𝐢𝐛 𝐭𝐡𝐚𝐭 𝐬𝐨𝐥𝐯𝐞𝐬 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝟒𝐃 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.

🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!

⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron

1 reply

liked a Space 9 days ago

Running

384

📝

Scaling test-time compute

reacted to yjernite's post with 👀 13 days ago

Post

2038

🇪🇺 Policy Thoughts in the EU AI Act Implementation 🇪🇺

There is a lot to like in the first draft of the EU GPAI Code of Practice, especially as regards transparency requirements. The Systemic Risks part, on the other hand, is concerning for both smaller developers and for external stakeholders.

I wrote more on this topic ahead of the next draft. TLDR: more attention to immediate large-scale risks and to collaborative solutions supported by evidence can help everyone - as long as developers disclose sufficient information about their design choices and deployment contexts.

Full blog here, based on our submitted response with @frimelle and @brunatrevelin :

https://huggingface.co/blog/yjernite/eu-draft-cop-risks#on-the-proposed-taxonomy-of-systemic-risks

2 replies

reacted to julien-c's post with 👍 15 days ago

Post

7641

After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team

28 replies

reacted to dvilasuero's post with ❤️ 19 days ago

Post

2263

🌐 Announcing Global-MMLU: an improved MMLU Open dataset with evaluation coverage across 42 languages, built with Argilla and the Hugging Face community.

Global-MMLU is the result of months of work with the goal of advancing Multilingual LLM evaluation. It's been an amazing open science effort with collaborators from Cohere For AI, Mila - Quebec Artificial Intelligence Institute, EPFL, Massachusetts Institute of Technology, AI Singapore, National University of Singapore, KAIST, Instituto Superior Técnico, Carnegie Mellon University, CONICET, and University of Buenos Aires.

🏷️ +200 contributors used Argilla MMLU questions where regional, dialect, or cultural knowledge was required to answer correctly. 85% of the questions required Western-centric knowledge!

Thanks to this annotation process, the open dataset contains two subsets:

1. 🗽 Culturally Agnostic: no specific regional, cultural knowledge is required.
2. ⚖️ Culturally Sensitive: requires dialect, cultural knowledge or geographic knowledge to answer correctly.

Moreover, we provide high quality translations of 25 out of 42 languages, thanks again to the community and professional annotators leveraging Argilla on the Hub.

I hope this will ensure a better understanding of the limitations and challenges for making open AI useful for many languages.

Dataset: CohereForAI/Global-MMLU

reacted to jsulz's post with 👍 20 days ago

Post

1296

Doing a lot of benchmarking and visualization work, which means I'm always searching for interesting repos in terms of file types, size, branches, and overall structure.

To help, I built a Space jsulz/repo-info that lets you search for any repo and get back:

- Treemap of the repository, color coded by file/directory size
- Repo branches and their size
- Cumulative size of different file types (e.g., the total size of all the safetensors in the repo)

And because I'm interested in how this will fit in our work to leverage content-defined chunking for versioning repos on the Hub
- https://huggingface.co/blog/from-files-to-chunks - everything has the number of chunks (1 chunk = 64KB) as well as the total size in bytes.

Some of the treemaps are pretty cool. Attached are black-forest-labs/FLUX.1-dev and for fun laion/laion-audio-preview (which has nearly 10k .tar files 🤯)

2 replies

liked a dataset 21 days ago

foursquare/fsq-os-places

Viewer • Updated 23 days ago • 105M • 3.1k • 63

upvoted a paper 21 days ago

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

Paper • 2407.00079 • Published Jun 24 • 5

reacted to cfahlgren1's post with 👍 22 days ago

Post

1811

You can just ask things 🗣️

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Llama3.1 405B synthetic dataset 🔥

argilla/magpie-ultra-v1.0