jsulz
AI & ML interests
Infrastructure, law, policy
Recent Activity
replied to their post about 19 hours ago
It's been a while since I took a step back and looked at the progress https://huggingface.co/xet-team has made migrating Hugging Face from Git LFS to Xet, but every time I do, it boggles the mind.
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
700,000 users/orgs
350,000 repos
15PB
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
These are hard numbers to put into context, but let's try:
The latest run of the Common Crawl from https://huggingface.co/commoncrawl was 471 TB.
We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
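For anyone who wants to check the "about two hours" figure, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and that the peak speed is sustained for the whole transfer; the function name `transfer_hours` is just illustrative.

```python
# Back-of-the-envelope estimate; assumes decimal (SI) units and a
# sustained peak rate, neither of which the post states explicitly.
CRAWL_SIZE_TB = 471       # size of the latest Common Crawl run, from the post
PEAK_UPLOAD_GBPS = 577    # peak upload speed in gigabits per second, from the post


def transfer_hours(size_tb: float, speed_gbps: float) -> float:
    """Hours needed to move size_tb terabytes at speed_gbps gigabits/second."""
    size_bits = size_tb * 1e12 * 8        # terabytes -> bytes -> bits
    speed_bits_per_s = speed_gbps * 1e9   # gigabits/s -> bits/s
    return size_bits / speed_bits_per_s / 3600


hours = transfer_hours(CRAWL_SIZE_TB, PEAK_UPLOAD_GBPS)
print(f"{hours:.1f} hours")
```

Under those assumptions this comes out to roughly 1.8 hours, which lines up with "about two hours".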
We're moving to a new phase in the process, so stay tuned.
This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.
I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you @bartowski).
Let me know if there's anything you're interested in; happy to dig in!
Organizations