
Sylvain Lesage PRO

severo

AI & ML interests

Dataviz freelance developer. Part-time 🤗 Hugging Face (dataset viewer).

Recent Activity

updated a dataset about 3 hours ago
severo/trending-repos

Organizations

Hugging Face, Datasets Maintainers, geospatial, Datasets examples, Social Post Explorers, Hugging Face Discord Community, Hugging Face FineVideo, Hyperparam

severo's activity

upvoted an article 7 days ago

Cohere on Hugging Face Inference Providers 🔥

reacted to jsulz's post with 🚀 7 days ago
As xet-team infrastructure begins backing hundreds of repositories on the Hugging Face Hub, we’re getting to put on our researcher hats and peer into the bytes. 👀 🤓

IMO, one of the most interesting ideas Xet storage introduces is a globally shared store of data.

When you upload a file through Xet, the contents are split into ~64KB chunks and deduplicated, but what if those same chunks already exist in another repo on the Hub?

If we can detect and reuse them, we can skip those chunks too, saving time and bandwidth for AI builders. More on how that works here:
🔗 https://huggingface.co/blog/from-chunks-to-blocks#scaling-deduplication-with-aggregation
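To make the chunk-and-deduplicate idea concrete, here is a minimal Python sketch. It is not Xet's implementation: it assumes a simple Gear-style rolling hash for content-defined chunking (a boundary wherever the low 16 bits of the hash are zero, so roughly every 64 KiB past a minimum size), SHA-256 as the chunk key, and a plain dict standing in for the globally shared store. The names `chunk`, `upload`, `MIN_CHUNK`, and `MAX_CHUNK` are hypothetical. Because boundaries depend on content rather than fixed offsets, inserting bytes only disturbs the chunks near the edit, so the rest of a second copy hashes to keys the store already holds.

```python
import hashlib
import random

# Hypothetical parameters, not Xet's actual chunker settings.
MASK = (1 << 16) - 1          # boundary when low 16 bits are zero: ~1 in 65,536 positions
MIN_CHUNK = 8 * 1024          # never cut before this many bytes
MAX_CHUNK = 256 * 1024        # always cut at this many bytes

# Fixed pseudo-random table: one 64-bit value per possible byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a Gear rolling hash."""
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Low 16 bits of h depend only on the last 16 bytes seen.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk
    return chunks


def upload(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Return the file's list of chunk keys and how many bytes actually had to be sent."""
    manifest: list[str] = []
    sent = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in store:      # only new chunks cost bandwidth
            store[key] = c
            sent += len(c)
        manifest.append(key)
    return manifest, sent


# Usage: a second "repo" stores the same bytes with a small prefix inserted.
store: dict[str, bytes] = {}
original = random.Random(1).randbytes(1_000_000)
manifest_first, sent_first = upload(original, store)
manifest_second, sent_second = upload(b"header" + original, store)
print(sent_first, sent_second)
```

In this sketch the second upload only sends the chunk(s) around the inserted prefix: once the rolling hash has seen 16 identical bytes, the boundaries line up with the first copy again and every later chunk is already in the store.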

Because of this, different repositories can share the bytes we store. That opens up something cool: we can draw a graph of which repos actually share data at the chunk level, where:

- Nodes = repositories
- Edges = shared chunks
- Edge thickness = how much they overlap
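The graph described above can be sketched in a few lines of Python. This is a guess at the construction, not the code behind xet-team/repo-graph: it assumes each repo is represented as a mapping from chunk hash to chunk size, and it weights each edge by the total bytes of chunks two repos have in common. `build_repo_graph` and the repo names in the usage example are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_repo_graph(repo_chunks: dict[str, dict[str, int]]) -> dict[tuple[str, str], int]:
    """Return {(repo_a, repo_b): shared_bytes} for every pair of repos that share chunks.

    repo_chunks maps a repo name to {chunk_hash: chunk_size_in_bytes}
    (a hypothetical input format, for illustration only).
    """
    # Invert the index: which repos reference each chunk?
    owners: dict[str, set[str]] = defaultdict(set)
    sizes: dict[str, int] = {}
    for repo, chunks in repo_chunks.items():
        for chunk_hash, size in chunks.items():
            owners[chunk_hash].add(repo)
            sizes[chunk_hash] = size

    # A chunk held by k repos adds its size to each of the k*(k-1)/2 edges between them.
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for chunk_hash, repos in owners.items():
        for a, b in combinations(sorted(repos), 2):
            edges[(a, b)] += sizes[chunk_hash]
    return dict(edges)


# Hypothetical repos: a base model, a fine-tune that reuses most of its chunks, and a dataset.
repos = {
    "org/bert-base": {"c1": 65536, "c2": 65536, "c3": 40000},
    "user/bert-finetune": {"c1": 65536, "c2": 65536, "c9": 12000},
    "user/dataset": {"c3": 40000, "c7": 70000},
}
print(build_repo_graph(repos))
# {('org/bert-base', 'user/bert-finetune'): 131072, ('org/bert-base', 'user/dataset'): 40000}
```

Inverting to a chunk-to-repos index avoids comparing every pair of repos directly; the work then scales with how widely each chunk is shared, and chunks owned by a single repo contribute no edges.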

xet-team/repo-graph

Come find the many BERT islands. Or see how datasets relate in practice, not just in theory. See how libraries or tasks can tie repositories together. You can play around with node size using storage/likes/downloads too.

The result is a super fun visualization from @saba9 and @znation that I’ve already lost way too much time to. I'm excited to see how the networks grow as we add more repositories!
replied to their post 15 days ago

"convert CSV to Parquet" :) SEO is good

posted an update 15 days ago
reacted to jsulz's post with 👍 about 1 month ago
If you've been following along with the Xet Team's ( xet-team ) work, you know we've been working to migrate the Hugging Face Hub from Git LFS to Xet.

Recently, we launched a waitlist to join the movement to Xet (join here! https://huggingface.co/join/xet ), but getting to this point was a journey.

It went from the initial proof of concept in August, to launching on the Hub internally, to migrating a set of repositories and routing a small chunk of download traffic on the Hub through our infrastructure. Every step of the way has been full of challenges, big and small, and well worth the effort.

Over the past few weeks, with real traffic flowing through our services, we’ve tackled some truly gnarly issues (unusual upload/download patterns, memory leaks, load imbalances, and more) and resolved each without major disruptions.

If you're curious about how this sliver of Hub infrastructure looks as we routed traffic through it for the first time (and want a deep dive full of Grafana and Kibana charts 🤓), I have a post for you.

Here's an inside look into the day of our first migrations and the weeks following, where we pieced together solutions in real time.

https://huggingface.co/blog/xet-on-the-hub
upvoted an article 2 months ago

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub
