Aboubacar OUATTARA PRO

oza75

AI & ML interests

NLP and Vision

Recent Activity

updated a dataset about 12 hours ago
djelia/bambara-texts
published a dataset about 12 hours ago
djelia/bambara-texts
updated a dataset about 12 hours ago
djelia/bambara-lm-qa
Organizations

open/ acc, Djelia

oza75's activity

reacted to jsulz's post with 🚀 5 days ago
Toward the end of last year, the Xet team provided an inside look into the foundations of how we plan to enable rapid experimentation and iteration for the AI builders on the Hub: https://huggingface.co/blog/from-files-to-chunks

But it turns out chunks aren't all you need!

Our goal is to bring:
🚀 Faster uploads
⏬ Speedy downloads
💪 All without sacrificing your workflow

To do that, we need the infrastructure and system design to back it up. As we prepare to roll out the first Xet-backed repositories on the Hub, we wrote up a post explaining the nitty-gritty details of the decisions that bring this to life: https://huggingface.co/blog/from-chunks-to-blocks

The post comes complete with an interactive visualization that shows the power of deduplication in action, taking a 191GB repo down to ~97GB and shaving a few hours off upload times.

The darker each block in the heatmap, the more we dedupe, the less we have to transfer. Clicking on a file's blocks shows all other files that share blocks.
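
For a rough feel of what the heatmap is counting, here is a minimal sketch of chunk-level deduplication. It is not the Xet implementation: it assumes fixed-size chunks and SHA-256 hashes purely for illustration, and the names `CHUNK_SIZE` and `dedup_report` are made up for this example.

```python
import hashlib
from collections import defaultdict

# Assumption: a tiny fixed chunk size so the toy inputs are easy to follow.
CHUNK_SIZE = 4


def dedup_report(files: dict[str, bytes]) -> tuple[int, int, dict[str, set[str]]]:
    """Return (total bytes, bytes left after dedup, files sharing chunks per file)."""
    owners: dict[str, set[str]] = defaultdict(set)  # chunk hash -> files containing it
    total_bytes = 0
    unique_bytes = 0

    for name, data in files.items():
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            total_bytes += len(chunk)
            if digest not in owners:
                # First time this chunk is seen, so it has to be transferred.
                unique_bytes += len(chunk)
            owners[digest].add(name)

    # For each file, collect the other files that share at least one chunk with it.
    shared: dict[str, set[str]] = defaultdict(set)
    for names in owners.values():
        for name in names:
            shared[name] |= names - {name}

    return total_bytes, unique_bytes, dict(shared)
```

Calling `dedup_report` on a handful of similar files returns the raw size, the size after removing repeated chunks, and a per-file set of other files with overlapping chunks, which loosely mirrors what clicking a file's blocks reveals in the visualization.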

Check it out and explore for yourself! xet-team/quantization-dedup