What does it mean when models share the same bytes?
We've investigated some quants and found that a considerable portion of quantizations of the same model share the same bytes and can be deduplicated, saving quantizers significant upload time on the Hub.
This Space, where we crack open a repo from @bartowski, shows just how much dedupe we can get: xet-team/quantization-dedup
You can get a sense of why by reading this write-up: https://github.com/bartowski1182/llm-knowledge/blob/main/quantization/quantization.md
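For a rough intuition of how shared bytes show up, here's a minimal, hypothetical sketch that chunks two files and compares chunk hashes. It uses fixed-size chunks purely for illustration (the Xet storage layer dedupes on content-defined chunks, so real numbers will differ), and the file names are made up.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # illustrative fixed-size chunks; real dedup uses content-defined chunking

def chunk_hashes(path: str, chunk_size: int = CHUNK_SIZE) -> set[str]:
    """Hash each chunk of a file so shared chunks can be compared across files."""
    hashes = set()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hashes.add(hashlib.sha256(chunk).hexdigest())
    return hashes

def overlap_ratio(path_a: str, path_b: str) -> float:
    """Fraction of file A's unique chunks that also appear in file B."""
    a, b = chunk_hashes(path_a), chunk_hashes(path_b)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical usage: compare two quantizations of the same model
# print(overlap_ratio("model.Q4_K_M.gguf", "model.Q5_K_M.gguf"))
```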
But what about finetuned models?
Since going into production, the xet-team has migrated hundreds of repositories on the Hub to our storage layer, including classic "pre-Hub" open-source models like FacebookAI/xlm-roberta-large (XLM-R) from FacebookAI.
XLM-R, introduced in 2019, set new benchmarks for multilingual NLP by learning shared representations across 100 languages. It was then fine-tuned on English, Spanish, Dutch, and German, producing a language-specific derivative for each - check out the paper here: Unsupervised Cross-lingual Representation Learning at Scale (1911.02116)
These finetunes share much of the same architecture and layout as XLM-R with similar training methods and goals. It makes sense that they would share bytes, but it's still fascinating to see.
We put together a similar Space to explore these models and see where they overlap - check it out for yourself: xet-team/finetune-dedupe
The darker each block in the heatmap, the more bytes are shared. Clicking on a repo's blocks shows all the other repos that share those blocks.
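If you're curious how scores like these could be computed, here's a small, assumed sketch: given a set of chunk hashes per repo (gathered as in the earlier snippet), it builds the pairwise overlap values a heatmap like this could be drawn from. The function and data layout are illustrative, not the Space's actual code.

```python
from itertools import combinations

def overlap_matrix(repo_chunks: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap (shared chunks / total unique chunks) for every pair of repos."""
    scores = {}
    for a, b in combinations(repo_chunks, 2):
        union = repo_chunks[a] | repo_chunks[b]
        scores[(a, b)] = len(repo_chunks[a] & repo_chunks[b]) / len(union) if union else 0.0
    return scores
```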