Feature request: Using weights hash to identify duplicates

#422
by mrfakename - opened

Hi,
Might it be possible to calculate the MD5 hash of models so there aren't duplicates?
For example, many people have uploaded Llama mirrors, and they keep getting re-evaluated. This makes it a lot more expensive and a lot slower to get unique models evaluated.
Might it be possible to calculate a hash of the actual weights (not the files, since those can be compressed, changed, etc.) and reuse previously calculated evaluations for duplicate models, so they basically get evaluated instantly? People could still manually request re-evaluation.
Thank you!
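
To make the request concrete, here is a minimal sketch of hashing weights rather than files. It assumes the tensors have already been loaded into a hypothetical in-memory form (name mapped to dtype string, shape, and raw bytes); `weights_fingerprint` and `pack_f32` are illustrative names, not part of any existing library or of the leaderboard code.

```python
import hashlib
import struct
from typing import Mapping, Tuple

# Hypothetical representation: tensor name -> (dtype string, shape, raw little-endian bytes).
Tensor = Tuple[str, Tuple[int, ...], bytes]


def weights_fingerprint(tensors: Mapping[str, Tensor], algo: str = "md5") -> str:
    """Hash tensor contents in a file-format-independent way."""
    h = hashlib.new(algo)
    # Sort by name so the order in which tensors were stored does not matter.
    for name in sorted(tensors):
        dtype, shape, raw = tensors[name]
        # The header includes the byte length so tensor boundaries are unambiguous.
        header = f"{name}|{dtype}|{','.join(map(str, shape))}|{len(raw)}\n"
        h.update(header.encode("utf-8"))
        h.update(raw)
    return h.hexdigest()


def pack_f32(values):
    """Pack Python floats as little-endian float32 bytes (demo helper)."""
    return struct.pack(f"<{len(values)}f", *values)


original = {
    "layer.weight": ("float32", (2, 2), pack_f32([0.1, 0.2, 0.3, 0.4])),
    "layer.bias": ("float32", (2,), pack_f32([0.0, 1.0])),
}
# A mirror: same tensors, stored in a different order.
mirror = dict(reversed(list(original.items())))
# A different model: one weight changed.
finetuned = dict(original)
finetuned["layer.bias"] = ("float32", (2,), pack_f32([0.0, 1.5]))

print(weights_fingerprint(original) == weights_fingerprint(mirror))     # same weights
print(weights_fingerprint(original) == weights_fingerprint(finetuned))  # different weights
```

Note that this only catches byte-identical weights: a mirror that was converted to another dtype, or that renames its tensors, would produce a different hash under this scheme.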

clefourrier changed discussion title from Duplicate Removal to Feature request: Using weights hash to identify duplicates
Hugging Face H4 org

Hi, thank you for your suggestion!
It's an interesting idea. We would need to store the hash of every evaluated model and compare each new submission against them; I'd have to see how expensive that would be.
Leaving this issue open, as it's likely we'll investigate it, but it's not a top priority atm.
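
As a rough illustration of the storage-and-compare step, the sketch below keeps results in a dictionary keyed by weights hash. `EvalCache`, `fake_evaluate`, and the model ids are hypothetical names for this example, not the leaderboard's actual code, and the sketch assumes a fingerprint has already been computed for each submission.

```python
from typing import Callable, Dict, List


class EvalCache:
    """Reuse evaluation results for submissions whose weights hash matches."""

    def __init__(self, evaluate: Callable[[str], dict]):
        self._evaluate = evaluate
        self._results: Dict[str, dict] = {}  # weights hash -> stored results

    def get_or_evaluate(self, fingerprint: str, model_id: str, force: bool = False) -> dict:
        # force=True models a manual re-evaluation request.
        if force or fingerprint not in self._results:
            self._results[fingerprint] = self._evaluate(model_id)
        return self._results[fingerprint]


calls: List[str] = []


def fake_evaluate(model_id: str) -> dict:
    """Stand-in for the real (expensive) evaluation run."""
    calls.append(model_id)
    return {"score": 0.5}


cache = EvalCache(fake_evaluate)
first = cache.get_or_evaluate("abc123", "org-a/llama")           # evaluated
second = cache.get_or_evaluate("abc123", "org-b/llama-mirror")   # served from cache
third = cache.get_or_evaluate("abc123", "org-b/llama-mirror", force=True)  # re-evaluated
print(calls)
```

With this layout the comparison itself is a single dictionary lookup, so the cost would likely be dominated by reading the weights to compute the hash, not by storing or comparing hashes.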
