Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
stas 
posted an update Jan 26
Post
Do you have a hidden massive storage leak thanks to HF hub models and datasets revisions adding up and not getting automatically deleted?

Here is how to delete all old revisions and only keeping main in a few quick steps and no tedious manual editing.

In terminal A:
$ pip install huggingface_hub["cli"] -U
$ huggingface-cli delete-cache --disable-tui
File to edit: /tmp/tmpundr7lky.txt
0 revisions selected counting for 0.0. Continue ? (y/N)

Do not answer the prompt and proceed with my instructions.

(note your tmp file will have a different path, so adjust it below)

In terminal B:
$ cp /tmp/tmpedbz00ox.txt cache.txt
$ perl -pi -e 's|^#(.*detached.*)|$1|' cache.txt
$ cat cache.txt >>  /tmp/tmpundr7lky.txt

The perl one-liner uncommented out all lines that had (detached) in it - so can be wiped out. And then we pasted it back into the tmp file huggingface-cli expects to be edited.

Now go back to terminal A and hit: N, Y, Y, so it looks like:

0 revisions selected counting for 0.0. Continue ? (y/N) n
89 revisions selected counting for 211.7G. Continue ? (y/N) y
89 revisions selected counting for 211.7G. Confirm deletion ? (Y/n) y

Done.

If you messed up with the prompt answering you still have cache.txt file which you can feed again to the new tmp file it'll create when you run huggingface-cli delete-cache --disable-tui again.

For more details and additional techniques please see https://github.com/stas00/ml-engineering/tree/master/storage#huggingface-hub-caches
In this post