nyuuzyou posted an update 9 days ago
it's over

Just deleted some of my model and dataset repos of old projects, feels bad.

Even Pro subscription holders are limited to 1 TB. It's a large enough capacity, but I'm already over it too.
@bartowski @mradermacher @Yntec @digiplay Are your accounts OK?

·

I have no trouble uploading at the moment, and I am almost 5000 times above that limit. The way hf is handling this is not exactly stellar: it's unclear what the rules are and will be. What I would like to know is: is the limit actually enforced anywhere?

Now, I see a lot of model releases (pretty much all transformer text-generation ones), and it is absolutely clear to me that there are many repos out there that look like models but actually aren't, so hf is clearly being abused for data storage.

So hf is in a difficult position: they are extremely generous, but are being abused, and it is not easy to defend against that abuse. I am not sure what they are going to do, but it is clear that at some point, they have to do something about it.

It is not in their interest to lose actually useful datasets and repositories, and the way this was introduced, with so little information, is mostly bad for them. There will also be a chilling effect, and I think that will come not so much from any imposed limits as from perceived limits.

Personally, I think we should be grateful for what we got, and probably will be grateful for what we will get. So far, hf has done a great job, and even with some enshittification going on, they are still providing a great service :)

I think the way forward is to pretend these limits do not exist and find out what their actual impact will be - judge by what they do, not what they say.

RIP HF

I hope that there will be some form of contingency plan, otherwise most of the finetunes and quants of large models might need to be removed, and I hope that won't happen. I would hate to have to go Q4-only from now on, just because no one can upload multiple quants for lack of space.

·

I hope Hugging Face will find ways to expand storage for free or provide unlimited storage for PRO users instead of the current 1 TB limit. Kaggle offers unlimited storage for public repositories per account, with a limit of 200 GB per repository (less generous than Hugging Face's 300 GB per repo). However, Kaggle won't come close to replacing Hugging Face's functionality, so we have nowhere else to turn.

The new storage quotas seem to mainly be imposed to combat abuse of their service and should hopefully not affect users meaningfully contributing for the greater good of the AI community. I recommend giving https://www.reddit.com/r/LocalLLaMA/comments/1h53x33/huggingface_is_not_an_unlimited_model_storage/ a read, mainly this post:

Heya! I’m VB, I lead the advocacy and on-device team @ HF. This is just a UI update for limits which have been around for a long while. HF has been and always will be liberal at giving out storage + GPU grants (this is already the case - this update just brings more visibility).

We're working on updating the UI to make it more clear and recognisable. Grants are made for use-cases where the community utilises your model checkpoints and benefits from them; quantising models is one such use-case, and others are pre-training/fine-tuning datasets, model merges and more.

Similarly we also give storage grants to multi-PB datasets like YODAS, Common Voice, FineWeb and the likes.

This update is more for people who dump random stuff across model repos, or use model/ dataset repos to spam users and abuse the HF storage and community.

I’m a fellow GGUF enjoyer, and a quant creator (see - https://huggingface.co/spaces/ggml-org/gguf-my-repo) - we will continue to add storage + GPU grants as we have in past.

Cheers!

·

I'm glad to hear that. But asking for a quota to publish a 100MB dataset is a bit strange.

RIP HF 😿

@john6666

@Yntec Are your accounts OK?

This pushed me over the edge, and I just took offline 448 models worth 7.5 TB, most of which were exclusives; this was the only way to download them, because I made those models.

Huggingface has it backwards: we creators are the ones providing value and a reason for people to come here and use the platform. If they're going to limit how much value I can provide, they're not getting a single byte from me. It's their loss, not mine; free web space is available everywhere.

This is now being shown in my spaces:

Protest

This has now become a place for big companies like Black Forest Labs to showcase their models so people buy their API access, while there's no future for a little guy like me. It's a big step back for the democratization of AI, but there are always better things on the horizon, and nothing of value was lost.

RIP HF

·

@Yntec
Whaaat? 🥲
Have you reported this anywhere or contacted HF support about it?

·

Thanks for helping keep us updated.

With just two of my datasets (FreeSound and Free Music Archive) I'm already over my allotted 1TB of Pro storage.

I uploaded those datasets for a lot of reasons, but the biggest one is that they simply didn't exist on HF in a way that was easily and securely accessible using 🤗 datasets, and I like that library because it makes rapid iteration with many different datasets trivial. It would be a shame if researchers and devs had to go back to using untrusted code to download sketchy .zip files from slow and unreliable servers overseas in order to get access to these excellent datasets. But Hugging Face hasn't let me down yet when it comes to making decisions that are best for OSS AI, so I have a feeling they feel the same way I do about that, and we'll be able to work something out with storage.
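
For anyone unfamiliar, this is roughly the convenience I mean: a minimal sketch using load_dataset in streaming mode, with a placeholder repo id (someuser/freesound-audio is hypothetical, not my actual dataset path).

```python
from datasets import load_dataset

# Stream the dataset so nothing has to be fully downloaded or unpacked up front.
# The repo id is a hypothetical placeholder for illustration only.
ds = load_dataset("someuser/freesound-audio", split="train", streaming=True)

# Rapid iteration: peek at the first few records to inspect the schema.
for i, example in enumerate(ds):
    print({k: type(v).__name__ for k, v in example.items()})
    if i == 2:
        break
```

Compare that with hand-rolling a downloader and extraction script for every archive mirror; that difference is most of why I put the datasets on HF in the first place.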