Love the Hub implementation

#5
opened by lysandre (HF staff)

Hey! Transformers developer here, really love how you leverage the jinaai/jina-bert-implementation custom code within this specific repository.

Do you have any feedback regarding this feature? Is there anything that would make great contributions such as this one simpler for you and your team?

Hey @lysandre ! Thank you and the HF team for implementing such a useful feature! Overall, we are quite satisfied with the feature but we do have some minor feedback.

  1. Documentation
    The transformers documentation on custom models could be clearer about the syntax of auto_map in config.json. We never used the register functions and instead edited config.json directly (first sketch after this list). The ability to reference code on the Hub from another repo using "--" doesn't seem to be documented anywhere; we only figured out it was possible because save_pretrained writes references in this format when saving a custom model. The feature does seem to be fairly new (roughly six months old, we believe), which may explain the sparse documentation, but if it were better communicated that this is possible, we think more people would develop on the Hub the way we did.

  2. Promote trust_remote_code to an environment variable
    Some downstream libraries still do not support passing the trust_remote_code argument; most relevant to our work is sentence_transformers, despite quite a few requests for it [1, 2, 3]. As a result we have to monkeypatch the model-loading logic in those libraries to use our model (second sketch after this list). If trust_remote_code could also be read from an environment variable, e.g. HUGGINGFACE_TRUST_REMOTE_CODE, setting that variable would be all that is needed to load custom models, which would make them much easier to adopt throughout the ecosystem.

  3. Silent failure when trust_remote_code is not set to True.
    When trust_remote_code is not set to True for our model, transformers appears to fall back to the stock BERT implementation and emits a series of warnings about re-initialised weights. This is not ideal: if a downstream evaluation script forgets to set the argument, it silently produces inaccurate results, and the only ways to notice are to scroll through the logs for those warnings or to print the model and check its class. If it instead raised an error asking the user to set trust_remote_code, the mistake would be caught immediately and would save us quite a bit of head scratching and communication overhead within the team (third sketch after this list).
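To illustrate point 1, here is roughly how we wire a model repo to custom code living in another repo. This is a minimal sketch: the auto_map keys and the owner/repo--module.Class strings follow the convention save_pretrained produces, but the exact file and class names below are illustrative.

```python
# First sketch: point a model repo's config.json at custom code hosted in a
# separate repo via auto_map and the "owner/repo--module.Class" convention.
# The file and class names are illustrative.
import json

with open("config.json") as f:
    config = json.load(f)

config["auto_map"] = {
    "AutoConfig": "jinaai/jina-bert-implementation--configuration_bert.JinaBertConfig",
    "AutoModel": "jinaai/jina-bert-implementation--modeling_bert.JinaBertModel",
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

With this in place, AutoModel.from_pretrained(..., trust_remote_code=True) pulls the modelling code from the referenced repo rather than from the model repo itself.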
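To illustrate point 2, a rough sketch of the monkeypatch we currently resort to, together with the environment variable behaviour we are proposing. HUGGINGFACE_TRUST_REMOTE_CODE does not exist in transformers today; it is the name we are suggesting.

```python
# Second sketch: wrap the from_pretrained methods that downstream libraries
# (e.g. sentence_transformers) call internally, so that trust_remote_code can
# be switched on via an environment variable.
import functools
import os

import transformers


def _inject_trust_remote_code(load_fn):
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        # Only override when the proposed env var is explicitly set, so the
        # default behaviour of transformers stays untouched.
        if os.environ.get("HUGGINGFACE_TRUST_REMOTE_CODE") == "1":
            kwargs.setdefault("trust_remote_code", True)
        return load_fn(*args, **kwargs)

    return wrapper


# Patch the auto classes that sentence_transformers loads models through.
for auto_cls in (transformers.AutoConfig, transformers.AutoModel, transformers.AutoTokenizer):
    auto_cls.from_pretrained = _inject_trust_remote_code(auto_cls.from_pretrained)
```

If transformers read such a variable itself, none of this patching would be needed.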
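To illustrate point 3, a minimal sketch of the error we would have preferred over the silent fallback. The load_or_raise wrapper and its message are ours, not part of transformers.

```python
# Third sketch: refuse to load a checkpoint that ships custom code unless
# trust_remote_code=True, instead of silently falling back to the built-in
# architecture with freshly initialised weights.
from transformers import AutoModel, PretrainedConfig


def load_or_raise(repo_id, trust_remote_code=False, **kwargs):
    config_dict, _ = PretrainedConfig.get_config_dict(repo_id)
    if "auto_map" in config_dict and not trust_remote_code:
        raise ValueError(
            f"{repo_id} ships custom modelling code (see auto_map in its "
            "config.json). Pass trust_remote_code=True to load it instead of "
            "the stock architecture."
        )
    return AutoModel.from_pretrained(
        repo_id, trust_remote_code=trust_remote_code, **kwargs
    )
```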

On an unrelated note, we really loved developing on the Hugging Face Hub and would love to stay in the Hub ecosystem for a larger portion of our workflow. However, a big blocker for us is the rate limits: we hit the commit limit and the artifact download limit quite a few times. How the rate limits behave seems to be a frequently asked question on the Hugging Face forums [1, 2, 3, 4, 5].

We understand that storing and transferring data at scale is not cheap, and we would be happy to consider an Enterprise subscription for our needs. However, it is not clear to us from the pricing page and HF communications whether it would solve our rate-limit issue. Perhaps this could be communicated more clearly on the pricing page and in the forums, so that more companies and users can adopt workflows that rely on the Hub.

Hey @Jackmin108 , that's great feedback! Thanks a lot for coming back to us.

I'm circulating this internally and we'll see how best to address the issues above.


Regarding rate limits, PRO users get a higher rate limit; let me loop in @jeffboudier regarding Enterprise subscriptions.

Thanks @lysandre - hi @Jackmin108 , congrats on the impactful open source release!

Enterprise subscriptions for Hub organizations grant PRO privileges to all their members, so members of an Enterprise organization get their rate limits lifted to higher values. The reason we don't publish actual numbers for the rate limits is that the values depend on how much load we are getting and are subject to change; the bottom line is that PRO users and Enterprise organization members get priority. I agree we can document this better, and eventually we will. For clarity, Inference Endpoints and Spaces usage is not subject to rate limits.

bwang0911 changed discussion status to closed
