HuggingChat Performance Issues: Slow Responses and Loading Problems
The current models in HuggingChat are having problems: almost all of them take a very long time to respond, or keep loading without ever displaying the answer. I just want to know — has HuggingChat been having problems for the last several days?
Likewise, I am getting 502 errors when trying to use a variety of different models. E.g. DeepSeek-R1 starts the reasoning part, but then errors out with a 502 before returning the final response.
@mattasdata can you get me a list of models that have issues? Ideally with some shared conversations :)
Unfortunately, the "share conversation" feature also isn't working. However, the models I have tested are `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B` and `Qwen/Qwen3-235B-A22B`. Both return 502 or say "Error in input stream". I was able to download the JSON from one conversation below:
````json
{
  "prompt": "Prompt generation failed",
  "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
  "parameters": { "return_full_text": false },
  "messages": [
    {
      "role": "system",
      "content": "",
      "createdAt": "2025-05-20T13:56:20.983Z",
      "updatedAt": "2025-05-20T13:56:20.983Z"
    },
    {
      "role": "user",
      "content": "How would I enforce the signature of a function in python? I want to return a Callable object from a factory method, and that Callable ideally would have a way of indicating what the parameter names should be for it. In code:\n\n```python\ndef factory_method(self):\n def callable_to_return(param_1: list, param_2: int):\n # this function is the one I want to define an interface for, so that param_1 and param_2 should be used as the argument names\n return (param_1, param_2)\n return callable_to_return\n```",
      "createdAt": "2025-05-20T13:56:21.409Z",
      "updatedAt": "2025-05-20T13:56:21.409Z",
      "files": []
    }
  ]
}
````
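(As an aside, the Python question embedded in that conversation — returning a callable whose parameter names are part of its interface — can be answered with a callback protocol. A minimal sketch using `typing.Protocol`; the `PairFn` name is ours, not from the conversation:)

```python
from typing import Protocol


class PairFn(Protocol):
    """Interface for the returned callable: static type checkers
    verify both the types and the parameter names (param_1, param_2)."""

    def __call__(self, param_1: list, param_2: int) -> tuple: ...


def factory_method() -> PairFn:
    # Annotating the return type as PairFn makes mypy/pyright flag any
    # returned function whose parameter names or types don't match.
    def callable_to_return(param_1: list, param_2: int) -> tuple:
        return (param_1, param_2)

    return callable_to_return


fn = factory_method()
print(fn(param_1=[1, 2], param_2=3))  # callers can rely on the keyword names
```

Note that `Protocol` is checked statically, not at runtime; enforcing the names at runtime would require inspecting `inspect.signature()` instead.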
I've been trying to reproduce but so far I'm not getting a 502. https://hf.co/chat/r/ixaxEWn?leafId=d096e2e9-d689-41f6-a4fe-313b80d1bb91
Does this happen every time for you, or only occasionally? Also, what browser/OS are you using?
EDIT: I can reproduce it now, investigating, thanks for reporting!
Found the issue! Working on a fix.
Ok, the fix is currently deploying; it should be live in 5-10 minutes.
Looks like the small model we use for tasks like reasoning status updates, conversation summaries, etc. was occasionally overloaded. When that happened, it crashed chat-ui! This should now be handled properly.
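(The actual fix isn't shown in the thread, but the general pattern — making an auxiliary model call non-fatal so an overloaded helper model can't take down the whole app — can be sketched like this; `summarize_safely` and the callables here are hypothetical names, not chat-ui's real API:)

```python
def summarize_safely(call_model, fallback="(summary unavailable)"):
    """Run an optional helper-model call (status updates, summaries).

    Hypothetical sketch: if the small model is overloaded and the call
    fails, return a fallback value instead of letting the exception
    propagate and crash the application.
    """
    try:
        return call_model()
    except Exception:
        return fallback


def overloaded_model():
    # Simulates the small task model failing with a 5xx-style error.
    raise RuntimeError("502 Bad Gateway")


print(summarize_safely(overloaded_model))   # degrades gracefully
print(summarize_safely(lambda: "summary"))  # normal path
```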
When https://huggingface.co/chat/settings/application shows "Latest deployment 5c0c578", you will know it's live!
Awesome, thanks for getting to that so quickly! I can confirm it has been resolved (after a page refresh) using the Qwen model.
Hello, we prepared a quote for a complete roof renovation during 2006. The interior painting work in the building will be carried out in-house by the MMT department.
Hi 🤗
I have also had the problem here for days that HuggingChat is very, very slow and keeps crashing. I found this thread and checked my version. It says 'Latest deployment 41a4fde', which should be older than '5c0c578'. How can I get the newer version? I have already reloaded the page with CTRL+F5 (Firefox), but nothing changes. Does anyone here have a tip for me?
@nsarrazin
Oh, thanks for asking 🙂 I've just checked and there is indeed a new version number for me: Latest deployment 6d2f047.
It may be that the crashes have become a little less frequent in the last few days. But I'm not so sure, as I haven't used HF chat as much in the last few days as I did before. 🤔
Another thing I've noticed recently is that the buttons for the community tools (HF docs, Python code, etc.) have been removed. I can't seem to reactivate them either. But that's perhaps another topic 😉
The issue with tools should also be fixed! Let me know if you still have performance problems.
@nsarrazin
Yes, the tools are back. Thank you very much 🙂 At the moment the chat seems to be working fine again. I have the impression that the 'glitches' depend on the time of day and also on the LLM I'm using. The system seems to work better at the weekend or in the morning: the answers come faster and are better prepared then. In the evening hours, an unspecific error occurs from time to time and the chat output sometimes becomes 'funny'. The chat then jumps from one language to another in the middle of a sentence, and Chinese(?) characters suddenly appear in the text. Or there are lines with meaningless content: "+++++++++++++++++++++++" 😉
Response times also vary a lot depending on which LLM I use. For example, according to my observations, the Qwen/Qwen3-235B-A22B model takes an extremely long time to even start its reasoning. Other models are sometimes much faster. But I suspect that this is in the nature of the technology 😉
These are my current observations. Perhaps they will help ...