Service Unavailable
Hi, I just wanted to use this with transformers, but now I always get a 503 response, and the test here says the service is unavailable. I just wanted to ask whether this is an error on my side or whether something happened on your side.
Hello. If you get a 503 error, it means the service is unavailable right now (codes 500 to 599 are server errors). It's not an error on your side. I am also getting that error.
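If your script needs to keep working while the model is cold or overloaded, you can ask the API to wait for the model and retry on 503. A rough sketch, not an official snippet (it assumes the serverless API still accepts the options.wait_for_model flag in the JSON payload; HF_TOKEN is a placeholder for your read token):
# Retry the serverless endpoint on 503; wait_for_model asks the API to load the model instead of failing immediately
URL=https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2
for i in 1 2 3 4 5; do
  code=$(curl -s -o response.json -w "%{http_code}" "$URL" \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"inputs": "Hello", "options": {"wait_for_model": true}}')
  [ "$code" != "503" ] && break  # anything other than 503: stop retrying
  echo "Got 503, retrying in 30s..."; sleep 30
done
cat response.json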
Does anyone know how long it usually takes for a model like this one to go back to the warm state? I hope this gets solved soon. Thank you.
(https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 can also be used, and is working well at the time of writing this)
To look at warm models, you can use this query: https://huggingface.co/models?inference=warm&pipeline_tag=text-generation
If you need a model on demand, you can consider using Inference Endpoints. There is also a sleep feature that pauses the endpoint after 15 minutes without calls to reduce costs.
(I work at HF)
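If you want that warm-model list in a script, the Hub API should accept the same query parameters as that page (that part is my assumption; the page above is the documented way). For example, with jq installed:
curl -s "https://huggingface.co/api/models?inference=warm&pipeline_tag=text-generation&limit=10" | jq -r '.[].id'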
I have been using the mistralai/Mistral-7B-Instruct-v0.2 model for a case study, and it was functioning perfectly until recently. However, I am now encountering an issue where I am unable to access the model. The error message I receive is as follows:
Model not loaded on the server: https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2. Please retry with a higher timeout (current: 120).
HuggingFace Team, could you please provide any suggestions or alternatives so that I can access this model again?
You have a few options:
- Use the v3 model on the Inference API (a quick example is below this list)
- Use Inference Endpoints to host a dedicated endpoint just for you
- Download the model and run it on your own hardware
- Use a service like together.ai and pay per token on their API
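For the first option, here is a minimal sketch of calling v0.3 on the serverless Inference API with a longer client-side timeout, since the error above complains about the 120s timeout (it assumes your token has access to the gated mistralai repos; accept the terms on the model page first if needed, and note the [INST] tags are the Mistral instruct prompt format):
curl -s --max-time 300 \
  https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3 \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "[INST] Write a haiku about GPUs. [/INST]", "parameters": {"max_new_tokens": 100}}'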
Bro, but which file do I need to download for the model? There are so many files here, I am confused.
I have never understood why people rely on free services. Download it and do it yourself; then you are 100% in control. It also gets rid of issues during updates you are not ready for. Or, if you can't run it locally, pay someone else to do it for you...
If you want to run it locally, use TGI or vLLM.
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir data/mistral-7b-instruct-v0.2 --local-dir-use-symlinks False --exclude "pytorch_model*" # skip the .bin shards; the safetensors weights are enough
model=/data/mistral-7b-instruct-v0.2 # path to the downloaded weights as seen inside the container
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your cli READ token>
docker run --gpus all --shm-size 1g -e HF_TOKEN=$token -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0 --model-id $model
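Once the container is up, you can sanity-check it against TGI's /generate route on the port mapped above (8080), something like:
curl http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"inputs": "[INST] What is the capital of France? [/INST]", "parameters": {"max_new_tokens": 50}}'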
Thanks I'll try it out
To run it locally you could also use the ollama CLI, or Open WebUI with a GGUF version, oobabooga's text-generation-webui, and some others...
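For example with ollama (assuming the mistral tag in the Ollama library points at an instruct build of this model; check the library page first):
ollama pull mistral
ollama run mistral "Write one sentence about GPUs."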