Text Generation Inference?

#28
by silvacarl - opened

Has anyone been able to get this to work with Text Generation Inference?

https://github.com/huggingface/text-generation-inference

Yes, I've tried to deploy the model using TGI; the setup is explained here: https://huggingface.co/blog/mixtral#using-text-generation-inference
In my case I was using an AWS EC2 g5.24xlarge, but it seems the machine is not big enough to run the model and it crashes; you can see my issue here: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/22#657991ae0b0608ba9ccb0c4f
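
For a rough sense of the sizing (back-of-the-envelope numbers, not measured figures): Mixtral-8x7B has roughly 46.7B parameters, so the weights alone take about 46.7e9 × 2 bytes ≈ 93 GB in fp16. A g5.24xlarge has 4 × A10G GPUs at 24 GB each (96 GB total), which leaves almost no headroom for the KV cache and CUDA overhead, while a g5.48xlarge (8 × 24 GB = 192 GB) should fit comfortably.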

I'm awaiting authorisation to use the g5.48xlarge. If you are able to run the model following the instructions in the first link, please let me know which machine you are using.

Cheers.

Hi @silvacarl,

I was able to run the model with TGI using an in-place quantisation technique (my current setup cannot run the model at full precision), and I used the default value for the --max-total-tokens flag.
Here is the command I used in case it is useful for you or someone else:

sudo docker run -d --gpus all --shm-size 1g -p $port:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --sharded true --num-shard 4 --quantize eetq
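
Note that $port and $volume are shell variables you need to set yourself before running the container, e.g. port=8080 and volume=$PWD/data. Once the shards have loaded (check with sudo docker logs <container-id>), a quick sanity check against TGI's /generate endpoint might look like this (the prompt and parameters below are just illustrative):

curl 127.0.0.1:$port/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "[INST] What is the capital of France? [/INST]", "parameters": {"max_new_tokens": 64}}'

The --quantize eetq flag quantises the weights to 8-bit on the fly, which roughly halves the memory footprint compared to fp16; that is presumably what lets the model fit on this setup.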
