Text Generation Inference?

#28
by silvacarl - opened

Has anyone been able to get this to work with Text Generation Inference?

https://github.com/huggingface/text-generation-inference

Yes, I've tried to deploy the model using TGI; the setup is explained here: https://huggingface.co/blog/mixtral#using-text-generation-inference
In my case I was using an AWS EC2 g5.24xlarge, but it seems the machine is not big enough to run the model and it crashes; you can see my issue here: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/22#657991ae0b0608ba9ccb0c4f
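
For a rough sense of the sizing (back-of-the-envelope numbers, not measured figures): Mixtral-8x7B has roughly 46.7B parameters, so the weights alone take about 46.7e9 × 2 bytes ≈ 93 GB in fp16. A g5.24xlarge has 4 × A10G GPUs at 24 GB each (96 GB total), which leaves almost no headroom for the KV cache and CUDA overhead, while a g5.48xlarge (8 × 24 GB = 192 GB) should fit comfortably.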

I'm awaiting authorisation to use the g5.48xlarge. If you are able to run the model following the instructions in the first link, please let me know which machine you are using.

Cheers.

Hi @silvacarl,

I was able to run the model with TGI using an in-place quantisation technique (my current setup cannot run the model at full precision), and I used the default value for the --max-total-tokens flag.
Here is the command I used in case it is useful for you or someone else:

sudo docker run -d --gpus all --shm-size 1g -p $port:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --sharded true --num-shard 4 --quantize eetq
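
Note that $port and $volume are shell variables you need to set yourself before running the container, e.g. port=8080 and volume=$PWD/data. Once the shards have loaded (check with sudo docker logs <container-id>), a quick sanity check against TGI's /generate endpoint might look like this (the prompt and parameters below are just illustrative):

curl 127.0.0.1:$port/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "[INST] What is the capital of France? [/INST]", "parameters": {"max_new_tokens": 64}}'

The --quantize eetq flag quantises the weights to 8-bit on the fly, which roughly halves the memory footprint compared to fp16; that is presumably what lets the model fit on this setup.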
