How many resources are needed to run Mixtral?

#128
by rkhapre - opened

I need the following information:

How much disk space is needed to run this model?
How much CPU is needed to run this model, or do we need a GPU?
How much RAM is needed to run this model?
Can I download this model for English only?
Is there a managed/hosted inference endpoint for production, and approximately how much would it cost?
Is there a pay-as-you-go option?

Hi,
First and third questions: it's less about disk space and more about RAM (usually roughly the same amount). MistralAI states at https://docs.mistral.ai/models/ that Mixtral requires 100 GB of VRAM (GPU memory), for example. However, this can be reduced by using quantized models instead of the original (I'll come back to that in a second).
Second: to run the original model you will need 2 GPUs, as the required VRAM is an insane amount (it can be reduced with quantization).
Fourth: I don't think so. You can, however, fine-tune it, or use a fine-tuned version, for a specific task you want it to do.
For your last questions: MistralAI itself has an API available on their website that you can use, and many other websites also offer endpoints you can use!

Now, about quantization: there is a well-respected user on Hugging Face named TheBloke who shares quantized versions of popular models.
Here, for example: https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF. These are GGUF versions that you can actually run with a powerful CPU, and some of them with far less RAM (the smallest one requires 18.14 GB, compared to the ~100 GB of the original), so it really depends on what you need and your budget, honestly!
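To see what quantization buys you, you can work backwards from a GGUF file size to the average bits stored per weight. This is a rough sketch: the 18.14 GB file size is from the link above, and ~46.7B total parameters for Mixtral 8x7B is an assumed figure:

```python
def bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average bits stored per parameter, implied by a model file size."""
    return file_size_gb * 1e9 * 8 / n_params

# Smallest quant in the GGUF repo vs. fp16 (~93.4 GB of weights);
# 46.7e9 params is an assumption, not from the thread.
print(bits_per_weight(18.14, 46.7e9))  # ~3.1 bits per weight
print(bits_per_weight(93.4, 46.7e9))   # 16 bits per weight
```

Going from 16 bits down to ~3 bits per weight is how the memory drops by roughly 5x, and it's also why accuracy degrades at the smallest sizes.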

Note: the smaller the quantized version, the less accurate the results will be.

I hope this helps!
