Mixtral 8x7B Instruct-v0.1 - bitsandbytes 4-bit

This repository contains the bitsandbytes 4-bit quantized version of mistralai/Mixtral-8x7B-Instruct-v0.1. To use it, make sure you have the latest versions of the bitsandbytes and transformers libraries installed from source:
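For example, a typical installation looks like this (exact commands may vary with your environment; the transformers source install is only needed until the required release is on PyPI):

pip install -U bitsandbytes
pip install git+https://github.com/huggingface/transformers.git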

Loading the model as follows will directly load the quantized checkpoint in 4-bit precision:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ybelkada/Mixtral-8x7B-Instruct-v0.1-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The quantization config is stored in the checkpoint, so the weights
# are loaded directly in 4-bit precision.
model = AutoModelForCausalLM.from_pretrained(model_id)
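A minimal generation sketch, assuming the snippet above has already run and that enough GPU memory is available; the prompt text is purely illustrative:

# Build a chat prompt with the model's chat template and generate a short reply.
messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))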

Note that you need a CUDA-compatible GPU to run low-bit precision models with bitsandbytes.
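As a quick sanity check (assuming PyTorch is installed), you can verify a CUDA device is visible before loading the model:

import torch
# bitsandbytes 4-bit inference requires a CUDA device.
assert torch.cuda.is_available(), "No CUDA-compatible GPU found"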
