Quantization
#30
by
mrgiraffe
- opened
Hello - I was wondering if there's a quantized version of this model that can be used to generate embeddings? I tried loading the model on an A10G instance and the GPU couldn't handle it. Thank you very much.
You can quantize it yourself with bitsandbytes. See the docs:
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
model = AutoModel.from_pretrained(
    "intfloat/e5-mistral-7b-instruct",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
    quantization_config=quantization_config,
)
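Once the model is loaded, embeddings are taken from the hidden state of the last non-padding token, per the e5-mistral-7b-instruct model card. A minimal sketch of that pooling step (assuming you already have the model's `last_hidden_state` and the tokenizer's `attention_mask`):

```python
import torch


def last_token_pool(last_hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    """Return the hidden state of each sequence's last real token.

    Works for both left- and right-padded batches.
    """
    # If every sequence has a real token in the final position,
    # the batch is left-padded: just take the last position.
    left_padded = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padded:
        return last_hidden_states[:, -1]
    # Otherwise (right padding), index each row at its last real token.
    seq_lengths = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.shape[0],
                             device=last_hidden_states.device)
    return last_hidden_states[batch_idx, seq_lengths]
```

Usage would look like `emb = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])`, optionally followed by `torch.nn.functional.normalize(emb, dim=-1)` for cosine similarity.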