Quantization
#30
by
mrgiraffe
- opened
Hello - I was wondering if there's a quantized version of this model that can be used to generate embeddings? I tried loading the model on an A10G instance and the GPU couldn't handle it. Thank you very much.
You can quantize it yourself with bitsandbytes. See the docs:
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
model = AutoModel.from_pretrained(
    "intfloat/e5-mistral-7b-instruct",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
    quantization_config=quantization_config,
)
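Once the model is loaded, embeddings are taken from the hidden state of the last non-padding token, per the e5-mistral-7b-instruct model card. A minimal sketch of that pooling step (assuming you already have the model's `last_hidden_state` and the tokenizer's `attention_mask`):

```python
import torch


def last_token_pool(last_hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    """Return the hidden state of each sequence's last real token.

    Works for both left- and right-padded batches.
    """
    # If every sequence has a real token in the final position,
    # the batch is left-padded: just take the last position.
    left_padded = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padded:
        return last_hidden_states[:, -1]
    # Otherwise (right padding), index each row at its last real token.
    seq_lengths = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.shape[0],
                             device=last_hidden_states.device)
    return last_hidden_states[batch_idx, seq_lengths]
```

Usage would look like `emb = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])`, optionally followed by `torch.nn.functional.normalize(emb, dim=-1)` for cosine similarity.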