I want to quantize a model to EXL2 and then fine-tune it

#1
by Mihir1108 - opened

I have already fine-tuned the Mistral 7B Instruct GPTQ model, but I want to reduce inference time, so I would like to quantize the model to EXL2 and then fine-tune it. If possible, can you help me?

Mihir from India

Please give me a reply.

I am not familiar with fine-tuning a quantized model. All of the fine-tuning I've done has been with the original fp16 weights. If you want to quantize fp16 models into EXL2 format, you can just clone the exllamav2 repo and run the convert.py script:

Clone this repo:
https://github.com/turboderp/exllamav2

Run this convert.py command to quantize your model (bits is the target bits per weight, e.g. 4.0, or any value between 2.18 and 8.0):
python3 ./convert.py -i <input_dir> -o <output_dir> -b <bits>
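
Once the conversion finishes, you can sanity-check the quantized model and get a rough feel for inference speed with exllamav2's Python API. The sketch below is based on the example scripts in the exllamav2 repo (see examples/inference.py there for the up-to-date API); the model path, prompt, and sampling settings are placeholders, not values from this thread. Note that the directory you load needs the model's config and tokenizer files alongside the quantized weights; the repo README explains how convert.py can produce a complete, loadable output directory.

import time

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the directory produced by convert.py (placeholder path)
config = ExLlamaV2Config()
config.model_dir = "/path/to/quantized-model"   # the <output_dir> you passed to convert.py
config.prepare()

# Load the model and allocate a KV cache for generation
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
generator.warmup()

# Basic sampling settings (placeholders)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Mistral Instruct prompt format
prompt = "[INST] Summarize what EXL2 quantization does. [/INST]"

t0 = time.time()
output = generator.generate_simple(prompt, settings, 200)
print(output)
print(f"Generated 200 tokens in {time.time() - t0:.2f} s")

Comparing the timing above against your GPTQ setup on the same prompt should tell you whether the EXL2 quantization actually reduces inference time for your use case.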
