I want to quantize a model to EXL2 and then fine-tune it

#1
by Mihir1108 - opened

I have already fine-tuned the Mistral 7B Instruct GPTQ model, but I want to reduce inference time, so I would like to quantize the model to EXL2 and then fine-tune it. If possible, can you help me?

Mihir from India

Please give me a reply.

I am not familiar with fine-tuning a quantized model. All of the fine-tuning I've done has been with the original fp16 weights. If you want to quantize fp16 models into EXL2 format, you can just clone the exllamav2 repo and run the convert.py script:

Clone this repo:
https://github.com/turboderp/exllamav2

Run this convert.py command to quantize your model (bits is the target bits per weight, e.g. 4.0, or any value between 2.18 and 8.0):
python3 ./convert.py -i <input_dir> -o <output_dir> -b <bits>
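
Once the conversion finishes, you can sanity-check the quantized model and get a rough feel for inference speed with exllamav2's Python API. The sketch below is based on the example scripts in the exllamav2 repo (see examples/inference.py there for the up-to-date API); the model path, prompt, and sampling settings are placeholders, not values from this thread. Note that the directory you load needs the model's config and tokenizer files alongside the quantized weights; the repo README explains how convert.py can produce a complete, loadable output directory.

import time

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the directory produced by convert.py (placeholder path)
config = ExLlamaV2Config()
config.model_dir = "/path/to/quantized-model"   # the <output_dir> you passed to convert.py
config.prepare()

# Load the model and allocate a KV cache for generation
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
generator.warmup()

# Basic sampling settings (placeholders)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Mistral Instruct prompt format
prompt = "[INST] Summarize what EXL2 quantization does. [/INST]"

t0 = time.time()
output = generator.generate_simple(prompt, settings, 200)
print(output)
print(f"Generated 200 tokens in {time.time() - t0:.2f} s")

Comparing the timing above against your GPTQ setup on the same prompt should tell you whether the EXL2 quantization actually reduces inference time for your use case.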
