Quantized versions, pls?

#1
by Yhyu13 - opened

Hi,

Is there a 4-bit or 8-bit quantized version of this project?

I found this in the README of the Git repo:

The program provides the following hyperparameters to control the generation process and quantization accuracy:

usage: cli_demo.py [-h] [--max_length MAX_LENGTH] [--top_p TOP_P] [--top_k TOP_K]
                   [--temperature TEMPERATURE] [--english] [--quant {8,4}]

optional arguments:
  -h, --help            show this help message and exit
  --max_length MAX_LENGTH
                        max length of the total sequence
  --top_p TOP_P         top p for nucleus sampling
  --top_k TOP_K         top k for top k sampling
  --temperature TEMPERATURE
                        temperature for sampling
  --english             only output English
  --quant {8,4}         quantization bits
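
For anyone reading the help text above, here is a minimal sketch of an argparse parser that would accept the same flags. It is not the project's actual cli_demo.py: the argument types and the missing defaults are assumptions, and build_parser is a hypothetical name used only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a parser matching the quoted usage text (types are assumed)."""
    parser = argparse.ArgumentParser(prog="cli_demo.py")
    # Assumption: the length and top-k options are integers, the rest floats.
    parser.add_argument("--max_length", type=int,
                        help="max length of the total sequence")
    parser.add_argument("--top_p", type=float,
                        help="top p for nucleus sampling")
    parser.add_argument("--top_k", type=int,
                        help="top k for top k sampling")
    parser.add_argument("--temperature", type=float,
                        help="temperature for sampling")
    parser.add_argument("--english", action="store_true",
                        help="only output English")
    # Only 8 or 4 are accepted; leaving the flag out means no quantization here.
    parser.add_argument("--quant", type=int, choices=[8, 4], default=None,
                        help="quantization bits")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--quant", "4"])
    print(args.quant)
```

Note that --quant in the help text only selects the bit width at run time; it says nothing about whether the quantized weights can be written back to disk.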

I just want to ask whether it is possible to save and load only the quantized version, to reduce disk usage.

It should be possible for 8-bit models, but I don’t think it is for 4-bit models.
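
To illustrate why storing only 8-bit weights saves disk space, here is a minimal sketch using the Python standard library. It is not this project's quantization scheme: it assumes a float32 baseline, uses plain symmetric per-tensor int8 quantization, and the weights, file names, and helper functions are made up for the example.

```python
import array
import os
import tempfile


def quantize_int8(weights):
    """Symmetric per-tensor quantization: returns (int8 array, scale)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    quantized = array.array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in quantized]


# Made-up weights in [-1, 1], standing in for one model tensor.
weights = [((i * 37) % 201 - 100) / 100.0 for i in range(10_000)]
quantized, scale = quantize_int8(weights)

with tempfile.TemporaryDirectory() as tmp:
    fp32_path = os.path.join(tmp, "weights_fp32.bin")
    int8_path = os.path.join(tmp, "weights_int8.bin")

    # Save both representations as raw binary.
    with open(fp32_path, "wb") as f:
        array.array("f", weights).tofile(f)
    with open(int8_path, "wb") as f:
        quantized.tofile(f)

    fp32_size = os.path.getsize(fp32_path)
    int8_size = os.path.getsize(int8_path)

    # Load the quantized file only; the float file is never read back.
    loaded = array.array("b")
    with open(int8_path, "rb") as f:
        loaded.fromfile(f, len(weights))

restored = dequantize_int8(loaded, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(fp32_size, int8_size, max_err)
```

In this sketch the scale has to be kept alongside the int8 file, since the quantized values cannot be mapped back to floats without it.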
