Request for quantized version
#2 by sudhir2016 - opened
A quantized version of the model which can be used for inference in a free tier Google Colab notebook would be nice.
Would one of HF's quantization integrations work for you, such as bitsandbytes (https://huggingface.co/docs/transformers/v4.35.0/main_classes/quantization#bitsandbytes-integration)?
Yes, please. Will it work with load_in_4bit=True?
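For reference, here is a minimal sketch of what loading with load_in_4bit=True typically looks like through the bitsandbytes integration in transformers. It is not confirmed to work for this model: it assumes the model loads through AutoModelForCausalLM, and model_id is a placeholder for the repository name. The estimate_weight_gib helper is hypothetical and only gives a rough size for the weights, ignoring activations and quantization overhead.

```python
def estimate_weight_gib(num_params: float, bits_per_weight: int) -> float:
    """Rough size of the weights alone, in GiB (hypothetical helper).

    Ignores activations, the KV cache, and quantization metadata.
    """
    return num_params * bits_per_weight / 8 / 2**30


def load_4bit(model_id: str):
    """Load a model with 4-bit bitsandbytes weights.

    Assumes a CUDA GPU and the transformers, accelerate, and bitsandbytes
    packages. Also assumes the model is a causal LM, which may not hold here.
    """
    # Imported lazily so the size estimate above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # store weights in 4 bits
        bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers
    )
    return tokenizer, model
```

Calling estimate_weight_gib with the model's parameter count and 4 bits gives a first idea of whether the weights would fit in the notebook's GPU memory before trying load_4bit.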