
Request for quantized version

#2
by sudhir2016

A quantized version of the model that can be used for inference in a free-tier Google Colab notebook would be nice.

MaLA-LM org

Will you be able to use HF's quantization integrations, such as bitsandbytes (https://huggingface.co/docs/transformers/v4.35.0/main_classes/quantization#bitsandbytes-integration)?

Yes, please. Will it work with `load_in_4bit=True`?
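
For reference, a minimal sketch of what 4-bit loading via the bitsandbytes integration might look like on a free-tier Colab GPU. The model id below is a placeholder (this thread doesn't name the exact repo), and it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed:

```python
# Minimal sketch: 4-bit bitsandbytes loading in transformers.
# Assumptions: placeholder model id; a CUDA GPU (e.g. Colab T4) is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "MaLA-LM/your-model-here"  # placeholder: substitute the actual repo id

# NF4 4-bit quantization; fp16 compute is the safe choice on a T4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU automatically
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```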
