Quantization support.
Are there any plans to release 8-bit support for this?
Add

```python
_no_split_modules = ["CodeT5pBlock"]
```

to the `CodeT5pEncoderDecoderModel` class in `modeling_codet5p.py`, and `device_map="auto"` should work. You can then use bitsandbytes for 8-bit inference, which lets you run this model on a 24 GB GPU:

```python
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    load_in_8bit=True,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
```
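For context, here is a minimal end-to-end sketch of 8-bit inference, assuming the `_no_split_modules` patch above has been applied. The checkpoint name and prompt are placeholders; the `decoder_input_ids` line follows the usage shown on the CodeT5+ model card.

```python
import transformers

# Placeholder checkpoint; use whichever CodeT5+ checkpoint you patched.
checkpoint = "Salesforce/instructcodet5p-16b"

tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",          # requires the _no_split_modules patch above
    load_in_8bit=True,          # 8-bit weights via bitsandbytes
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

encoding = tokenizer("def print_hello_world():", return_tensors="pt").to("cuda")
# The CodeT5+ seq2seq checkpoints expect decoder_input_ids for generation
# (per the model card's example usage).
encoding["decoder_input_ids"] = encoding["input_ids"].clone()
outputs = model.generate(**encoding, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```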
If you are a Windows user, you can find a bitsandbytes build here: https://github.com/acpopescu/bitsandbytes/releases
Hey Verah, for https://huggingface.co/mosaicml/mpt-7b-instruct, where should I add `_no_split_modules`, and what should the value be?
Thanks in advance.
Are there any plans to release 4-bit support for this? Thanks.
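Not an official release, but as a hedged sketch: transformers can already load models in 4-bit via bitsandbytes using `BitsAndBytesConfig`, assuming the same `_no_split_modules` patch described above. `checkpoint` is a placeholder here.

```python
import torch
import transformers

# Placeholder checkpoint; use the CodeT5+ checkpoint you patched.
checkpoint = "Salesforce/instructcodet5p-16b"

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,                        # 4-bit weights via bitsandbytes
    bnb_4bit_quant_type="nf4",                # NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,     # compute in fp16
)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```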