Quantized Versions of jais-13b-chat

#4 · opened by haouarin

Hello,

I'm using the "jais-13b-chat" model and find it beneficial. For optimization purposes, could you consider providing 4-bit and 8-bit quantized versions? This would greatly assist deployments in resource-limited environments.

Thanks for considering,
Noureddine

haouarin changed discussion title from Quantized Versions of jais-13b-chat Request to Quantized Versions of jais-13b-chat

You can use bitsandbytes directly on jais.
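Not from the thread itself, but here is a minimal sketch of what that could look like with transformers. The hub id `core42/jais-13b-chat` is an assumption (point it at whichever jais-13b-chat repo you use), and `device_map="auto"` requires the accelerate package:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed hub id; adjust to the actual jais-13b-chat repo path.
model_id = "core42/jais-13b-chat"

# 8-bit quantization via bitsandbytes; swap in the commented
# config below for 4-bit instead.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
# quant_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_compute_dtype=torch.float16,
# )

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",        # requires accelerate
    trust_remote_code=True,   # jais ships custom modeling code
)
```

In 8-bit the 13B weights should take very roughly 13 GB of VRAM, and about half that in 4-bit, so either variant ought to fit on a single 24 GB card.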

There is this quantized version (https://huggingface.co/mouaff25/jais-13b-chat-8bit), but it did not work for me: the model loaded, but I got a tensor mismatch error.

It works on an A100:
https://colab.research.google.com/drive/1QLihIVHOnWrz5P7XER4mn13YuGAbnPDq?usp=sharing

I've just pushed an 8-bit quantized version; feel free to check it out: 'drakkola/jais-13b-chat-8bit'.
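The thread doesn't show loading code, but assuming that repo stores pre-quantized bitsandbytes weights along with jais's custom modeling code, loading it would look something like this sketch (the prompt is just an illustrative example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "drakkola/jais-13b-chat-8bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # requires accelerate
    trust_remote_code=True,   # jais uses custom model code
)

# Quick smoke test with a short Arabic prompt ("hello").
prompt = "مرحبا"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```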
