🙏🏻Praying for Quantized

#12

by Tonic - opened Feb 16

Feb 16

Your interest in supporting the community by quantizing the cooler models could really help us right now.

Basically, AYA-101 was born of a community effort by 3000 volunteers from around the world, with a strong focus on traditionally overlooked languages.

Our main problem is that folks don't have access to enough GPUs to run it currently!

If there's anything you can do to quantize this one for us, everyone would really appreciate it and could build some cool downstream applications in their native languages 🤗
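
To put rough numbers on the GPU problem, here's a back-of-the-envelope sketch. It assumes the model has about 13 billion parameters (that figure isn't stated in this thread) and uses approximate bits-per-weight values for GGML-style quantization formats. It counts weights only, so activations and runtime overhead come on top.

```python
# Assumption: roughly 13 billion parameters (not stated in this thread).
PARAMS = 13e9

# Approximate bits per weight. The quantized figures include the
# per-block scale overhead of GGML-style block formats.
BITS_PER_WEIGHT = {
    "f32": 32.0,
    "f16": 16.0,
    "q8_0": 8.5,
    "q6k": 6.5625,
    "q4k": 4.5,
}


def weight_gib(params, bits_per_weight):
    """Memory needed for the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>5}: ~{weight_gib(PARAMS, bits):.1f} GiB")
```

Under those assumptions, half precision needs around 24 GiB for the weights alone, which is why a 4-bit or 6-bit quant would make such a difference for people with a single consumer GPU.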

LoneStriker

Feb 16

•

edited Feb 16

If it can be quantized, I will gladly do it. The hard part though is to add support for new model types or variants in llama.cpp, auto-awq, auto-gptq or exllamav2. The quantization part is simple by comparison. I'll have a look.

Edit: doesn't look like any of the standard quant libraries support this specific model yet. I'll keep an eye out for when support is added.

Tonic

Feb 18

Just read the edit with a broken heart. I appreciate you trying and looking into this 🙏🏻

NilanE

Feb 20

try candle, maybe

https://github.com/huggingface/candle/tree/main/candle-examples/examples/t5
https://github.com/huggingface/candle/issues/1311 (has quantization command)

captainkyd

Feb 21

•

edited Feb 21

you know I'm on board. I'll totally rock getting gguf files set and the model ported to ollama too :-D
💕💕

downloading the model now... I'll get on the gguf and ollama port in the morning

Tonic

Feb 22

@TheBloke, we're a group of 3000 contributors from all around the world who curated this massive, high-quality multilingual dataset, but when the model was released no quants were made available. I don't really know what I'm asking for, actually, but if you or folks from the community know someone who would be interested in leading or supporting the effort, that would be fantastic ^^

lastrosade

Feb 28

Added today: https://github.com/ggerganov/llama.cpp/issues/5763

kcoopermiller

Feb 29

Hey all, I tried using candle for quantization and it appears to have been successful! First time trying this, so let me know if you see anything out of the ordinary.
Also, if anyone is aware of other libraries apart from candle that are capable of running quantized T5 models, please let me know!

https://huggingface.co/kcoopermiller/aya-101-GGUF
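
If anyone wants a quick sanity check on a downloaded file, here's a minimal sketch that reads the fixed-size GGUF header. It assumes a little-endian GGUF file of version 2 or later. The function name `read_gguf_header` is made up for illustration, and the sketch only looks at the header, not the tensor data.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic bytes are {magic!r}")
        # Version is a little-endian uint32.
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            # GGUF v1 stored 32-bit counts, which this sketch does not handle.
            raise ValueError("this sketch only handles GGUF v2 or later")
        # Tensor count and metadata key/value count are little-endian uint64.
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("model.gguf")` (the filename is a placeholder) should give a nonzero tensor count for a healthy file. A `ValueError` on the magic bytes usually means a truncated download or a file in a different format.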

katebsaber

Mar 1

•

edited Mar 1

> Hey all, I tried using candle for quantization and it appears to have been successful! First time trying this, so let me know if you see anything out of the ordinary.
> Also, if anyone is aware of other libraries apart from candle that are capable of running quantized T5 models, please let me know!
>
> https://huggingface.co/kcoopermiller/aya-101-GGUF

Thanks for the contribution. It would be greatly appreciated if you could also provide your conversion scripts, as well as sample usage in Python. Despite the potential rebirth of the T5 architecture, resources on quantizing it are currently quite limited. You may consider contributing your work as an example to the Candle framework repository.

GreekMan

Mar 6

> Hey all, I tried using candle for quantization and it appears to have been successful! First time trying this, so let me know if you see anything out of the ordinary.
> Also, if anyone is aware of other libraries apart from candle that are capable of running quantized T5 models, please let me know!
>
> https://huggingface.co/kcoopermiller/aya-101-GGUF

I can't run this on text generation web UI. What am I doing wrong?

kcoopermiller

Mar 6

> Thanks for the contribution. It would be greatly appreciated if you could also provide your conversion scripts, as well as sample usage in Python. Despite the potential rebirth of the T5 architecture, resources on quantizing it are currently quite limited. You may consider contributing your work as an example to the Candle framework repository.

No prob! For the conversion, I simply used candle’s tensor-tools command line utility. I recommend checking out this example for more details.

Also, as far as I know, there aren’t any popular Python libraries that support quantized T5 models. If anyone is familiar with a library, please let me know and I'll add an example to the aya-101-GGUF readme!

> I can't run this on text generation web UI. What am I doing wrong?

I’m pretty sure oobabooga only supports GGUF models through the llama.cpp and ctransformers backends, and neither currently supports T5 models such as aya. Given that ctransformers hasn’t been updated in a while, you’ll most likely have to wait until llama.cpp adds T5 support. More info: https://github.com/ggerganov/llama.cpp/issues/5763

ugur6634

Mar 9

Does LangChain's HuggingFacePipeline class support T5-based models, including ones in GGUF or candle format?
