This PR contains:

  • config-flan-t5-xl.json
  • model-flan-t5-xl.gguf

quantization: q6k

Looks good! Could you mention the command-line / code change that you needed to be able to test this, and how I can run it to try it out?

  1. Quantization (a variant for a different quantization level is sketched after this list):
cargo run --example tensor-tools --release -- quantize --quantization q6k PATH/TO/T5/model.safetensors /tmp/model.gguf
  2. Testing:
    From Candle, I pointed the example at my repo deepfile/flan-t5-xl-gguf instead of lmz/candle-quantized-t5, because it contains the model-flan-t5-xl.gguf file in its main branch.
cargo run --example quantized-t5 --release -- --prompt "translate to German: I'm living in Paris." --model-id "deepfile/flan-t5-xl-gguf" --which "flan-t5-xl"
...
 Ich wohne in Paris.
8 tokens generated (7.76 token/s)
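
For reference, the same tensor-tools invocation should also work for other quantization levels supported by Candle. A minimal sketch, assuming q4k is an accepted level and keeping the positional input/output syntax from step 1 (paths are placeholders):
cargo run --example tensor-tools --release -- quantize --quantization q4k PATH/TO/T5/model.safetensors /tmp/model-q4k.gguf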

(@lmz But this xl quantized model is worse than the quantized large one on open-domain questions. I haven't tested it yet on context-based QA.)
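
For the untested context-based QA case, one way to try it would be to reuse the same flags with a question-plus-context prompt; the prompt below is only an illustration, not a required format:
cargo run --example quantized-t5 --release -- --prompt "question: Where does the speaker live? context: I moved to Paris last year and have been living there since." --model-id "deepfile/flan-t5-xl-gguf" --which "flan-t5-xl"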

lmz changed pull request status to merged
