flan-t5-xxl-gguf
This is a quantized GGUF version of google/flan-t5-xxl.
Usage/Examples
./llama-cli -m /path/to/file.gguf --prompt "your prompt" --n-gpu-layers nn
nn: the number of layers to offload to the GPU
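If you would rather launch the command from a script, the following is a minimal sketch that assembles the same argument list in Python. The helper name `build_llama_cli_command` and the layer count of 24 are hypothetical choices made for illustration; only the flags shown in the command above are used, and the model path is left as the placeholder from that command.

```python
import shlex


def build_llama_cli_command(model_path, prompt, n_gpu_layers=0, binary="./llama-cli"):
    """Return the argument list for the llama-cli invocation shown above.

    build_llama_cli_command is a hypothetical helper, not part of llama.cpp.
    """
    if n_gpu_layers < 0:
        raise ValueError("n_gpu_layers must be non-negative")
    return [
        binary,
        "-m", str(model_path),
        "--prompt", prompt,
        "--n-gpu-layers", str(n_gpu_layers),
    ]


if __name__ == "__main__":
    # Placeholder path: substitute the GGUF file you actually downloaded.
    # 24 is an arbitrary example value for nn.
    cmd = build_llama_cli_command("/path/to/file.gguf", "your prompt", n_gpu_layers=24)
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

To actually run the model, pass the returned list to `subprocess.run(cmd, check=True)` once the binary and the model file exist. Passing a list rather than a single string avoids shell quoting problems when the prompt contains spaces or quotes.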
Quants
BITs | TYPE |
---|---|
Q2 | Q2_K |
Q3 | Q3_K, Q3_K_L, Q3_K_M, Q3_K_S |
Q4 | Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S |
Q5 | Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S |
Q6 | Q6_K |
Q8 | Q8_0 |
Additional unquantized floating-point formats:
BITs | TYPE |
---|---|
16 | f16 |
32 | f32 |
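To get a feel for how the bit width affects download size, here is a rough back-of-the-envelope sketch. It assumes about 11 billion parameters for flan-t5-xxl (an approximation, not a figure taken from this card) and uses only the nominal bit widths from the tables above. Real GGUF files are typically somewhat larger, because block-quantized types also store per-block scale values and the file carries metadata.

```python
# Nominal bit widths from the tables above (not the exact bits per weight
# of each GGUF type, which is slightly higher for the quantized types).
NOMINAL_BITS = {"Q2": 2, "Q3": 3, "Q4": 4, "Q5": 5, "Q6": 6, "Q8": 8, "f16": 16, "f32": 32}

# Assumption: roughly 11 billion parameters for flan-t5-xxl.
ASSUMED_PARAMS = 11_000_000_000


def estimate_size_gib(bits, n_params=ASSUMED_PARAMS):
    """Lower-bound estimate of the weight size in GiB for a given bit width."""
    return n_params * bits / 8 / 2**30


if __name__ == "__main__":
    for name, bits in NOMINAL_BITS.items():
        print(f"{name}: about {estimate_size_gib(bits):.1f} GiB or more")
```

Treat the printed numbers as a floor when deciding which quant fits in your RAM or VRAM, and check the actual file sizes in the repository before downloading.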
Disclaimer
I don't claim any rights to this model. All rights belong to Google.
Acknowledgements