Uploading EXL2 quants

#9
by bullerwins - opened
Gradient AI org
edited Apr 27

Amazing @bullerwins . Just curious:

  • How did you quantize them / which scripts?
  • Which UI / inference engine are you using for the exl2 quants and local inference?
Gradient AI org

@bullerwins Feel free to open a PR to the Readme!

Amazing @bullerwins . Just curious:

  • How did you quantize them / which scripts?
  • Which UI / inference engine are you using for the exl2 quants and local inference?

I'm using Turboderp's exllamav2 https://github.com/turboderp/exllamav2

Script:
python3 convert.py -i gradientai_Llama-3-8B-Instruct-262k/ -o /temp/ -cf gradientai_Llama-3-8B-Instruct-262k_exl2_5.0bpw/ -b 5.0

I'm testing it using Oobabooga's Textgen webui for inference: https://github.com/oobabooga/text-generation-webui
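As a rough sanity check on the `-b 5.0` flag above: the on-disk weight footprint of an exl2 quant can be approximated as parameters × bits-per-weight ÷ 8. The helper name below is illustrative, not part of exllamav2:

```python
def estimated_weight_size_gb(n_params: float, bpw: float) -> float:
    """Rough exl2 weight size: params * bits-per-weight / 8 bits, in GiB.

    This ignores measurement overhead, embeddings kept at higher
    precision, and tokenizer/config files, so treat it as a lower bound.
    """
    return n_params * bpw / 8 / 1024**3

# An 8B model at 5.0 bpw, as in the convert.py command above:
print(round(estimated_weight_size_gb(8e9, 5.0), 2))  # → 4.66
```

This is why 5.0 bpw quants of the 8B model fit comfortably on consumer GPUs, with headroom left for the (much larger at 262k context) KV cache.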

Gradient AI org

@bullerwins Working on a revised version (with better chat alignment). Would you be up for creating quants? I'd link them in the readme again.

Also: we released the better-aligned 70B.
https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-262k

Gradient AI org

@bullerwins We just upgraded the weights; you should see a drastic improvement over the previous iteration.

Working on exl2 quants for the upgraded 8B weights, as well as for the better-aligned 70B.
