EXL2 quants of alpindale/goliath-120b (https://huggingface.co/alpindale/goliath-120b), for use with exllamav2.

The calibration dataset is wikitext. I've added a measurement.json file on the main branch if you want to make your own quants.
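To quantize at a different bitrate, you can reuse the measurement file to skip the slow measurement pass. Below is a minimal sketch, assuming placeholder local paths and the convert.py script from the exllamav2 repository; check its README for the exact flags in your version:

```python
# Sketch: requantize goliath-120b at a custom bitrate, reusing measurement.json.
# All paths are placeholders; convert.py ships with the exllamav2 repository.
import subprocess

subprocess.run(
    [
        "python", "convert.py",
        "-i", "/models/goliath-120b",           # original FP16 model directory
        "-o", "/tmp/exl2-work",                 # working directory for temp files
        "-cf", "/models/goliath-120b-4.0bpw",   # output directory for the finished quant
        "-b", "4.0",                            # target bits per weight
        "-m", "measurement.json",               # reuse the provided measurement file
    ],
    check=True,
)
```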

IMPORTANT: For the 3bpw quant, if you are using ooba's text-generation-webui, disable the BOS token, otherwise you will get gibberish. See https://huggingface.co/Panchovix/goliath-120b-exl2/discussions/1

Available quant branches:

  • 4.85bpw
  • 4.5bpw
  • 3bpw
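A minimal loading sketch using exllamav2's Python API, assuming the sizes above are the branch (revision) names on this repo and that the weights fit across your available GPUs; note add_bos=False, which mirrors the BOS warning above:

```python
# Sketch: download one quant branch and generate with exllamav2.
# The branch name and sampling values are assumptions; adjust to taste.
from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

model_dir = snapshot_download("Panchovix/goliath-120b-exl2", revision="4.5bpw")

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # allocated as layers are loaded
model.load_autosplit(cache)                # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# add_bos=False is the API-level equivalent of disabling the BOS token.
print(generator.generate_simple("USER: Hi there! ASSISTANT:", settings, 200, add_bos=False))
```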

Original Model card

Goliath 120B

An auto-regressive causal LM created by combining two finetuned Llama-2 70B models into one.

Please check out the quantized formats provided by @TheBloke and @Panchovix:

  • GGUF (llama.cpp)
  • GPTQ (KoboldAI, TGW, Aphrodite)
  • AWQ (TGW, Aphrodite, vLLM)
  • Exllamav2 (TGW, KoboldAI)

Prompting Format

Both Vicuna and Alpaca will work, but since the initial and final layers belong primarily to Xwin, I expect Vicuna to work best.
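For reference, sketches of the two templates as they are commonly written; the system preambles below are the usual Vicuna 1.1 and Alpaca ones, not something this card specifies:

```python
# Common Vicuna 1.1-style template (single turn).
VICUNA = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: {prompt} ASSISTANT:"
)

# Common Alpaca-style template.
ALPACA = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n### Response:\n"
)

print(VICUNA.format(prompt="Summarize the merge process in one sentence."))
```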

Merge process

The models used in the merge are Xwin and Euryale.

The layer ranges used are as follows (a small sketch after the list tallies the resulting depth):

- range 0–16: Xwin
- range 8–24: Euryale
- range 17–32: Xwin
- range 25–40: Euryale
- range 33–48: Xwin
- range 41–56: Euryale
- range 49–64: Xwin
- range 57–72: Euryale
- range 65–80: Xwin
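The ranges interleave overlapping blocks from the two donor models rather than simply stacking them. Here is a small sketch that encodes the plan and tallies the resulting depth, assuming half-open ranges as in mergekit's layer_range (if the card's ranges are inclusive, the total shifts slightly):

```python
# Sketch: the slice plan from the card, treating each range as half-open
# [start, end), which is how mergekit's layer_range works (an assumption here).
slices = [
    ("Xwin", 0, 16), ("Euryale", 8, 24),
    ("Xwin", 17, 32), ("Euryale", 25, 40),
    ("Xwin", 33, 48), ("Euryale", 41, 56),
    ("Xwin", 49, 64), ("Euryale", 57, 72),
    ("Xwin", 65, 80),
]

depth = sum(end - start for _, start, end in slices)
print(depth)  # 137 layers under this assumption, vs. 80 in each donor model
```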

Screenshots

[screenshot image omitted]

Benchmarks

Coming soon.

Acknowledgements

Credit goes to @chargoddard for developing mergekit, the framework used to merge this model.

Special thanks to @Undi95 for helping with the merge ratios.
