license: llama2
EXL2 quant of alpindale/goliath-120b (https://huggingface.co/alpindale/goliath-120b), to be used with exllamav2. Quantized at 4.25bpw so that CFG can be used comfortably on 72GB of VRAM.
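If you're loading the quant directly from Python rather than through a frontend like TGW or KoboldAI, the usual exllamav2 pattern looks roughly like the sketch below. This is a minimal example assuming the exllamav2 Python API at the time of writing; the model path is a placeholder and the sampler values are arbitrary defaults, not recommendations.

```python
# Minimal sketch: loading this EXL2 quant with the exllamav2 Python API.
# The model directory is a placeholder; adjust for your setup.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/goliath-120b-exl2-4.25bpw"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # cache is allocated as layers load
model.load_autosplit(cache)               # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # arbitrary example values
settings.top_p = 0.9

print(generator.generate_simple("USER: Hi there!\nASSISTANT:", settings, 200))
```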
The calibration dataset is a cleaned, fixed version of the PIPPA RP dataset, which does skew the results in favor of RP usage.
You can find the calibration dataset here (you will need to be on TheBloke's server to download it).
I've also included a measurement.json file in case you want to make your own quants.
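As a rough sketch of how the measurement file can be reused: exllamav2's convert.py accepts a previously generated measurement via -m, which skips the slow measurement pass. The paths and target bitrate below are placeholders, and the flags reflect convert.py as it existed at the time of writing.

```python
# Hypothetical invocation of exllamav2's convert.py, reusing the bundled
# measurement.json so the measurement pass is skipped.
# All paths are placeholders; -b sets the target bits per weight.
import subprocess

subprocess.run(
    [
        "python", "convert.py",                # script from the exllamav2 repo
        "-i", "/models/goliath-120b",          # fp16 source model directory
        "-o", "/tmp/exl2-workdir",             # scratch directory for the job
        "-cf", "/models/goliath-120b-3.0bpw",  # output directory for the quant
        "-m", "measurement.json",              # reuse the provided measurement
        "-b", "3.0",                           # target bits per weight
    ],
    check=True,
)
```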
Original model card
Goliath 120B
An auto-regressive causal LM created by combining two finetuned Llama-2 70B models into one.
Please check out the quantized formats provided by @TheBloke and @Panchovix:
- GGUF (llama.cpp)
- GPTQ (KoboldAI, TGW, Aphrodite)
- AWQ (TGW, Aphrodite, vLLM)
- Exllamav2 (TGW, KoboldAI)
Prompting Format
Both Vicuna and Alpaca will work, but since the initial and final layers belong primarily to Xwin, I expect Vicuna to work best.
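For reference, a Vicuna-style prompt looks like the sketch below. The system line is a common Vicuna default rather than something prescribed by this card.

```python
# Sketch of the Vicuna prompt format; the system message shown is a common
# default, not one specified by the model card.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions.\n\n"
    "USER: {your message here}\n"
    "ASSISTANT:"
)
```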
Merge process
The models used in the merge are Xwin and Euryale.
The layer ranges used are as follows (see the config sketch after this list):
- Xwin: layers 0–16
- Euryale: layers 8–24
- Xwin: layers 17–32
- Euryale: layers 25–40
- Xwin: layers 33–48
- Euryale: layers 41–56
- Xwin: layers 49–64
- Euryale: layers 57–72
- Xwin: layers 65–80
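For illustration, the ranges above correspond to a mergekit passthrough configuration along these lines. This is a reconstruction inferred from the list, not the author's original config file, and the model names are stand-ins for the actual Xwin and Euryale checkpoint paths.

```python
# Hypothetical reconstruction of an equivalent mergekit "passthrough" config,
# inferred from the layer ranges above; "Xwin" and "Euryale" are stand-ins
# for the actual checkpoint paths.
import yaml  # pip install pyyaml

ranges = [
    ("Xwin", (0, 16)),  ("Euryale", (8, 24)),
    ("Xwin", (17, 32)), ("Euryale", (25, 40)),
    ("Xwin", (33, 48)), ("Euryale", (41, 56)),
    ("Xwin", (49, 64)), ("Euryale", (57, 72)),
    ("Xwin", (65, 80)),
]

config = {
    "merge_method": "passthrough",
    "dtype": "float16",
    "slices": [
        {"sources": [{"model": model, "layer_range": list(rng)}]}
        for model, rng in ranges
    ],
}
print(yaml.safe_dump(config, sort_keys=False))
```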
Screenshots
Benchmarks
Coming soon.
Acknowledgements
Credit goes to @chargoddard for developing mergekit, the framework used to merge the model.
Special thanks to @Undi95 for helping with the merge ratios.