This is a 4-bit GPTQ quant of https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b
My secret sauce:
- Using commit 3c16fd9 of 0cc4m's GPTQ fork (a rough sketch of the command is below this list)
- Using C4 as the calibration dataset
- Act-order and true-sequential enabled, with percdamp 0.1 (the default percdamp is 0.01)
- No groupsize
- Runs with CUDA; does not need Triton
- Quantization was done on a 'Premium GPU', 'High Memory' Google Colab instance
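For reference, the quantization call would have looked something like the following. This is a hedged reconstruction from the settings above, assuming 0cc4m's fork keeps the upstream GPTQ-for-LLaMa `llama.py` interface; the model path and output filename are placeholders.

```sh
# Sketch of the quantization invocation, reconstructed from the settings above.
# Assumes the fork keeps upstream GPTQ-for-LLaMa's llama.py interface;
# model path and output name are placeholders.
python llama.py ./GPT4-X-Alpasta-30b c4 \
    --wbits 4 \
    --true-sequential \
    --act-order \
    --percdamp 0.1 \
    --save gpt4-x-alpasta-30b-4bit.pt
# No --groupsize flag: the default (-1) means no groupsize.
```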
Benchmark results (perplexity; lower is better)
| Model | C4 | WikiText2 | PTB |
|---|---|---|---|
| MetaIX's FP16 | 6.98400259 | 4.607768536 | 9.414786339 |
| This quant | 7.292364597 | 4.954069614 | 9.754593849 |
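As a rough illustration of how numbers like these are measured, here is a minimal Python sketch that computes WikiText2 perplexity over non-overlapping 2048-token windows with `transformers`. It is not the exact eval script used for the table above, and it loads the FP16 baseline (the 4-bit checkpoint needs the GPTQ fork's loader).

```python
# Minimal perplexity sketch, not the exact eval behind the table above.
# Loads the FP16 baseline; the 4-bit file requires the GPTQ fork's loader.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MetaIX/GPT4-X-Alpasta-30b"  # FP16 baseline linked above
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the WikiText2 test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

window, nlls = 2048, []
for i in range(0, ids.size(1) - window, window):
    chunk = ids[:, i : i + window].to(model.device)
    with torch.no_grad():
        # labels=chunk gives the mean next-token negative log-likelihood.
        nlls.append(model(chunk, labels=chunk).loss)

print(f"WikiText2 perplexity: {torch.exp(torch.stack(nlls).mean()).item():.3f}")
```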