OPT-30B-Erebus-4bit-128g

Model description

Warning: THIS model is NOT suitable for use by minors. The model will output X-rated content.

This is a 4-bit GPTQ quantization (group size 128) of OPT-30B-Erebus. Original model: https://huggingface.co/KoboldAI/OPT-30B-Erebus

Quantization Information

Quantized with: https://github.com/0cc4m/GPTQ-for-LLaMa

```sh
# Save as a PyTorch checkpoint (.pt)
python opt.py --wbits 4 models/OPT-30B-Erebus c4 --groupsize 128 --save models/OPT-30B-Erebus-4bit-128g/OPT-30B-Erebus-4bit-128g.pt

# Save in safetensors format
python opt.py --wbits 4 models/OPT-30B-Erebus c4 --groupsize 128 --save_safetensors models/OPT-30B-Erebus-4bit-128g/OPT-30B-Erebus-4bit-128g.safetensors
```
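To give a rough sense of why 4-bit quantization matters for a 30B-parameter model, here is an illustrative weight-memory estimate. The parameter count and per-group overhead formula below are approximations for illustration, not measured figures; actual VRAM usage also includes activations, the KV cache, and framework overhead.

```python
# Illustrative weight-memory estimate for OPT-30B at fp16 vs. 4-bit GPTQ.
# NOTE: the parameter count (30e9) and overhead model are rough assumptions.
params = 30e9  # approximate parameter count of OPT-30B

# fp16: 2 bytes per weight
fp16_gb = params * 2 / 1024**3

# 4-bit GPTQ with group size 128: 4 bits per weight, plus roughly one
# fp16 scale (16 bits) and one 4-bit zero-point per group of 128 weights
bits_per_weight = 4 + (16 + 4) / 128
int4_gb = params * bits_per_weight / 8 / 1024**3

print(f"fp16: ~{fp16_gb:.0f} GiB, 4-bit g128: ~{int4_gb:.0f} GiB")
```

Under these assumptions the quantized weights fit in roughly a quarter of the fp16 footprint, which is what makes single-GPU inference of a 30B model feasible.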

Example generation speed:

```
Output generated in 54.23 seconds (0.87 tokens/s, 47 tokens, context 44, seed 593020441)
```
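The reported tokens-per-second figure follows directly from the token count and elapsed time in the log line above:

```python
# Recompute the throughput from the values reported in the log line.
tokens = 47
seconds = 54.23
rate = tokens / seconds
print(f"{rate:.2f} tokens/s")  # consistent with the reported 0.87 tokens/s
```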

Launch command for text-generation-webui:

https://github.com/oobabooga/text-generation-webui

```sh
call python server.py --model_type gptj --model OPT-30B-Erebus-4bit-128g --chat --wbits 4 --groupsize 128 --xformers --sdp-attention
```

Credit

https://huggingface.co/notstoic

License

OPT-30B is licensed under the OPT-175B license, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
