---
license: unknown
---
[ehartford/WizardLM-7B-Uncensored](https://huggingface.co/ehartford/WizardLM-7B-Uncensored) quantized to **8-bit GPTQ** with act order + true sequential, no group size.
*For most uses this probably isn't what you want.* \
*For 4-bit with no act order, or for compatibility with `old-cuda` (the text-generation-webui default), see [TheBloke/WizardLM-7B-uncensored-GPTQ](https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ).*
Quantized using AutoGPTQ with the following config:
```python
config: dict = dict(
    quantize_config=dict(bits=8, desc_act=True, true_sequential=True,
                         model_file_base_name='WizardLM-7B-Uncensored'),
    use_safetensors=True,
)
```
See `quantize.py` for the full script.
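For reference, a minimal sketch of what such a script can look like with AutoGPTQ. The calibration text and output directory here are illustrative placeholders, not taken from the original `quantize.py`:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "ehartford/WizardLM-7B-Uncensored"
out_dir = "WizardLM-7B-Uncensored-GPTQ"  # illustrative output path

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
# Calibration data: a real run would use a proper calibration set.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(
    bits=8,                  # 8-bit quantization
    group_size=-1,           # no group size
    desc_act=True,           # act order
    true_sequential=True,
    model_file_base_name="WizardLM-7B-Uncensored",
)

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized(out_dir, use_safetensors=True)
```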
Tested for compatibility with:
- WSL with the GPTQ-for-LLaMa `triton` branch.
- Windows with AutoGPTQ on `cuda` (triton deselected).
The AutoGPTQ loader should read its configuration from `quantize_config.json`.\
For GPTQ-for-LLaMa, use the following settings when loading:
- wbits: 8
- groupsize: None
- model_type: llama
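For example, loading directly with AutoGPTQ from Python. This is a minimal sketch; the model directory is a placeholder for wherever the quantized weights live (local path or HF repo id):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "WizardLM-7B-Uncensored-GPTQ"  # placeholder path to the quantized model

# The tokenizer comes from the unquantized base model.
tokenizer = AutoTokenizer.from_pretrained("ehartford/WizardLM-7B-Uncensored", use_fast=True)

# quantize_config.json in model_dir supplies bits / group size / act order.
model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0", use_safetensors=True)

prompt = "Tell me about AI."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```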