	---
	inference: false
	license: llama2
	model_creator: WizardLM
	model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
	model_name: WizardLM 70B V1.0
	model_type: llama
	quantized_by: Thireus
	---

	# WizardLM 70B V1.0 - EXL2
	- Model creator: [WizardLM](https://huggingface.co/WizardLM)
	- Original model: [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
	- Source model for quantization: [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF), a float16 conversion of WizardLM 70B V1.0

	| Branch | BITS (-b) | HEAD_BITS (-hb) | MEASUREMENT_LENGTH (-ml) | LENGTH (-l) | CAL_DATASET (-c) | Size | ExLlama | Max Context Length | Description |
	| ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
	| [main](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2/tree/main) | 4.0 | 6 | 2048 | 2048 | [0000.parquet - wikitext-2-raw-v1](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train) | 33 GB | [V2](https://github.com/turboderp/exllamav2) | 4096 | Equivalent, in theory, to GPTQ 4-bit. |

	## Description

	This repository contains EXL2 model files for [WizardLM's WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0).

	EXL2 is the quantization format introduced by [ExLlamaV2](https://github.com/turboderp/exllamav2). It is based on the same optimization method as GPTQ, and it allows quantization levels to be mixed within a model to achieve any average bitrate between 2 and 8 bits per weight.
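
To illustrate what an average bitrate means here, the sketch below computes the weighted average bits per weight for a made-up mix of quantization levels, plus a rough size estimate. The function names, the 3-bit/5-bit split, and the round 70e9 parameter count are assumptions for illustration only. This is not how ExLlamaV2 actually allocates bits, and the estimate ignores the 6-bit head, quantization metadata, and other overhead.

```python
def average_bpw(groups):
    """Weighted average bits per weight for (num_weights, bits) groups."""
    total_weights = sum(n for n, _ in groups)
    total_bits = sum(n * bits for n, bits in groups)
    return total_bits / total_weights


def estimate_size_gib(num_weights, bpw):
    """Rough size in GiB of num_weights stored at bpw bits each."""
    return num_weights * bpw / 8 / 2**30


# Hypothetical mix: half of the weights at 3 bits, half at 5 bits.
mix = [(35e9, 3.0), (35e9, 5.0)]
bpw = average_bpw(mix)
print(f"average bpw: {bpw:.2f}")
print(f"approx. size: {estimate_size_gib(70e9, bpw):.1f} GiB")
```

With these assumed numbers the mix averages out to 4.0 bits per weight, and the size estimate lands in the same range as the 33GB listed in the table above.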

	## Prompt template (official): Vicuna

	```
	A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:

	```

	## Prompt template (Thireus' own suggestion):

	```
	A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
	USER:
	{prompt}
	ASSISTANT:

	```
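
Both templates above can be filled programmatically. The following is a minimal sketch; `SYSTEM`, `build_prompt`, and the `multiline` flag are hypothetical names, and the trailing newline after `ASSISTANT:` in the multi-line variant is an assumption about how that template is meant to be used.

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(prompt, multiline=False):
    """Fill the Vicuna-style template with a user prompt.

    multiline=False follows the official single-line template;
    multiline=True follows the multi-line variant.
    """
    if multiline:
        # Assumption: the answer is expected to start on a new line.
        return f"{SYSTEM}\nUSER:\n{prompt}\nASSISTANT:\n"
    return f"{SYSTEM} USER: {prompt} ASSISTANT:"


print(build_prompt("What is EXL2?"))
print(build_prompt("What is EXL2?", multiline=True))
```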

	## Quantization process

	Original Model --> Float16 Model --> Safetensor Model --> EXL2 Model

	Example:
	[WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) --> [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) --> Safetensor --> EXL2

	Use any one of the following scripts to convert the float16 `pytorch_model*.bin` files to safetensors:
	- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py
	- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py
	- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e
	- https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py
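
For orientation, the core of such a conversion can be sketched as follows. This is a simplified illustration, not a copy of any of the scripts listed above: the function names are hypothetical, it assumes the `torch` and `safetensors` packages are installed, and it does not rewrite the shard index file or handle tensors that share memory, which the linked scripts may do.

```python
from pathlib import Path


def safetensors_path(bin_path):
    """Return the output path: same file name with a .safetensors suffix."""
    return Path(bin_path).with_suffix(".safetensors")


def convert_shard(bin_path):
    """Convert one pytorch_model*.bin shard to a safetensors file."""
    # Imported lazily so the path helper works without these packages.
    import torch
    from safetensors.torch import save_file

    state_dict = torch.load(bin_path, map_location="cpu")
    # save_file expects contiguous tensors.
    state_dict = {name: t.contiguous() for name, t in state_dict.items()}
    out_path = safetensors_path(bin_path)
    save_file(state_dict, str(out_path), metadata={"format": "pt"})
    return out_path


def convert_directory(model_dir):
    """Convert every .bin shard found in model_dir, one shard at a time."""
    return [convert_shard(p) for p in sorted(Path(model_dir).glob("*.bin"))]
```

Calling `convert_directory` on the float16 model directory would write one `.safetensors` file next to each `.bin` shard. Each shard is loaded fully into CPU memory before it is written out.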

	Example converting WizardLM-70B-V1.0-HF_float16_safetensored to EXL2 at 4.0 bpw with a 6-bit head (`convert.py` is the conversion script from the ExLlamaV2 repository):
	```
	mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
	python convert.py -i ~/safetensor/WizardLM-70B-V1.0-HF_float16_safetensored -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6
	```
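
To produce several bitrates from the same float16 source, the command above can be assembled from Python. This is a sketch under stated assumptions: the helper names are hypothetical, only the flags shown in the command above are used, and `convert.py` is assumed to be in the current working directory.

```python
import os
import subprocess


def build_convert_command(input_dir, output_dir, cal_dataset, bits, head_bits):
    """Build the convert.py argument list from the flags shown above."""
    return [
        "python", "convert.py",
        "-i", os.path.expanduser(input_dir),
        "-o", os.path.expanduser(output_dir),
        "-c", os.path.expanduser(cal_dataset),
        "-b", str(bits),
        "-hb", str(head_bits),
    ]


def run_conversion(cmd, output_dir):
    """Create the output directory, then run the conversion command."""
    os.makedirs(os.path.expanduser(output_dir), exist_ok=True)
    subprocess.run(cmd, check=True)


cmd = build_convert_command(
    "~/safetensor/WizardLM-70B-V1.0-HF_float16_safetensored",
    "~/EXL2/WizardLM-70B-V1.0-HF_4bit",
    "~/EXL2/0000.parquet",
    bits=4.0,
    head_bits=6,
)
print(" ".join(cmd))
```

Nothing is executed until `run_conversion(cmd, "~/EXL2/WizardLM-70B-V1.0-HF_4bit")` is called, so the printed command can be checked first.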