---
inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus
---

# WizardLM 70B V1.0 - EXL2

| Branch | BITS (-b) | HEAD_BITS (-hb) | MEASUREMENT_LENGTH (-ml) | LENGTH (-l) | CAL_DATASET (-c) | Size | ExLlama | Max Context Length | Desc |
|--------|-----------|-----------------|--------------------------|-------------|------------------|------|---------|--------------------|------|
| main | 4.0 | 6 | 2048 | 2048 | 0000.parquet - wikitext-2-raw-v1 | 33GB | V2 | 4096 | Equivalent, in theory, to GPTQ 4-bit. |

## Description

This repository contains EXL2 model files for WizardLM's WizardLM 70B V1.0.

EXL2 is the new format used by [ExLlamaV2](https://github.com/turboderp/exllamav2). EXL2 is based on the same optimization method as GPTQ. The format allows for mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
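For reference, the sketch below shows one way to load these EXL2 files with the ExLlamaV2 Python API. The model path and sampling settings are placeholders, and the API surface can differ between ExLlamaV2 versions, so treat this as a rough guide rather than official usage for this repository.

```python
# Minimal sketch, assuming the exllamav2 package is installed; the model
# directory and sampling settings are placeholders, and API details may
# vary between ExLlamaV2 versions.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/WizardLM-70B-V1.0-EXL2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache as layers load
model.load_autosplit(cache)               # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

prompt = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's "
          "questions. USER: What is EXL2? ASSISTANT:")
print(generator.generate_simple(prompt, settings, 200))
```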

## Prompt template (official): Vicuna

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
```

## Prompt template (Thireus' own suggestion)

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:
```
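For scripting, a small helper can fill the official template. `build_prompt` is a hypothetical name for illustration, not something shipped with this repository:

```python
# Hypothetical helper (not part of this repository): fills the official
# Vicuna template shown above with a user message.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(user_message: str) -> str:
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

print(build_prompt("Summarize EXL2 quantization in one sentence."))
```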

## Quantization process

Original Model --> Float16 Model --> Safetensor Model --> EXL2 Model

Example: WizardLM 70B V1.0 --> WizardLM 70B V1.0-HF --> Safetensor --> EXL2

Use a script to convert your float16 `pytorch_model` bin files to safetensors, as sketched below.
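The following is a minimal sketch, assuming the `torch` and `safetensors` packages are installed. The source directory name is a placeholder, and a full converter would also copy the tokenizer/config files and rewrite the weight index, which is omitted here:

```python
# Minimal sketch: convert pytorch_model*.bin shards to .safetensors files.
# Assumes the `torch` and `safetensors` packages; the source path is a
# placeholder and the destination follows the example below.
import glob
import os
import torch
from safetensors.torch import save_file

src = os.path.expanduser("~/float16/WizardLM-70B-V1.0-HF")  # placeholder
dst = os.path.expanduser("~/safetensor/WizardLM-70B-V1.0-HF_float16_safetensored")
os.makedirs(dst, exist_ok=True)

for bin_path in sorted(glob.glob(os.path.join(src, "pytorch_model*.bin"))):
    state_dict = torch.load(bin_path, map_location="cpu")
    # clone() gives each tensor its own storage; safetensors rejects
    # tensors that share memory.
    tensors = {k: v.clone().contiguous() for k, v in state_dict.items()}
    out_name = os.path.basename(bin_path).replace(".bin", ".safetensors")
    save_file(tensors, os.path.join(dst, out_name))
```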

Example to convert WizardLM-70B-V1.0-HF_float16_safetensored to EXL2 4.0 bpw with 6-bit head:

```bash
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit  # Create the output directory
python convert.py -i ~/safetensor/WizardLM-70B-V1.0-HF_float16_safetensored -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6
```
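The table above also lists `-ml 2048` and `-l 2048` for the measurement and calibration lengths. These are assumed here to match the defaults used for this repo, but they can be passed explicitly to reproduce the listed settings exactly:

```bash
# Same conversion with the measurement/calibration lengths from the table
# passed explicitly (assumed to match the defaults used for this repo).
python convert.py \
  -i ~/safetensor/WizardLM-70B-V1.0-HF_float16_safetensored \
  -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit \
  -c ~/EXL2/0000.parquet \
  -b 4.0 -hb 6 -ml 2048 -l 2048
```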