WizardLM 70B V1.0 - EXL2
- Model creator: WizardLM
- FP32 original model used for quantization: WizardLM 70B V1.0 (float32)
- FP16 model used for quantization: WizardLM 70B V1.0-HF (float16 of WizardLM 70B V1.0)
- BF16 model used for quantization: WizardLM 70B V1.0-BF16 (bfloat16 of WizardLM 70B V1.0)
Models available:
Link | BITS (-b) | HEAD BITS (-hb) | MEASUREMENT LENGTH (-ml) | LENGTH (-l) | CAL DATASET (-c) | Size | V. | Max Context Length | Base Model | Layers | VRAM Min*** | VRAM Max*** | PPL** | Comments |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
here | 4.0 | 6 | 2048 | 2048 | 0000.parquet* | 33GB | 0.0.2 | 4096 | FP32 | 80 | 39GB | 44GB | 4.15234375 | Good results |
here | 4.0 | 6 | 2048 | 2048 | 0000.parquet* | 33GB | 0.0.2 | 4096 | BF16 | 80 | 39GB | 44GB | 4.2421875 | Model suffers from poor prompt understanding and logic is affected |
here | 4.0 | 8 | 2048 | 2048 | 0000.parquet* | 35GB | 0.0.2 | 4096 | FP16 | 80 | 39GB | 44GB | 4.24609375 | Model suffers from poor prompt understanding and logic is affected |
here | 5.0 | 6 | 2048 | 2048 | 0000.parquet* | 41GB | 0.0.2 | 4096 | FP32 | 80 | 47GB | 52GB | 4.06640625 | Best so far. Good results |
here | 5.0 | 8 | 2048 | 2048 | 0000.parquet* | 44GB | 0.0.2 | 4096 | FP16 | 80 | 48GB | 52GB | 4.09765625 | Model suffers from poor prompt understanding and logic is affected |
here | 5.0 | 6 | 2048 | 2048 | 0000.parquet* | 44GB | 0.0.1 | 4096 | FP16 | 80 | 48GB | 52GB | 4.0625 | Model suffers from poor prompt understanding and logic is affected |
here | 5.0 | 6 | 2048 | 2048 | 0000.parquet* | 41GB | 0.0.2 | 4096 | BF16 | 80 | 47GB | 52GB | 4.09765625 | Model suffers from poor prompt understanding and logic is affected |
here | 6.0 | 6 | 2048 | 2048 | 0000.parquet* | 49GB | 0.0.2 | 4096 | FP16 | 80 | 56GB | 60GB | 4.0703125 | Model suffers from poor prompt understanding and logic is affected |
* wikitext-2-raw-v1
** Evaluated with text-generation-webui ExLlama v0.0.2 on wikitext-2-raw-v1 (stride 512, max_length 0). For reference, TheBloke_WizardLM-70B-V1.0-GPTQ_gptq-4bit-32g-actorder_True has a perplexity of 4.1015625.
*** Measured without Flash Attention. For better VRAM optimisation, install Flash Attention: https://github.com/Dao-AILab/flash-attention#installation-and-features
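For reference, the gist of such a stride-based perplexity evaluation can be sketched with plain Hugging Face transformers as below. This is a hedged approximation, not the text-generation-webui implementation: the model id, dtype, and the reading of "max_length 0" as "use the model's full context" are assumptions.

```python
# Hedged sketch of a stride-512 perplexity loop over wikitext-2-raw-v1.
# NOT the text-generation-webui code: it uses plain transformers, and the
# model id below is only a placeholder.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WizardLM/WizardLM-70B-V1.0"   # placeholder: any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Concatenate the test split into one long token stream
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids

max_length, stride = 4096, 512            # 4096 = the model's native context
seq_len = input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end              # only the new tokens are scored
    ids = input_ids[:, begin:end].to(model.device)
    labels = ids.clone()
    labels[:, :-trg_len] = -100           # mask the overlapping context
    with torch.no_grad():
        nlls.append(model(ids, labels=labels).loss)
    prev_end = end
    if end == seq_len:
        break

print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```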
Description:
This repository contains EXL2 model files for WizardLM's WizardLM 70B V1.0.
EXL2 is a new format used by ExLlamaV2 (https://github.com/turboderp/exllamav2). EXL2 is based on the same optimization method as GPTQ. The format allows mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
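For a quick smoke test of the EXL2 files, a minimal inference sketch along the lines of the example scripts in the ExLlamaV2 repository looks like this. The exact API may differ between 0.0.x releases, and the model directory and sampling settings are placeholders.

```python
# Minimal EXL2 inference sketch, adapted from the examples in the ExLlamaV2
# repository; API details may vary across versions, paths are placeholders.
import os

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

model_dir = os.path.expanduser("~/EXL2/WizardLM-70B-V1.0-HF_4bit")

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
model.load()                              # optionally pass a GPU split, e.g. model.load([20, 24])

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: What does 4.0 bits per weight mean? ASSISTANT:"
)

generator.warmup()
print(generator.generate_simple(prompt, settings, 200))
```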
Prompt template (official):
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
Prompt template (suggested):
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:
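Assembled in code, the official single-line template looks like this (a trivial helper for illustration, not part of any library):

```python
# Trivial helper that assembles the official single-line template above.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(user_message: str) -> str:
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

print(build_prompt("Summarise the EXL2 format in one sentence."))
```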
Quantization process:
Original Model | → | (optional) float16 or bfloat16 Model* | → | Safetensors Model** | → | EXL2 Model |
---|---|---|---|---|---|---|
WizardLM 70B V1.0 | → | WizardLM 70B V1.0-HF* | → | Safetensors** | → | EXL2 |
Example to convert WizardLM-70B-V1.0-HF to EXL2 4.0 bpw with 6-bit head:
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/float16_safetensored/WizardLM-70B-V1.0-HF -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 # -i: input model dir, -o: output dir, -c: calibration dataset, -b: target bits per weight, -hb: head bits
* Use the following script to convert your local pytorch_model bin files to float16 (you can also choose bfloat16) + safetensors all in one go:
- https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py (best for sharding and float16/FP16 or bfloat16/BF16 conversion)
Example to convert WizardLM 70B V1.0 directly to float16 safetensors in 10GB shards:
python convert-to-safetensors.py ~/original/WizardLM-70B-V1.0 --output ~/float16_safetensored/WizardLM-70B-V1.0 --max-shard-size 10GB
Use --bf16 if you'd like to try bfloat16 instead, but note that there are concerns about quantization quality (see https://github.com/turboderp/exllamav2/issues/30#issuecomment-1719009289).
** Use any one of the following scripts to convert your local pytorch_model bin files to safetensors:
- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py (official ExLlamaV2)
- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py (recommended)
- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e#file-2safetensors-py
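For orientation, the core of what these scripts do can be sketched as follows, assuming a single unsharded pytorch_model.bin; the linked scripts additionally handle shards, the weight index file, and dtype casting.

```python
# Simplified sketch of a .bin -> .safetensors conversion for a single,
# unsharded checkpoint; the linked scripts also handle shards, the weight
# index, and dtype conversion.
import torch
from safetensors.torch import save_file

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# clone() breaks shared storage, which safetensors refuses to serialise
state_dict = {
    k: v.clone().contiguous()
    for k, v in state_dict.items()
    if isinstance(v, torch.Tensor)
}
save_file(state_dict, "model.safetensors", metadata={"format": "pt"})
```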
Further reading: