---
inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus
---

# WizardLM 70B V1.0 – EXL2
- Model creator: [WizardLM](https://huggingface.co/WizardLM)
- Original model: [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
- Model used for quantization: [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) – a float16 conversion of [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)

## Models available in this repository

| Branch | BITS (-b) | HEAD BITS (-hb) | MEASUREMENT LENGTH (-ml) | LENGTH (-l) | CAL DATASET (-c) | Size | ExLlama | Max Context Length |
| ------ | --------- | --------------- | ------------------------ | ----------- | ---------------- | ---- | ------- | ------------------ |
| [main](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2/tree/main) | 4.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | 35GB | [v2](https://github.com/turboderp/exllamav2) | 4096 | 
| _coming soon..._ | 5.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | ...GB | [v2](https://github.com/turboderp/exllamav2) | 4096 | 
| _coming soon..._ | 6.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | ...GB | [v2](https://github.com/turboderp/exllamav2) | 4096 | 

\* wikitext-2-raw-v1

## Description:

_This repository contains EXL2 model files for [WizardLM's WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)._

EXL2 is a quantization format used by [ExLlamaV2](https://github.com/turboderp/exllamav2). It is based on the same optimization method as GPTQ, but additionally allows mixing quantization
levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
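To illustrate how mixed quantization levels average out to a target bitrate, here is a purely arithmetic sketch. The layer names, bit widths, and weight counts below are hypothetical and are not taken from any actual EXL2 measurement pass:

```python
# Illustrative only: how a mixed-precision layout averages out to a target
# bits-per-weight (bpw). All numbers here are made up for the example.
layers = {
    "attention":  (3.5, 40_000_000),  # (bits per weight, number of weights)
    "mlp":        (4.2, 80_000_000),
    "embeddings": (4.0, 30_000_000),
}

total_bits = sum(bits * n for bits, n in layers.values())
total_weights = sum(n for _, n in layers.values())
avg_bpw = total_bits / total_weights
print(f"average bits per weight: {avg_bpw:.2f}")  # -> average bits per weight: 3.97
```

The `-b` flag passed to `convert.py` sets this target average; the quantizer then picks per-layer bit widths so the weighted average lands on it.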

## Prompt template (official):

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT: 
```

## Prompt template (suggested):

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:


```
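A minimal helper for assembling the suggested template programmatically may be useful; this is a sketch, with the system line copied verbatim from the template above and the function name chosen for the example:

```python
# System preamble copied from the suggested prompt template above.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(prompt: str) -> str:
    """Wrap a user message in the suggested WizardLM prompt template."""
    return f"{SYSTEM}\nUSER:\n{prompt}\nASSISTANT:\n"

print(build_prompt("What is EXL2?"))
```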

## Quantization process:

| Original Model | β†’ | Float16 Model* | β†’ | Safetensor Model** | β†’ | EXL2 Model |
| -------------- | --- | ------------- | --- | ---------------- | --- | ---------- |
| [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) | β†’ | [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF)* | β†’ | Safetensor** | β†’ | EXL2 |

Example to convert WizardLM-70B-V1.0-HF_float16_safetensored to EXL2 4.0 bpw with 6-bit head:

```
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/float16_safetensored/WizardLM-70B-V1.0-HF -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6
```

\* Use the following script to convert your local pytorch_model .bin files to float16 (bfloat16 is also an option) and safetensors in one go:

- https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py (best for sharding and float16/FP16 or bfloat16/BF16 conversion)

\*\* Use any one of the following scripts to convert your local pytorch_model bin files to safetensors:

- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py (official ExLlamaV2)
- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py (recommended if model already converted to float16)
- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e#file-2safetensors-py