Thireus committed
Commit 38ec656
1 Parent(s): 40b2d72

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED

---
inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus
---

# WizardLM 70B V1.0 - EXL2
- Model creator: [WizardLM](https://huggingface.co/WizardLM)
- Original model: [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
- Float16 model: [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) - float16 conversion of WizardLM 70B V1.0, used as the quantization source

| Branch | BITS (-b) | HEAD_BITS (-hb) | MEASUREMENT_LENGTH (-ml) | LENGTH (-l) | CAL_DATASET (-c) | Size | ExLlama | Max Context Length | Description |
| ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
| [main](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2/tree/main) | 4.0 | 6 | 2048 | 2048 | [0000.parquet - wikitext-2-raw-v1](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train) | 33GB | [V2](https://github.com/turboderp/exllamav2) | 4096 | Equivalent, in theory, to GPTQ 4-bit. |

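To fetch a specific quantization from the table above, you can download the corresponding branch. A minimal sketch using `huggingface_hub` (the local target directory is illustrative):

```
from huggingface_hub import snapshot_download

# Download the 4.0bpw, 6-bit-head quant from the "main" branch
# (local_dir is illustrative - point it wherever you keep models).
snapshot_download(
    repo_id="Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2",
    revision="main",
    local_dir="./WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2",
)
```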

## Description

This repository contains EXL2 model files for [WizardLM's WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0).

EXL2 is the new format used by [ExLlamaV2](https://github.com/turboderp/exllamav2). It is based on the same optimization method as GPTQ, and the format allows mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
27
+ ## Prompt template (official): Vicuna
28
+
29
+ ```
30
+ A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
31
+
32
+ ```
33
+
34
+ ## Prompt template (Thireus' own suggestion):
35
+
36
+ ```
37
+ A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
38
+ USER:
39
+ {prompt}
40
+ ASSISTANT:
41
+
42
+ ```
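
For reference, a minimal sketch of filling the official template in Python (the question is just a placeholder):

```
# Fill the official Vicuna template with a user question (question is illustrative).
VICUNA_TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: {prompt} ASSISTANT:"
)

print(VICUNA_TEMPLATE.format(prompt="How do clouds form?"))
```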

## Quantization process

Original Model --> Float16 Model --> Safetensor Model --> EXL2 Model

Example:
[WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) --> [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) --> Safetensor --> EXL2

Use any one of the following scripts to convert your float16 pytorch_model bin files to safetensors (a minimal sketch of the underlying conversion follows this list):
- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py
- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py
- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e
- https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py
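
All of those scripts reduce to the same idea: load each float16 shard with torch and re-save it with the safetensors library. A minimal single-shard sketch (file names are illustrative; the scripts above also handle the multi-shard weight index):

```
import torch
from safetensors.torch import save_file

# Load one float16 shard on CPU (file name is illustrative).
state_dict = torch.load("pytorch_model-00001-of-00015.bin", map_location="cpu")

# safetensors rejects non-contiguous or storage-sharing tensors, so clone each one.
state_dict = {name: tensor.contiguous().clone() for name, tensor in state_dict.items()}

save_file(state_dict, "model-00001-of-00015.safetensors")
```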

Example to convert WizardLM-70B-V1.0-HF_float16_safetensored to EXL2 4.0 bpw with 6-bit head:
```
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/safetensor/WizardLM-70B-V1.0-HF_float16_safetensored -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6
```
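
Once converted, the EXL2 model can be loaded and queried with ExLlamaV2. A rough sketch of the Python API (the path, sampling settings, and question are illustrative; check the exllamav2 repository for the current API):

```
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the quantized model directory (path is illustrative).
config = ExLlamaV2Config()
config.model_dir = "/home/user/EXL2/WizardLM-70B-V1.0-HF_4bit"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # spread the ~33GB of weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: What is the EXL2 format? ASSISTANT:"
)
print(generator.generate_simple(prompt, settings, 200))
```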