camenduru committed
Commit 51a6df1 (1 parent: aa2086e)

thanks to TheBloke ❤

README.md ADDED
@@ -0,0 +1,132 @@
+ ---
+ license: other
+ inference: false
+ ---
+ # Vicuna 13B 1.1 GPTQ 4bit 128g
+
+ This is a 4-bit GPTQ version of the [Vicuna 13B 1.1 model](https://huggingface.co/lmsys/vicuna-13b-delta-v1.1).
+
+ It was created by merging the deltas provided in the above repo with the original LLaMA 13B model, [using the code provided on their GitHub page](https://github.com/lm-sys/FastChat#vicuna-weights).
+
+ It was then quantized to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+
+ ## Want to try this in Colab for free?
+
+ Check out this Google Colab provided by [eucdee](https://huggingface.co/eucdee): [Google Colab for Vicuna 1.1](https://colab.research.google.com/github/eucdee/AI/blob/main/4bit_TextGen_Gdrive.ipynb)
+
+ ## My Vicuna 1.1 model repositories
+
+ I have the following Vicuna 1.1 repositories available:
+
+ **13B models:**
+ * [Unquantized 13B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-13B-1.1-HF)
+ * [GPTQ quantized 4bit 13B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g)
+ * [GPTQ quantized 4bit 13B 1.1 for CPU - GGML format for `llama.cpp`](https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML)
+
+ **7B models:**
+ * [Unquantized 7B 1.1 model for GPU - HF format](https://huggingface.co/TheBloke/vicuna-7B-1.1-HF)
+ * [GPTQ quantized 4bit 7B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g)
+ * [GPTQ quantized 4bit 7B 1.1 for CPU - GGML format for `llama.cpp`](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g-GGML)
+
+ ## GIBBERISH OUTPUT
+
+ If you get gibberish output, the most likely cause is that you are loading the `safetensors` file with an outdated GPTQ-for-LLaMa.
+
+ To use the `safetensors` file, you must have the latest version of GPTQ-for-LLaMa inside text-generation-webui.
+
+ If you don't want to update, or you can't, use the `pt` file instead.
+
+ Either way, please read the instructions below carefully.
+
+ ## Provided files
+
+ Two model files are provided. Ideally use the `safetensors` file. Details of each file:
+
+ * `vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
+   * `safetensors` format, with improved file security, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
+   * Command to create:
+     * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
+ * `vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt`
+   * `pt` format file, created without the `--act-order` flag.
+   * This file may have slightly lower quality, but is included because it can be used without compiling the latest GPTQ-for-LLaMa code.
+   * It should therefore work with the one-click installers on Windows, which include the older GPTQ-for-LLaMa code.
+   * Command to create:
+     * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --groupsize 128 --save vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt`
+
+ ## How to run in `text-generation-webui`
+
+ File `vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt` can be loaded the same way as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+
+ The `safetensors` model file was created with the latest GPTQ code and uses `--act-order` to give the maximum possible quantisation quality, but this means it requires the latest GPTQ-for-LLaMa inside the UI.
+
+ If you want to use the `safetensors` file and need to update GPTQ-for-LLaMa, here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
+ ```
+ # Check out GPTQ-for-LLaMa as of April 13th; more recent commits contain breaking changes
+ git clone -n https://github.com/qwopqwop200/GPTQ-for-LLaMa gptq-safe
+ # Run the checkout in a subshell so we stay in the parent directory
+ (cd gptq-safe && git checkout 58c8ab4c7aaccc50f507fd08cce941976affe5e0)
+
+ # Now clone text-generation-webui, if you don't already have it
+ git clone https://github.com/oobabooga/text-generation-webui
+ # And link GPTQ-for-LLaMa into text-generation-webui (absolute path, so the symlink resolves)
+ mkdir -p text-generation-webui/repositories
+ ln -s "$(pwd)/gptq-safe" text-generation-webui/repositories/GPTQ-for-LLaMa
+ ```
+
+ Then install this model into `text-generation-webui/models` and launch the UI as follows:
+ ```
+ cd text-generation-webui
+ python server.py --model vicuna-13B-1.1-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want
+ ```
+
+ The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.
+
+ If you are on Windows, or cannot use the Triton branch of GPTQ for any other reason, you can instead use the CUDA branch:
+ ```
+ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
+ cd GPTQ-for-LLaMa
+ python setup_cuda.py install
+ ```
+ Then link that into `text-generation-webui/repositories` as described above.
+
+ Or just use `vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt` as mentioned above, which should work without any upgrades to text-generation-webui.
+
+ # Vicuna Model Card
+
+ ## Model details
+
+ **Model type:**
+ Vicuna is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
+ It is an auto-regressive language model, based on the transformer architecture.
+
+ **Model date:**
+ Vicuna was trained between March 2023 and April 2023.
+
+ **Organizations developing the model:**
+ The Vicuna team with members from UC Berkeley, CMU, Stanford, and UC San Diego.
+
+ **Paper or resources for more information:**
+ https://vicuna.lmsys.org/
+
+ **License:**
+ Apache License 2.0
+
+ **Where to send questions or comments about the model:**
+ https://github.com/lm-sys/FastChat/issues
+
+ ## Intended use
+ **Primary intended uses:**
+ The primary use of Vicuna is research on large language models and chatbots.
+
+ **Primary intended users:**
+ The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.
+
+ ## Training dataset
+ 70K conversations collected from ShareGPT.com.
+
+ ## Evaluation dataset
+ A preliminary evaluation of the model quality is conducted by creating a set of 80 diverse questions and utilizing GPT-4 to judge the model outputs. See https://vicuna.lmsys.org/ for more details.
+
+ ## Major updates of weights v1.1
+ - Refactor the tokenization and separator. In Vicuna v1.1, the separator has been changed from `"###"` to the EOS token `"</s>"`. This change makes it easier to determine the generation stop criteria and enables better compatibility with other libraries.
+ - Fix the supervised fine-tuning loss computation for better model quality.
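
The separator change described in the v1.1 notes can be illustrated with a minimal sketch. Only the `"</s>"` separator comes from the README; the system line, the `USER`/`ASSISTANT` role labels, and the function names `build_prompt` and `extract_reply` are hypothetical and may not match the template your frontend uses.

```python
# Assumption: role labels and system line are illustrative, not taken from the README.
SYSTEM = "A chat between a user and an assistant."
USER, ASSISTANT = "USER", "ASSISTANT"
EOS = "</s>"  # v1.1 separator, as stated in the README


def build_prompt(turns, system=SYSTEM):
    """Build a prompt from (user_msg, assistant_msg_or_None) pairs.

    Completed assistant turns are closed with EOS; a turn whose assistant
    message is None is left open for the model to complete.
    """
    parts = [system]
    for user_msg, assistant_msg in turns:
        parts.append(f"{USER}: {user_msg}")
        if assistant_msg is None:
            parts.append(f"{ASSISTANT}:")
        else:
            parts.append(f"{ASSISTANT}: {assistant_msg}{EOS}")
    return " ".join(parts)


def extract_reply(generated):
    """Cut generated text at the first EOS, which acts as the stop criterion."""
    return generated.split(EOS, 1)[0].strip()
```

Because the separator is the EOS token itself, stopping generation reduces to cutting at the first `"</s>"` rather than searching for a multi-character marker such as `"###"`.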
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "/content/llama-13b-hf",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "bos_token_id": 0,
+   "eos_token_id": 1,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 13824,
+   "max_position_embeddings": 2048,
+   "max_sequence_length": 2048,
+   "model_type": "llama",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 40,
+   "pad_token_id": -1,
+   "rms_norm_eps": 1e-06,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.28.0.dev0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
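
As a sanity check on the "13B" label, the parameter count can be derived from the values in this `config.json`. The sketch below assumes the usual `LlamaForCausalLM` layout (four bias-free attention projections, a three-matrix gated MLP, two RMSNorm weights per layer, one final norm); the function name `count_llama_params` is hypothetical.

```python
# Values copied from the config.json above.
CONFIG = {
    "hidden_size": 5120,
    "intermediate_size": 13824,
    "num_hidden_layers": 40,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}


def count_llama_params(cfg):
    """Count weights under the assumed LLaMA layer layout (no bias terms)."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layers = cfg["num_hidden_layers"]
    v = cfg["vocab_size"]

    embed = v * h                                  # input token embeddings
    lm_head = 0 if cfg["tie_word_embeddings"] else v * h
    attention = 4 * h * h                          # q, k, v, o projections
    mlp = 3 * h * i                                # gate, up, down projections
    norms = 2 * h                                  # two RMSNorm weights per layer
    final_norm = h

    return embed + lm_head + layers * (attention + mlp + norms) + final_norm


total = count_llama_params(CONFIG)
print(f"{total:,} parameters, about {total * 2 / 1e9:.1f} GB in float16")
```

Under these assumptions the count comes to roughly 13.0 billion weights, or about 26 GB at two bytes each, which puts the roughly 7.26 GB 4-bit files in this commit into context.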
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 1,
+   "pad_token_id": 0,
+   "transformers_version": "4.28.0.dev0"
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "clean_up_tokenization_spaces": false,
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": null,
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
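
The `model_max_length` value here is not a real limit: it equals `int(1e30)`, which reads as a "not set" placeholder. A minimal sketch of deriving a usable context length follows, assuming the `max_position_embeddings` of 2048 from `config.json` is the true bound; both helper names are hypothetical.

```python
TOKENIZER_MODEL_MAX_LENGTH = 1000000000000000019884624838656  # from tokenizer_config.json
MAX_POSITION_EMBEDDINGS = 2048                                # from config.json


def is_unset_sentinel(model_max_length):
    """Treat anything at or above int(1e30) as 'no limit configured'."""
    return model_max_length >= int(1e30)


def effective_context_length(model_max_length, max_position_embeddings):
    """Use the smaller of the tokenizer limit and the model's position limit."""
    return min(model_max_length, max_position_embeddings)


print(is_unset_sentinel(TOKENIZER_MODEL_MAX_LENGTH))
print(effective_context_length(TOKENIZER_MODEL_MAX_LENGTH, MAX_POSITION_EMBEDDINGS))
```

In practice this means truncation should be driven by the 2048-token position limit, since the tokenizer alone will not enforce one.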
vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f987cc06d0a6bc9a87b875ee00a1ca908974e0b4ed9664db6a31a6c66b39ce47
+ size 7255476788
vicuna-13B-1.1-GPTQ-4bit-128g.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e47a7a68ed4230004e08e83730247625a55cd7493cebadc7be9abf9c3a7275ea
+ size 7255159218
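
The Git LFS pointers in this commit record each file's SHA-256 digest and byte size, which is enough to check a download for corruption. The following is a minimal sketch using only the standard library; `parse_lfs_pointer` and `verify_file` are hypothetical helper names, and the local path in the usage note is an assumption.

```python
import hashlib
import os

# Pointer text copied from the safetensors entry above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e47a7a68ed4230004e08e83730247625a55cd7493cebadc7be9abf9c3a7275ea
size 7255159218"""


def parse_lfs_pointer(text):
    """Split 'key value' lines and pull out the hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def verify_file(path, pointer, chunk_size=1 << 20):
    """Return True if the file's size and digest both match the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; skips hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so a multi-gigabyte model file never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

A call such as `verify_file("models/vicuna-13B-1.1-GPTQ-4bit-128g.safetensors", parse_lfs_pointer(POINTER))` would then confirm the download, assuming the file was saved at that path.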