Text Generation
English
bartowski committed on
Commit
ee0edae
1 Parent(s): d52cf96

Quant for 4.25

README.md CHANGED
@@ -7,63 +7,91 @@ datasets:
7
  - argilla/dpo-mix-7k
8
  language:
9
  - en
10
- quantized_by: bartowski
11
- pipeline_tag: text-generation
12
  ---
13
-
14
- ## Exllama v2 Quantizations of sparsetral-16x7B-v2-SPIN_iter0
15
-
16
- Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.13">turboderp's ExLlamaV2 v0.0.13</a> for quantization.
17
-
18
- <b>The "main" branch only contains the measurement.json, download one of the other branches for the model (see below)</b>
19
-
20
- Each branch contains an individual bits per weight, with the main one containing only the measurement.json for further conversions.
21
-
22
- Original model: https://huggingface.co/serpdotai/sparsetral-16x7B-v2-SPIN_iter0
23
-
24
- | Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
25
- | ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
26
- | [8_0](https://huggingface.co/bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2/tree/8_0) | 8.0 | 8.0 | 8.3 GB | 9.7 GB | 11.8 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
27
- | [6_5](https://huggingface.co/bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2/tree/6_5) | 6.5 | 8.0 | 7.1 GB | 8.5 GB | 10.6 GB | Very similar to 8.0, good tradeoff of size vs performance, **recommended**. |
28
- | [5_0](https://huggingface.co/bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2/tree/5_0) | 5.0 | 6.0 | 5.7 GB | 7.1 GB | 9.2 GB | Slightly lower quality vs 6.5, but usable on 8GB cards. |
29
- | [4_25](https://huggingface.co/bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2/tree/4_25) | 4.25 | 6.0 | 5.1 GB | 6.5 GB | 8.6 GB | GPTQ equivalent bits per weight, slightly higher quality. |
30
- | [3_5](https://huggingface.co/bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2/tree/3_5) | 3.5 | 6.0 | 4.4 GB | 5.8 GB | 7.9 GB | Lower quality, only use if you have to. |
31
-
32
- ## Download instructions
33
-
34
- With git:
35
-
36
- ```shell
37
- git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2 sparsetral-16x7B-v2-SPIN_iter0-exl2-6_5
38
  ```
39
-
40
- With huggingface hub (credit to TheBloke for instructions):
41
-
42
- ```shell
43
- pip3 install huggingface-hub
44
  ```
45
 
46
- To download the `main` (only useful if you only care about measurement.json) branch to a folder called `sparsetral-16x7B-v2-SPIN_iter0-exl2`:
47
-
48
- ```shell
49
- mkdir sparsetral-16x7B-v2-SPIN_iter0-exl2
50
- huggingface-cli download bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2 --local-dir sparsetral-16x7B-v2-SPIN_iter0-exl2 --local-dir-use-symlinks False
51
  ```
52
 
53
- To download from a different branch, add the `--revision` parameter:
 
54
 
55
- Linux:
56
 
57
- ```shell
58
- mkdir sparsetral-16x7B-v2-SPIN_iter0-exl2-6_5
59
- huggingface-cli download bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2 --revision 6_5 --local-dir sparsetral-16x7B-v2-SPIN_iter0-exl2-6_5 --local-dir-use-symlinks False
60
- ```
61
-
62
- Windows (which apparently doesn't like _ in folders sometimes?):
63
-
64
- ```shell
65
- mkdir sparsetral-16x7B-v2-SPIN_iter0-exl2-6.5
66
- huggingface-cli download bartowski/sparsetral-16x7B-v2-SPIN_iter0-exl2 --revision 6_5 --local-dir sparsetral-16x7B-v2-SPIN_iter0-exl2-6.5 --local-dir-use-symlinks False
67
- ```
68
 
69
- Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
 
7
  - argilla/dpo-mix-7k
8
  language:
9
  - en
 
 
10
  ---
11
+ This model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) further tuned with [SPIN](https://arxiv.org/abs/2401.01335) on [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) mixed with traditional DPO samples. This is iteration 0; the plan is to keep running iterations until improvements stop.
12
+
13
+ ## Training
14
+ - 8x A6000s
15
+ - Base model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2)
16
+ - [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
17
+ - Sequence Length: 4096
18
+ - Effective batch size: 64
19
+ - Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
20
+ - Epochs: 2
21
+ - 50k samples (~15k traditional DPO samples, the rest SPIN)
22
+ - QLoRA:
+     - 256 r and 256 alpha
+     - ```python
+       target_modules=[
+           "q_proj",
+           "k_proj",
+           "v_proj",
+           "o_proj",
+           "gate_proj",
+           "up_proj",
+           "down_proj",
+           "adapter_down",
+           "adapter_up",
+       ]
+       ```
37
+
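For a rough sense of scale, the hyperparameters listed under Training imply a schedule along the following lines. This is a minimal sketch, assuming linear warmup from zero followed by linear decay to zero (the README only says "linear decay (0.1 warmup ratio)") and assuming each epoch rounds up to a whole number of optimizer steps. `lr_at` is a hypothetical helper for illustration, not part of the training code.

```python
import math

# Values taken from the Training list
NUM_SAMPLES = 50_000
EPOCHS = 2
EFFECTIVE_BATCH = 64
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1

# Assumption: the last partial batch of each epoch still counts as one step
total_steps = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH) * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Assumed linear warmup from 0 up to the peak rate
        return PEAK_LR * step / max(1, warmup_steps)
    # Assumed linear decay from the peak rate down to 0 at the final step
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print(total_steps, warmup_steps, lr_at(warmup_steps))
```

With an effective batch size of 64 on 8 GPUs, each optimizer step covers 8 samples per GPU, however that is split between per-device batch size and gradient accumulation.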
38
+ ## Prompt Format
39
  ```
40
+ <|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
41
  ```
42
 
43
+ ## Usage
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()
+
+ # Per-role turn templates matching the Prompt Format section
+ system_str = "<|im_start|>system\n{message}<|im_end|>\n"
+ user_str = "<|im_start|>user\n{message}<|im_end|>\n"
+ assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"
+
+ def construct_prompt(messages):
+     # Render each turn, then open an assistant turn for the model to complete
+     prompt = ""
+     for message in messages:
+         if message["from"] in ["human", "user"]:
+             prompt += user_str.format(
+                 message=message["value"]
+             )
+         elif message["from"] in ["gpt", "assistant"]:
+             prompt += assistant_str.format(
+                 message=message["value"]
+             )
+         elif message["from"] in ["system", "instruction"]:
+             prompt += system_str.format(
+                 message=message["value"]
+             )
+         else:
+             raise ValueError(
+                 f"Unknown message type: {message['from']}"
+             )
+     return prompt + "<|im_start|>assistant\n"
+
+ system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
+ user = "Are you sentient?"
+
+ messages = [
+     {"from": "system", "value": system},
+     {"from": "user", "value": user},
+ ]
+
+ prompt = construct_prompt(messages)
+ inputs = tokenizer(prompt, return_tensors="pt")
+ inputs = inputs.to(model.device)
+ # Sample one completion; the decoded output includes the prompt text
+ pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
+ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
88
  ```
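The usage snippet prints the whole decoded sequence, prompt included. The tokenizer files in this commit list only `<unk>`, `<s>` and `</s>` as special tokens, so the `<|im_start|>` and `<|im_end|>` markers are presumably ordinary text and would survive `skip_special_tokens=True`. A small sketch for pulling out just the last assistant reply, assuming the decoded string reproduces those markers literally; `extract_reply` is a hypothetical helper, not part of the model repository.

```python
ASSISTANT_OPEN = "<|im_start|>assistant\n"
TURN_CLOSE = "<|im_end|>"


def extract_reply(decoded):
    """Return the text of the last assistant turn in a decoded ChatML string."""
    # Everything after the last assistant opener belongs to the newest reply
    _, opener, tail = decoded.rpartition(ASSISTANT_OPEN)
    if not opener:
        # Assumption not met: no marker found, fall back to the raw text
        return decoded.strip()
    # Cut at the closing marker if the model emitted one
    reply, _, _ = tail.partition(TURN_CLOSE)
    return reply.strip()


example = (
    "<|im_start|>user\nAre you sentient?<|im_end|>\n"
    "<|im_start|>assistant\nI don't know.<|im_end|>\n"
)
print(extract_reply(example))
```

If decoding alters whitespace around the markers, slicing the generated token ids by the prompt length before decoding is the more robust route.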
89
 
90
+ ## Other Information
91
+ Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)
92
 
93
+ [Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)
94
 
95
+ [Forked repo with mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)
96
 
97
+ If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm), which adds sparsetral support.
config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "_name_or_path": "serpdotai/sparsetral-16x7B-v2",
3
+ "adapter_dim": 512,
4
+ "adapter_dropout": 0.0,
5
+ "architectures": [
6
+ "MistralForCausalLM"
7
+ ],
8
+ "attention_dropout": 0.0,
9
+ "auto_map": {
10
+ "AutoConfig": "serpdotai/sparsetral-16x7B-v2--configuration_sparsetral.SparsetralConfig",
11
+ "AutoModel": "serpdotai/sparsetral-16x7B-v2--modeling_sparsetral.MistralModel",
12
+ "AutoModelForCausalLM": "serpdotai/sparsetral-16x7B-v2--modeling_sparsetral.MistralForCausalLM"
13
+ },
14
+ "bos_token_id": 1,
15
+ "eos_token_id": 2,
16
+ "hidden_act": "silu",
17
+ "hidden_size": 4096,
18
+ "initializer_range": 0.02,
19
+ "intermediate_size": 14336,
20
+ "max_position_embeddings": 32768,
21
+ "model_type": "mistral",
22
+ "moe_dtype": "bfloat16",
23
+ "moe_scaling": 1,
24
+ "num_attention_heads": 32,
25
+ "num_experts": 16,
26
+ "num_hidden_layers": 32,
27
+ "num_key_value_heads": 8,
28
+ "output_router_logits": false,
29
+ "pretraining_tp": 1,
30
+ "rms_norm_eps": 1e-05,
31
+ "rope_theta": 1000000.0,
32
+ "router_aux_loss_coef": 0.01,
33
+ "sliding_window": null,
34
+ "tie_word_embeddings": false,
35
+ "topk": 4,
36
+ "torch_dtype": "bfloat16",
37
+ "transformers_version": "4.37.2",
38
+ "use_cache": true,
39
+ "vocab_size": 32000
40
+ }
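A few derived quantities follow from the attention fields in this config. The sketch below assumes the standard Mistral grouped-query attention layout and a 16-bit KV cache; neither is stated in the diff, and a quantized cache would change the numbers. `kv_cache_bytes` is a hypothetical helper.

```python
# Fields copied from config.json
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 32,
}
BYTES_PER_VALUE = 2  # assumption: bfloat16/fp16 cache entries

head_dim = config["hidden_size"] // config["num_attention_heads"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Keys and values (factor 2) for every layer and every KV head
kv_bytes_per_token = (
    2
    * config["num_hidden_layers"]
    * config["num_key_value_heads"]
    * head_dim
    * BYTES_PER_VALUE
)


def kv_cache_bytes(context_len):
    """Approximate KV cache size for one sequence of the given length."""
    return kv_bytes_per_token * context_len


for ctx in (4096, 16384, 32768):
    print(ctx, kv_cache_bytes(ctx) / 2**30, "GiB")
```

This covers only the cache, not the weights or activations, so it is not a substitute for measured VRAM figures.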
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.37.2"
6
+ }
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
original_repo_url.txt ADDED
@@ -0,0 +1 @@
1
+ https://huggingface.co/serpdotai/sparsetral-16x7B-v2-SPIN_iter0
output.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:948bc95778d72124823a490e30b46e015d416de8da01192f10e5ba3cc548b8a8
3
+ size 4074366700
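The three lines above are a Git LFS pointer, not the weights themselves: they record the hash algorithm, digest and byte size of the real file. A minimal sketch for checking a downloaded `output.safetensors` against that pointer; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local path passed to `matches_pointer` is up to the reader.

```python
import hashlib

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:948bc95778d72124823a490e30b46e015d416de8da01192f10e5ba3cc548b8a8
size 4074366700
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Hash the file in chunks and compare size and digest with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("output.safetensors", pointer)` on a complete download should return `True`; a size mismatch usually indicates a truncated transfer.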
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "unk_token": {
17
+ "content": "<unk>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ }
23
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<unk>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<s>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ }
29
+ },
30
+ "additional_special_tokens": [],
31
+ "bos_token": "<s>",
32
+ "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
33
+ "clean_up_tokenization_spaces": false,
34
+ "eos_token": "</s>",
35
+ "legacy": true,
36
+ "model_max_length": 1000000000000000019884624838656,
37
+ "pad_token": null,
38
+ "sp_model_kwargs": {},
39
+ "spaces_between_special_tokens": false,
40
+ "tokenizer_class": "LlamaTokenizer",
41
+ "unk_token": "<unk>",
42
+ "use_default_system_prompt": false
43
+ }
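Note that the `chat_template` in this file is the Mistral-style `[INST]` template, while the README's Prompt Format section describes a ChatML layout; which of the two the fine-tune actually expects is not settled by this diff. For readers who want to see what the template would produce, here is a plain-Python rendition of its logic. `render_inst_template` is a hypothetical stand-in for illustration, not the tokenizer's own code.

```python
BOS, EOS = "<s>", "</s>"  # bos_token / eos_token from tokenizer_config.json


def render_inst_template(messages):
    """Mirror the Jinja chat_template: strict user/assistant alternation."""
    out = BOS
    for index, message in enumerate(messages):
        # Even positions must be user turns, odd positions assistant turns
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if message["role"] == "user":
            out += "[INST] " + message["content"] + " [/INST]"
        elif message["role"] == "assistant":
            out += message["content"] + EOS
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return out


print(render_inst_template([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Are you sentient?"},
]))
```

A leading system message is rejected by this template, so the system prompt from the README's usage example cannot be passed through it unchanged.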