francislabounty committed on
Commit
efa62c0
·
verified ·
1 Parent(s): 15b7356

Update README.md

Files changed (1)
  1. README.md +99 -0
README.md CHANGED
@@ -1,3 +1,102 @@
  ---
  license: apache-2.0
+ datasets:
+ - teknium/OpenHermes-2.5
+ - jondurbin/truthy-dpo-v0.1
+ - jondurbin/gutenberg-dpo-v0.1
+ - argilla/dpo-mix-7k
+ language:
+ - en
  ---
+ This model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) further tuned with [SPIN](https://arxiv.org/abs/2401.01335) on [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5), mixed with traditional DPO samples. This is iteration_1. Further training runs are temporarily paused in favor of using [DoRA](https://arxiv.org/pdf/2402.09353.pdf) instead of [LoRA](https://arxiv.org/abs/2106.09685). We may also start over from the beginning with v3 for proper chat token support, and we are debating adding function tokens and function calling. If Sparsetral has been weak at any of your tasks, feel free to send us some prompts/chats plus desired completions, and we will see about making sure your task is supported!
+
+ ![](https://i.imgflip.com/8g9jr4.jpg)
+
+ Kuru~ Kuru~
+ ![Kuru~ Kuru~](https://github.com/duiqt/herta_kuru/raw/main/static/img/hertaa_github.gif)
+
+ ## Training
+ - 8x A6000s
+ - Base model is [sparsetral-16x7B-v2-SPIN_iter0](https://huggingface.co/serpdotai/sparsetral-16x7B-v2-SPIN_iter0)
+ - [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
+ - Sequence Length: 4096
+ - Effective batch size: 64
+ - Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
+ - Epochs: 2
+ - 100K samples (50K new SPIN + 50K from iter_0)
+ - QLoRA:
+   - 256 r and 256 alpha
+   - ```python
+     target_modules=[
+         "q_proj",
+         "k_proj",
+         "v_proj",
+         "o_proj",
+         "gate_proj",
+         "up_proj",
+         "down_proj",
+         "adapter_down",
+         "adapter_up",
+     ]
+     ```
+
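Note on the training settings above: the following is a minimal sketch of the step count and learning-rate curve those numbers imply. It assumes that each of the 100K samples is seen once per epoch, that warmup rises linearly from zero, and that the decay ends at zero. The README states none of these, and `linear_schedule_lr` is a hypothetical helper, not code from the training fork.

```python
import math

# Values taken from the Training list.
SAMPLES = 100_000
EPOCHS = 2
EFFECTIVE_BATCH = 64
NUM_GPUS = 8
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1

# Assumption: each sample is consumed exactly once per epoch.
total_steps = math.ceil(SAMPLES * EPOCHS / EFFECTIVE_BATCH)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

# Samples each GPU contributes per optimizer step. How this splits into
# per-device batch size and gradient accumulation is not stated.
per_gpu_samples = EFFECTIVE_BATCH // NUM_GPUS


def linear_schedule_lr(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * (step / warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    return PEAK_LR * (max(0, total_steps - step) / decay_steps)


print(total_steps, warmup_steps, per_gpu_samples)
print(linear_schedule_lr(0), linear_schedule_lr(warmup_steps), linear_schedule_lr(total_steps))
```

Under these assumptions the run takes 3,125 optimizer steps, the first 313 of which are warmup, and each GPU contributes 8 samples per optimizer step.
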
+ ## Prompt Format
+ ```
+ <|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
+ ```
+
+ ## Usage
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()
+
+ system_str = "<|im_start|>system\n{message}<|im_end|>\n"
+ user_str = "<|im_start|>user\n{message}<|im_end|>\n"
+ assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"
+
+ def construct_prompt(messages):
+     prompt = ""
+     for message in messages:
+         if message["from"] in ["human", "user"]:
+             prompt += user_str.format(
+                 message=message["value"]
+             )
+         elif message["from"] in ["gpt", "assistant"]:
+             prompt += assistant_str.format(
+                 message=message["value"]
+             )
+         elif message["from"] in ["system", "instruction"]:
+             prompt += system_str.format(
+                 message=message["value"]
+             )
+         else:
+             raise ValueError(
+                 f"Unknown message type: {message['from']}"
+             )
+     return prompt + "<|im_start|>assistant\n"
+
+ system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
+ user = "Are you sentient?"
+
+ messages = [
+     {"from": "system", "value": system},
+     {"from": "user", "value": user},
+ ]
+
+ prompt = construct_prompt(messages)
+ inputs = tokenizer(prompt, return_tensors="pt")
+ inputs = inputs.to(model.device)
+ pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
+ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
+ ```
+
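Note on the last line of the usage snippet: for decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the decoded string repeats the system and user text. Below is a minimal sketch of trimming the prompt before decoding. `completion_ids` is a hypothetical helper name, the token ids are made up, and plain lists stand in for tensors so the block runs without loading a model.

```python
def completion_ids(output_ids, prompt_length):
    """Hypothetical helper: drop the first `prompt_length` tokens of one sequence."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Stand-in data (made-up ids): four prompt tokens followed by three generated tokens.
prompt = [1, 32001, 1587, 13]
generated = prompt + [315, 837, 2]
new_tokens = completion_ids(generated, len(prompt))
print(new_tokens)
```

Applied to the snippet above, this would look like `tokenizer.decode(completion_ids(pred[0], inputs["input_ids"].shape[1]), skip_special_tokens=True)`, which prints only the assistant's reply.
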
+ ## Other Information
+ Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)
+
+ [Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)
+
+ [Forked repo with Mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)
+
+ If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm) that adds sparsetral support.