---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- jondurbin/truthy-dpo-v0.1
- jondurbin/gutenberg-dpo-v0.1
- argilla/dpo-mix-7k
language:
- en
---
This model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) further tuned using [SPIN](https://arxiv.org/abs/2401.01335) on [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5), mixed with traditional DPO samples. This is iteration_0; we plan to keep making iterations until improvements stop.
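
For context, SPIN converts an SFT dataset into preference data by treating the ground-truth completion as the "chosen" response and the current model's own generation for the same prompt as the "rejected" one, then training with a DPO-style objective. The sketch below is a rough, hypothetical illustration of that pair construction (`build_spin_pairs` and `generate_fn` are made-up names), not the exact pipeline used for this model:

```python
# Hypothetical illustration of SPIN-style pair construction; names are made up
# and this is not the exact data pipeline used to train this model.
def build_spin_pairs(sft_examples, generate_fn):
    # sft_examples: [{"prompt": ..., "response": ...}]
    # generate_fn: samples a completion from the current iteration's model
    pairs = []
    for example in sft_examples:
        pairs.append({
            "prompt": example["prompt"],
            "chosen": example["response"],               # ground-truth SFT completion
            "rejected": generate_fn(example["prompt"]),  # model's own output
        })
    return pairs
```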

## Training
- 8x A6000s
- Base model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2)
- [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
- Sequence Length: 4096
- Effective batch size: 64
- Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
- Epochs: 2
- 50k samples (~15k traditional DPO samples, rest SPIN)
- QLoRA (see the hedged config sketch after this list):
  - 256 r and 256 alpha
  - ```python
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "adapter_down",
        "adapter_up",
    ]
    ```
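
The settings above correspond roughly to the following PEFT-style adapter config. This is a hedged sketch for reference only (the actual run went through the forked unsloth trainer, and `lora_config` is just an illustrative name), not the training script:

```python
# Illustrative QLoRA adapter config matching the hyperparameters listed above.
# The real training used the forked unsloth stack; treat this as a sketch only.
from peft import LoraConfig

lora_config = LoraConfig(
    r=256,
    lora_alpha=256,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "adapter_down", "adapter_up",
    ],
    task_type="CAUSAL_LM",
)
```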

## Prompt Format
```
<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
```
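
Filled in with example messages, a system turn followed by a user turn renders as (each literal `\n` above is an actual newline):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Are you sentient?<|im_end|>
<|im_start|>assistant
```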

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()

system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"

def construct_prompt(messages):
    prompt = ""
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(
                message=message["value"]
            )
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(
                message=message["value"]
            )
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(
                message=message["value"]
            )
        else:
            raise ValueError(
                f"Unknown message type: {message['from']}"
            )
    return prompt + "<|im_start|>assistant\n"

system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"

messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]

prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
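
If the repo's tokenizer config ships a ChatML chat template (not verified here), `tokenizer.apply_chat_template` can build the same prompt without a helper. A hedged alternative, reusing the variables from the snippet above:

```python
# Hedged alternative: only valid if the tokenizer config includes a ChatML chat template.
chat = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```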

## Other Information
Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)

[Original paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)

[Forked repo with Mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)

If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm), which adds sparsetral support.
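
As a minimal sketch, and assuming the fork keeps the upstream vLLM Python API (installation from the fork's repo is not covered here), serving could look like:

```python
# Minimal sketch, assuming the serp-ai vLLM fork exposes the standard vLLM Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
params = SamplingParams(temperature=0.9, top_p=0.99, max_tokens=512)

# ChatML-formatted prompt, matching the Prompt Format section above.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nAre you sentient?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```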