---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- jondurbin/truthy-dpo-v0.1
- jondurbin/gutenberg-dpo-v0.1
- argilla/dpo-mix-7k
language:
- en
---
This model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) further tuned using [SPIN](https://arxiv.org/abs/2401.01335) on [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5), mixed with traditional DPO samples. This is iteration_0; we plan to keep making iterations until improvements stop.
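
For context, SPIN converts an SFT dataset into preference data by treating the ground-truth completion as the "chosen" response and the current model's own generation for the same prompt as the "rejected" one, then training with a DPO-style objective. The sketch below is a rough, hypothetical illustration of that pair construction (`build_spin_pairs` and `generate_fn` are made-up names), not the exact pipeline used for this model:

```python
# Hypothetical illustration of SPIN-style pair construction; names are made up
# and this is not the exact data pipeline used to train this model.
def build_spin_pairs(sft_examples, generate_fn):
    # sft_examples: [{"prompt": ..., "response": ...}]
    # generate_fn: samples a completion from the current iteration's model
    pairs = []
    for example in sft_examples:
        pairs.append({
            "prompt": example["prompt"],
            "chosen": example["response"],               # ground-truth SFT completion
            "rejected": generate_fn(example["prompt"]),  # model's own output
        })
    return pairs
```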

## Training
- 8x A6000s
- Base model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2)
- [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
- Sequence Length: 4096
- Effective batch size: 64
- Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
- Epochs: 2
- 50k samples (~15k traditional DPO samples, rest SPIN)
- QLoRA (see the hedged config sketch after this list):
  - 256 r and 256 alpha
  - ```python
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "adapter_down",
        "adapter_up",
    ]
    ```
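
The settings above correspond roughly to the following PEFT-style adapter config. This is a hedged sketch for reference only (the actual run went through the forked unsloth trainer, and `lora_config` is just an illustrative name), not the training script:

```python
# Illustrative QLoRA adapter config matching the hyperparameters listed above.
# The real training used the forked unsloth stack; treat this as a sketch only.
from peft import LoraConfig

lora_config = LoraConfig(
    r=256,
    lora_alpha=256,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "adapter_down", "adapter_up",
    ],
    task_type="CAUSAL_LM",
)
```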

## Prompt Format
```
<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
```
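
Filled in with example messages, a system turn followed by a user turn renders as (each literal `\n` above is an actual newline):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Are you sentient?<|im_end|>
<|im_start|>assistant
```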

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()

system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"

def construct_prompt(messages):
    prompt = ""
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(
                message=message["value"]
            )
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(
                message=message["value"]
            )
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(
                message=message["value"]
            )
        else:
            raise ValueError(
                f"Unknown message type: {message['from']}"
            )
    return prompt + "<|im_start|>assistant\n"

system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"

messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]

prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
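
If the repo's tokenizer config ships a ChatML chat template (not verified here), `tokenizer.apply_chat_template` can build the same prompt without a helper. A hedged alternative, reusing the variables from the snippet above:

```python
# Hedged alternative: only valid if the tokenizer config includes a ChatML chat template.
chat = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```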

## Other Information
Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)

[Original paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)

[Forked repo with Mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)

If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm), which adds sparsetral support.
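
As a minimal sketch, and assuming the fork keeps the upstream vLLM Python API (installation from the fork's repo is not covered here), serving could look like:

```python
# Minimal sketch, assuming the serp-ai vLLM fork exposes the standard vLLM Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
params = SamplingParams(temperature=0.9, top_p=0.99, max_tokens=512)

# ChatML-formatted prompt, matching the Prompt Format section above.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nAre you sentient?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```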