---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- jondurbin/truthy-dpo-v0.1
- jondurbin/gutenberg-dpo-v0.1
- argilla/dpo-mix-7k
language:
- en
---

This model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) further tuned using [SPIN](https://arxiv.org/abs/2401.01335) on [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) mixed with traditional DPO samples. This is iteration_0; we plan to keep running iterations until improvements stop.

## Training

- 8x A6000s
- Base model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2)
- [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
- Sequence Length: 4096
- Effective batch size: 64
- Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
- Epochs: 2
- 50k samples (~15k traditional DPO samples, rest SPIN)
- QLoRA:
    - 256 r and 256 alpha
    - ```python
      target_modules=[
          "q_proj",
          "k_proj",
          "v_proj",
          "o_proj",
          "gate_proj",
          "up_proj",
          "down_proj",
          "adapter_down",
          "adapter_up",
      ]
      ```

## Prompt Format

```
<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the model ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()

# Per-role templates matching the prompt format above.
system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"

def construct_prompt(messages):
    """Render a list of {"from", "value"} messages into a single prompt string."""
    prompt = ""
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(
                message=message["value"]
            )
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(
                message=message["value"]
            )
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(
                message=message["value"]
            )
        else:
            raise ValueError(
                f"Unknown message type: {message['from']}"
            )
    # Leave the assistant turn open so the model generates the reply.
    return prompt + "<|im_start|>assistant\n"

system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"

messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]

prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
# Sample one completion; max_length counts prompt tokens plus generated tokens.
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

## Other Information

Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)

[Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)

[Forked repo with mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)

If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm) that adds sparsetral support.