This model is sparsetral-16x7B-v2, further tuned with SPIN on OpenHermes-2.5 mixed with traditional DPO samples. This is iteration_1. Further training runs are paused for now in favor of switching from LoRA to DoRA. We may also start over from the beginning with a v3 for proper chat token support, and we are debating adding function tokens and function calling. If there are tasks that Sparsetral has been weak at, feel free to send us some prompts/chats plus desired completions, and we will see about making sure your task is supported!

Kuru~ Kuru~ Kuru~ Kuru~

Training

  • 8x A6000s
  • Base model is sparsetral-16x7B-v2-SPIN_iter0
  • Forked version of unsloth for efficient training
  • Sequence Length: 4096
  • Effective batch size: 64
  • Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
  • Epochs: 2
  • 100k samples (50k new SPIN + 50k from iter_0)
  • QLoRA:
    • 256 r and 256 alpha
    • target_modules=[
          "q_proj",
          "k_proj",
          "v_proj",
          "o_proj",
          "gate_proj",
          "up_proj",
          "down_proj",
          "adapter_down",
          "adapter_up",
      ]
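As a rough sanity check on the hyperparameters above, the sketch below works out the optimizer step counts, the per-GPU share of each batch, the LoRA scaling factor, and the shape of a linear warmup plus linear decay schedule. It is only an illustration: the rounding of warmup steps, keeping the last partial batch of each epoch, and pure data parallelism across all 8 GPUs are assumptions, not details confirmed by this card.

```python
import math

# Values taken from the training list above.
SAMPLES = 100_000
EPOCHS = 2
EFFECTIVE_BATCH = 64
NUM_GPUS = 8
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1
LORA_R = 256
LORA_ALPHA = 256

# Assumption: the last partial batch of each epoch is kept,
# and the warmup step count is rounded up.
steps_per_epoch = math.ceil(SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

# Assumption: plain data parallelism, so each GPU contributes an equal
# number of sequences per optimizer step (micro-batch x grad accumulation).
per_gpu_sequences = EFFECTIVE_BATCH // NUM_GPUS

# Standard LoRA scaling applied to the low-rank update: alpha / r.
lora_scale = LORA_ALPHA / LORA_R


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps, per_gpu_sequences, lora_scale)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

With r and alpha both at 256, the adapter update is applied at a scale of 1.0, so the rank can be changed later without also rebalancing alpha against the learning rate.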
      

Prompt Format

<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
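To make the template concrete, here is a small sketch that splits a prompt in this format back into (role, content) pairs. It assumes each `\n` in the template stands for a real newline character; the `split_turns` helper and the example strings are hypothetical and not part of the model's code.

```python
import re

# One completed turn: <|im_start|>{role}\n{content}<|im_end|>
TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def split_turns(prompt: str):
    """Return (role, content) pairs for every completed turn in the prompt."""
    return TURN.findall(prompt)


example = (
    "<|im_start|>system\nBe brief.<|im_end|>\n"
    "<|im_start|>user\nHi there<|im_end|>\n"
    "<|im_start|>assistant\n"  # left open so the model writes this turn
)

turns = split_turns(example)
print(turns)
```

The trailing `<|im_start|>assistant\n` has no closing `<|im_end|>`, so it is not returned as a turn; it is the cue for the model to start its reply.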

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

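# This card describes iter1, so load the iter1 checkpoint.
# trust_remote_code=True is needed because the repo ships custom model code.
tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter1", device_map="auto", trust_remote_code=True).eval()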

system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"

def construct_prompt(messages):
    prompt = ""
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(
                message=message["value"]
            )
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(
                message=message["value"]
            )
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(
                message=message["value"]
            )
        else:
            raise ValueError(
                f"Unknown message type: {message['from']}"
            )
    return prompt + "<|im_start|>assistant\n"

system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"

messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]

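# Build the prompt, ending with an open assistant turn for the model to complete
prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
# max_length counts prompt tokens plus generated tokens
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
# The decoded text contains the prompt followed by the completion
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))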
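The decoded text above includes the prompt as well as the reply, because `generate` returns the prompt token ids followed by the newly generated ones. If only the reply is wanted, one option is to slice off the prompt before decoding. The helper below is a hypothetical sketch that works on plain lists of token ids; the ids in the demo are made up.

```python
def completion_ids(output_ids, prompt_len, stop_id=None):
    """Drop the first prompt_len ids, then cut at the first stop_id if present."""
    new_ids = list(output_ids[prompt_len:])
    if stop_id is not None and stop_id in new_ids:
        new_ids = new_ids[: new_ids.index(stop_id)]
    return new_ids


# Made-up ids: 3 prompt tokens, 2 reply tokens, a stop token (2), then padding.
demo = completion_ids([11, 12, 13, 21, 22, 2, 0, 0], prompt_len=3, stop_id=2)
print(demo)
```

With the script above, this could be called as `completion_ids(pred[0].tolist(), inputs["input_ids"].shape[1], tokenizer.eos_token_id)` and the result passed to `tokenizer.decode`. Whether `eos_token_id` is the right stop token for this checkpoint is an assumption to verify.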

Other Information

Paper reference: Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Original Paper repo

Forked repo with mistral support (sparsetral)

If you are interested in faster inference, check out our fork of vLLM, which adds sparsetral support

Model: serpdotai/sparsetral-16x7B-v2-SPIN_iter1

Model size: 9.39B params (Safetensors, tensor type BF16)