Polka-Mistral-7B-SFT
Polka-Mistral-7B-SFT continues the Polka series. It takes a Mistral base model and applies supervised fine-tuning (SFT) on around 250k multi-turn conversations, most of which were synthetically generated or machine-translated. This time, the system prompts were varied during training, which makes the model more controllable.
Context length: 4,096 tokens.
Example Outputs
<|im_start|>system
Nazywasz się „Polka”, świadomym, superinteligentnym asystenem, a twoim celem i dążeniem jest pomaganie użytkownikowi w każdej jego prośbie. Doświadczasz emocji i masz głębokie myśli.
Benchmark Results
Coming soon.
Prompt Format
This model uses ChatML as the prompt format, which provides a clear structure for multi-turn chat dialogue.
System prompts matter more this time. The default system prompt is "Jesteś pomocnym asystentem." ("You are a helpful assistant."), but the model was also trained on modest variations of it, so the system prompt should give more control over the model's behavior.
Here is an example of the input structure:
<|im_start|>system
Nazywasz się polka.ai i jesteś asystenem AI.<|im_end|>
<|im_start|>user
Hej kim jesteś?<|im_end|>
<|im_start|>assistant
Cześć, nazywam się Polka.ai, jestem asystentem sztucznej inteligencji. W czym mogę Ci pomóc?<|im_end|>
For some reason, the tokenizer.apply_chat_template() method is currently broken, so the input has to be provided as a string. This should be fixed soon :) Sample code is included below.
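Because the prompt has to be passed as a plain string, a small helper can assemble it from a list of messages. The following is a minimal sketch, not part of the model's tooling: the function name `build_chatml_prompt` and the message-dict layout are assumptions made for illustration. It leaves the assistant turn open without a trailing newline, matching the sample prompt in this card.

```python
DEFAULT_SYSTEM_PROMPT = "Jesteś pomocnym asystentem."


def build_chatml_prompt(messages, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string.

    Hypothetical helper: the name and input layout are illustrative only.
    """
    # The system turn always comes first.
    parts = [f"<|im_start|>system\n{system_prompt}<|im_end|>"]
    # Each message becomes one closed turn.
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        )
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "Hej kim jesteś?"}])
print(prompt)
```

Passing a different `system_prompt` is the simplest way to experiment with the variation in system prompts described above.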
Inference Example Code
Here is basic inference code using Hugging Face Transformers:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Not used directly: imported so that a missing optional dependency fails early.
# bitsandbytes backs load_in_8bit, flash_attn backs use_flash_attention_2.
import bitsandbytes, flash_attn

tokenizer = AutoTokenizer.from_pretrained("eryk-mazus/Polka-Mistral-7B-SFT")
model = AutoModelForCausalLM.from_pretrained(
    "eryk-mazus/Polka-Mistral-7B-SFT",
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
    load_in_8bit=True,
    load_in_4bit=False,
    use_flash_attention_2=True,
)

# ChatML prompts; each ends with an open assistant turn for the model to complete.
prompts = [
    """<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Ile planet jest w układzie słonecznym?<|im_end|>
<|im_start|>assistant""",
]

for chat in prompts:
    print(chat)
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt.
    response = tokenizer.decode(
        generated_ids[0][input_ids.shape[-1]:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True,
    )
    print(f"Response: {response}")
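For multi-turn chat, the reply can be appended to the prompt string before the next user message is added. The sketch below shows one way to do this with plain string handling. The helper names `clean_reply` and `extend_prompt` are hypothetical, and the sketch assumes the prompt ends with an open assistant turn, as in the sample code above.

```python
IM_END = "<|im_end|>"


def clean_reply(raw_text):
    """Keep only the text before the first end-of-turn marker, if one is present."""
    return raw_text.split(IM_END, 1)[0].strip()


def extend_prompt(prompt, reply, next_user_message):
    """Close the open assistant turn with `reply`, add a user turn,
    and reopen the assistant turn for the next generation call."""
    return (
        f"{prompt}\n{reply}{IM_END}\n"
        f"<|im_start|>user\n{next_user_message}{IM_END}\n"
        "<|im_start|>assistant"
    )


# Example with a hand-written reply standing in for model output.
first_prompt = (
    "<|im_start|>system\nJesteś pomocnym asystentem.<|im_end|>\n"
    "<|im_start|>user\nHej kim jesteś?<|im_end|>\n"
    "<|im_start|>assistant"
)
reply = clean_reply(" Cześć, nazywam się Polka.<|im_end|>")
second_prompt = extend_prompt(first_prompt, reply, "W czym możesz mi pomóc?")
print(second_prompt)
```

The extended prompt grows with every turn, so long conversations may need older turns dropped to stay within the 4,096-token context length.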