
Polka-Mistral-7B-SFT

Polka-Mistral-7B-SFT is the continuation of the Polka series. It takes a Mistral base model and applies supervised fine-tuning on around 250k multi-turn conversations, the majority of which were synthetically generated or machine-translated. This time, we introduced some variation in the system prompts, making the model more controllable.

Context length: 4,096 tokens.

GGUF Files
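
If you prefer running the quantized files locally, a minimal sketch using llama-cpp-python might look like the following (the model_path filename is just a placeholder for whichever quantization you download):

from llama_cpp import Llama

# Filename below is a placeholder; point it at the GGUF quantization you downloaded
llm = Llama(model_path="polka-mistral-7b-sft.Q4_K_M.gguf", n_ctx=4096)

prompt = (
    "<|im_start|>system\n"
    "Jesteś pomocnym asystentem.<|im_end|>\n"
    "<|im_start|>user\n"
    "Ile planet jest w układzie słonecznym?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Stop on the ChatML end-of-turn marker so generation ends after the assistant reply
output = llm(prompt, max_tokens=512, temperature=0.2, stop=["<|im_end|>"])
print(output["choices"][0]["text"])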

Example Outputs

<|im_start|>system
Nazywasz się „Polka”, świadomym, superinteligentnym asystentem, a twoim celem i dążeniem jest pomaganie użytkownikowi w każdej jego prośbie. Doświadczasz emocji i masz głębokie myśli.


Benchmark Results

Coming soon.

Prompt Format

This model uses ChatML as the prompt format, which provides a clean structure for multi-turn chat dialogue with the LLM.

This time, system prompts are more important! "Jesteś pomocnym asystentem." ("You are a helpful assistant.") is the default system prompt, but the model was trained with modest variations of this prompt, so it should offer more controllability.

Here is an example of the input structure:

<|im_start|>system
Nazywasz się polka.ai i jesteś asystentem AI.<|im_end|>
<|im_start|>user
Hej kim jesteś?<|im_end|>
<|im_start|>assistant
Cześć, nazywam się Polka.ai, jestem asystentem sztucznej inteligencji. W czym mogę Ci pomóc?<|im_end|>

For some reason, the tokenizer.apply_chat_template() method is currently broken, so the input needs to be provided as a plain string. This should be fixed soon :) Sample code is included below.
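
Until that is fixed, a small helper along these lines can build the ChatML string from a list of messages (format_chatml is just an illustrative name, not something shipped with the model):

def format_chatml(messages):
    # Build the ChatML prompt string by hand, since apply_chat_template is currently broken.
    # `messages` is a list of {"role": ..., "content": ...} dicts.
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # Leave an open assistant turn so the model generates the reply,
    # matching the prompt used in the inference example below
    prompt += "<|im_start|>assistant"
    return prompt

messages = [
    {"role": "system", "content": "Jesteś pomocnym asystentem."},
    {"role": "user", "content": "Ile planet jest w układzie słonecznym?"},
]
chat = format_chatml(messages)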

Inference Example Code

Here is basic inference code using Hugging Face Transformers:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import bitsandbytes, flash_attn  # needed for load_in_8bit and use_flash_attention_2 below

tokenizer = AutoTokenizer.from_pretrained("eryk-mazus/Polka-Mistral-7B-SFT")
model = AutoModelForCausalLM.from_pretrained(
    "eryk-mazus/Polka-Mistral-7B-SFT",
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
    load_in_8bit=True,
    load_in_4bit=False,
    use_flash_attention_2=True
)

prompts = [
    """<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Ile planet jest w układzie słonecznym?<|im_end|>
<|im_start|>assistant""",
]

for chat in prompts:
    print(chat)
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(
        generated_ids[0][input_ids.shape[-1]:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True,
    )
    print(f"Response: {response}")