metadata

license: apache-2.0
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
library_name: transformers
pipeline_tag: text-generation

Chikuma_10.7B - V2

This model is a DPO fine-tune of Chikuma_10.7B, trained on argilla/distilabel-intel-orca-dpo-pairs.

Dataset

Dataset: argilla/distilabel-intel-orca-dpo-pairs

The dataset was only about 3,000 samples, but they were high quality (according to chosen_score).
The following filters were applied to the original dataset:

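from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# Drop ties, low-scoring chosen responses, and GSM8K train overlap
dataset = dataset.filter(
    lambda r:
        r["status"] != "tie" and
        r["chosen_score"] >= 8 and
        not r["in_gsm8k_train"]
)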
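To see what the filter keeps without downloading anything, the same predicate can be run on a few hand-made records. This is only a sketch: the records below are invented for illustration, and the status values "unchanged" and "swapped" are assumptions about what non-tie rows look like.

```python
# Toy records with the three fields the filter reads.
# All values here are made up for illustration, not taken from the dataset.
records = [
    {"status": "tie", "chosen_score": 9.0, "in_gsm8k_train": False},
    {"status": "unchanged", "chosen_score": 7.5, "in_gsm8k_train": False},
    {"status": "swapped", "chosen_score": 8.0, "in_gsm8k_train": True},
    {"status": "unchanged", "chosen_score": 8.0, "in_gsm8k_train": False},
]


def keep(r):
    """Same predicate as the dataset filter above."""
    return (
        r["status"] != "tie"
        and r["chosen_score"] >= 8
        and not r["in_gsm8k_train"]
    )


kept = [r for r in records if keep(r)]
print(len(kept), "of", len(records), "records kept")
```

Only the last record passes: the first is a tie, the second scores below 8, and the third is flagged as part of the GSM8K training set.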

Chat Template

I decided to go with a slight modification of ChatML.

<|im_start|>GPT4 Correct system:
{system} Always use <|end_of_turn|> when you want to end the answer. <|im_end|>
<|im_start|>GPT4 Correct user:
{user}<|im_end|>
<|im_start|>GPT4 Correct Assistant:
{assistant}<|im_end|>
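For reference, here is a rough sketch of how a single-turn prompt could be assembled by hand from the template above. The function name render_prompt and the constant SYSTEM_SUFFIX are hypothetical helpers, not part of the model or of transformers; in practice the template stored with the tokenizer (used through apply_chat_template) should be treated as the source of truth.

```python
# Sentence appended to the system message, copied from the template above.
SYSTEM_SUFFIX = " Always use <|end_of_turn|> when you want to end the answer. "


def render_prompt(system, user):
    """Build a single-turn prompt that ends where the assistant should start."""
    return (
        "<|im_start|>GPT4 Correct system:\n"
        f"{system}{SYSTEM_SUFFIX}<|im_end|>\n"
        "<|im_start|>GPT4 Correct user:\n"
        f"{user}<|im_end|>\n"
        "<|im_start|>GPT4 Correct Assistant:\n"
    )


prompt_text = render_prompt(
    "You are a helpful assistant chatbot.",
    "What is a large language model?",
)
print(prompt_text)
```

The prompt stops right after the assistant header so that generation continues with the assistant's reply.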

Training Hardware

I used 1 x A100 80GB on RunPod for about 1.5 hours.

Usage

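# Load tokenizer
import transformers
from transformers import AutoTokenizer

new_model = "<model-repo-id-or-local-path>"  # placeholder: set to this model's Hub ID or a local checkpoint
tokenizer = AutoTokenizer.from_pretrained(new_model)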

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    device="cuda"
)

# Generate text

message = [
    {"role": "system", "content": "You are a helpful assistant chatbot. Always use <|end_of_turn|> when you want to end the answer."},
    {"role": "user", "content": "What is a large language model?"}
]

prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=512,
)
print(sequences[0]['generated_text'])
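By default the text-generation pipeline returns the prompt followed by the completion, so the printed text includes the system and user turns. Passing return_full_text=False to the pipeline call is one way to get only the completion. Alternatively, a small post-processing helper can strip the prompt and cut at the first end marker. The sketch below is one such helper; extract_reply and STOP_MARKERS are hypothetical names, and the marker list is an assumption based on the chat template above.

```python
# End markers taken from the chat template; treat this list as an assumption.
STOP_MARKERS = ("<|im_end|>", "<|end_of_turn|>")


def extract_reply(generated_text, prompt):
    """Return only the assistant's reply from prompt + completion text."""
    # Drop the prompt if the output starts with it.
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text

    # Cut at the earliest end marker, if any is present.
    cut = len(reply)
    for marker in STOP_MARKERS:
        idx = reply.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return reply[:cut].strip()


demo_prompt = "<|im_start|>GPT4 Correct Assistant:\n"
demo_output = demo_prompt + "A large language model is a neural network.<|im_end|>"
print(extract_reply(demo_output, demo_prompt))
```

With the usage code above, this would be called as extract_reply(sequences[0]['generated_text'], prompt). Note that if the prompt is not found as a prefix, the marker search runs over the whole text and may cut inside the system message, which itself contains <|end_of_turn|>.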

Things in Pipeline:

Manual testing and evaluation against GPT-4 on text-generation-webui across 45 complex sample prompts.
Nous Benchmark
GGUF Format
Ollama Model (if model benchmarks are good)

Acknowledgements

I'd like to thank the amazing open community and in particular:

The Intel team for publishing a great open dataset and showing how well it worked in the first place.
Teknium and NousResearch for their awesome work and models.
Maxime for sharing such great resources.
Argilla for publishing argilla/distilabel-intel-orca-dpo-pairs.