---
license: apache-2.0
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
library_name: transformers
pipeline_tag: text-generation
---

# Chikuma_10.7B - V2

This model is a DPO fine-tune of [Chikuma_10.7B](https://huggingface.co/sethuiyer/Chikuma_10.7B), trained on the [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset.

## Dataset

Dataset: `argilla/distilabel-intel-orca-dpo-pairs`

The dataset used for training was small, roughly 3,000 samples, but the samples were high quality (as judged by `chosen_score`).

The following filters were applied to the original dataset:

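```python
from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# Drop ties, low-rated chosen answers, and prompts that overlap with the GSM8K train set
dataset = dataset.filter(
    lambda r:
        r["status"] != "tie" and
        r["chosen_score"] >= 8 and
        not r["in_gsm8k_train"]
)
```
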
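To see what each condition removes, the same predicate can be run on a few hand-written rows. This is only a minimal sketch: the rows are made up for illustration and carry just the three fields the filter reads, and `keep_row` is a hypothetical helper name, not part of the dataset tooling.

```python
# Hypothetical rows carrying only the three fields the filter reads.
rows = [
    {"status": "tie", "chosen_score": 9.0, "in_gsm8k_train": False},
    {"status": "swapped", "chosen_score": 7.0, "in_gsm8k_train": False},
    {"status": "unchanged", "chosen_score": 10.0, "in_gsm8k_train": True},
    {"status": "unchanged", "chosen_score": 8.0, "in_gsm8k_train": False},
]


def keep_row(r):
    """Same predicate as the filter above, written as a named function."""
    return (
        r["status"] != "tie"
        and r["chosen_score"] >= 8
        and not r["in_gsm8k_train"]
    )


# Only rows that pass all three conditions survive.
kept = [r for r in rows if keep_row(r)]
print(len(kept))
```

Each of the first three rows fails exactly one condition (a tie, a score below 8, and GSM8K overlap, respectively), so only the last row is kept.
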
## Chat Template

I decided to go with a slight modification of ChatML.

```
<|im_start|>GPT4 Correct system:
{system} Always use <|end_of_turn|> when you want to end the answer. <|im_end|>
<|im_start|>GPT4 Correct user:
{user}<|im_end|>
<|im_start|>GPT4 Correct Assistant:
{assistant}<|im_end|>
```

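For readers who want to build the prompt string by hand, the template can be sketched in plain Python as below. This is an illustration under assumptions, not the tokenizer's actual template: `render_prompt`, `ROLE_HEADERS`, and `SYSTEM_SUFFIX` are hypothetical names, and the exact newline and whitespace placement is guessed from the template as printed above. When in doubt, `tokenizer.apply_chat_template` is the authoritative source.

```python
# Sentence the template appends to the system message (copied from the template above).
SYSTEM_SUFFIX = " Always use <|end_of_turn|> when you want to end the answer. "

# Role headers as they appear in the template above.
ROLE_HEADERS = {
    "system": "GPT4 Correct system:",
    "user": "GPT4 Correct user:",
    "assistant": "GPT4 Correct Assistant:",
}


def render_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into the modified ChatML layout.

    Whitespace handling is an assumption; check against the tokenizer's own template.
    """
    parts = []
    for m in messages:
        content = m["content"]
        if m["role"] == "system":
            content += SYSTEM_SUFFIX
        parts.append(f"<|im_start|>{ROLE_HEADERS[m['role']]}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"<|im_start|>{ROLE_HEADERS['assistant']}\n")
    return "".join(parts)


example = render_prompt([
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a large language model?"},
])
print(example)
```

Note that this sketch appends the end-of-turn instruction to the system message itself, following the template text, so the system content passed in should not already contain that sentence.
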
## Training Hardware

I used 1x A100 80GB on RunPod for about 1.5 hours.

## Usage

```python
import transformers
from transformers import AutoTokenizer

# Assumed Hub repo ID for this model; replace it with the actual path if it differs
new_model = "sethuiyer/Chikuma_10.7B-v2"

tokenizer = AutoTokenizer.from_pretrained(new_model)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    device="cuda"
)

# Format prompt with the tokenizer's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot. Always use <|end_of_turn|> when you want to end the answer."},
    {"role": "user", "content": "What is a large language model?"}
]

prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=512,
)
print(sequences[0]['generated_text'])
```

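By default the text-generation pipeline returns the prompt together with the completion in `generated_text`. A minimal sketch of pulling out just the assistant's answer is shown below. `extract_reply` and `END_MARKERS` are hypothetical names, and treating both `<|end_of_turn|>` and `<|im_end|>` as stop markers is an assumption based on the system prompt and chat template, not documented behavior of this model. The sample strings are made up so the snippet runs without downloading the model.

```python
# Markers assumed to signal the end of the assistant's answer.
END_MARKERS = ("<|end_of_turn|>", "<|im_end|>")


def extract_reply(generated_text, prompt):
    """Strip the echoed prompt and cut the text at the first end marker, if any."""
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text

    # Find the earliest end marker; keep the whole reply if none is present.
    cut = len(reply)
    for marker in END_MARKERS:
        idx = reply.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return reply[:cut].strip()


# Made-up example strings standing in for a real prompt and pipeline output.
sample_prompt = "<|im_start|>GPT4 Correct user:\nHi<|im_end|>\n<|im_start|>GPT4 Correct Assistant:\n"
sample_output = sample_prompt + "Hello there!<|end_of_turn|> trailing tokens"
print(extract_reply(sample_output, sample_prompt))
```

With the usage code above, this would be called as `extract_reply(sequences[0]['generated_text'], prompt)`. If the markers are decoded away as special tokens, the helper simply returns the full completion.
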
## Things in the Pipeline

1. Manual testing and evaluation against GPT-4 on text-generation-webui across 45 complex sample prompts.
2. Nous benchmark.
3. GGUF format.
4. Ollama model (if the model benchmarks are good).

## Acknowledgements

I'd like to thank the amazing open community, and in particular:

* The Intel team for publishing a great open dataset and showing how well it worked in the first place.
* Teknium and NousResearch for their awesome work and models.
* Maxime for sharing such great resources.
* Argilla for publishing argilla/distilabel-intel-orca-dpo-pairs.