|
---
license: apache-2.0
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
base_model: sethuiyer/Chikuma_10.7B
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
---
|
|
|
# Chikuma_10.7B - V2 (Enhanced with DPO) [For Experiments] |
|
|
|
<p align="center"> |
|
<img src="https://huggingface.co/sethuiyer/distilabled_Chikuma_10.7B/resolve/main/chikuma_v2.webp" height="256px" alt="Chikuma"> |
|
</p> |
|
|
|
|
|
This model is the **DPO fine-tuned version** of [Chikuma_10.7B](https://huggingface.co/sethuiyer/Chikuma_10.7B), which was a depth-upscaled merge of:
|
* [sethuiyer/SynthIQ-7b](https://huggingface.co/sethuiyer/SynthIQ-7b) |
|
* [openchat/openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) |
|
|
|
The name "Chikuma" is inspired by the [Chikuma River](https://en.wikipedia.org/wiki/Shinano_River), the longest in Japan, known for its continuous flow and meandering path. |
|
This metaphorically represents the model's depth, fluidity, and adaptability in processing and understanding language. |
|
|
|
|
|
## Dataset Used for Fine-Tuning

Dataset: [`argilla/distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
|
|
|
Only roughly 3,000 samples were used, but they were of high quality according to the `chosen_score` field.
|
|
|
The following filters were applied to the original dataset: |
|
```python
# Keep only non-tied pairs with a high chosen_score that are not in the GSM8K train split
dataset = dataset.filter(
    lambda r: r["status"] != "tie"
    and r["chosen_score"] >= 8
    and not r["in_gsm8k_train"]
)
```
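
For context, here is a minimal end-to-end sketch of loading the dataset from the Hub and applying the same filter, assuming the Hugging Face `datasets` library (the column names follow the published dataset):

```python
from datasets import load_dataset

# Load the DPO preference pairs published by Argilla
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# Apply the same quality filter as above
dataset = dataset.filter(
    lambda r: r["status"] != "tie"
    and r["chosen_score"] >= 8
    and not r["in_gsm8k_train"]
)

print(len(dataset))  # roughly 3,000 examples remain
```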
|
|
|
## Chat Template

The chat template for Chikuma_10.7B - V2 is a modified version of ChatML, combining ChatML's `<|im_start|>`/`<|im_end|>` markers with OpenChat-style "GPT4 Correct" role names:
|
|
|
```
<|im_start|>GPT4 Correct system:
{system} Always use <|end_of_turn|> when you want to end the answer. <|im_end|>
<|im_start|>GPT4 Correct user:
{user}<|im_end|>
<|im_start|>GPT4 Correct Assistant:
{assistant}<|im_end|>
```
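
As a rough illustration of how this template expands, the sketch below renders a single-turn prompt by hand; the `format_prompt` helper is hypothetical and not part of the model's tokenizer (in practice, `tokenizer.apply_chat_template`, shown in the Usage section below, does this for you):

```python
def format_prompt(system: str, user: str) -> str:
    """Render a single-turn prompt in the Chikuma_10.7B - V2 chat template."""
    return (
        f"<|im_start|>GPT4 Correct system:\n"
        f"{system} Always use <|end_of_turn|> when you want to end the answer. <|im_end|>\n"
        f"<|im_start|>GPT4 Correct user:\n"
        f"{user}<|im_end|>\n"
        f"<|im_start|>GPT4 Correct Assistant:\n"
    )

print(format_prompt("You are a helpful assistant chatbot.", "Who invented LLMs?"))
```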
|
|
|
## Nous Benchmark Evaluation |
|
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------------------------------|-----------|-----------|------------|-----------|-----------|
| SynthIQ-7b | 42.67 | 73.71 | 56.51 | 44.59 | 54.37 |
| openchat/openchat-3.5-0106 | **44.17** | 73.72 | 52.53 | 44.40 | 53.71 |
| Chikuma_10.7B | 42.41 | 73.41 | 56.69 | 43.50 | 54.00 |
| **Chikuma_10.7B_v2** | 42.77 | **73.81** | **58.83** | **44.83** | **55.06** |
|
|
|
## OpenLLM Leaderboard

| Benchmark Name | Performance |
|----------------|-------------|
| ARC | 66.38 |
| HellaSwag | 85 |
| MMLU | 65.27 |
| TruthfulQA | 58.83 |
| Winogrande | 78.77 |
| GSM8K | 63.68 |
| **Average** | **69.65** |
|
|
|
|
|
## Training Environment

- Hardware: A single A100 80GB GPU on RunPod, used for approximately 1.5 hours.
- Training Script: Accessible via [Google Colab Notebook](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing); a rough sketch of this kind of DPO setup is shown below. Special thanks to [mlabonne](https://huggingface.co/mlabonne) for providing the template.
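
The exact settings are in the linked notebook; the snippet below is only an illustrative sketch of this kind of DPO run, assuming TRL ~0.7 and placeholder hyperparameters rather than the values actually used for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "sethuiyer/Chikuma_10.7B"  # the model that was DPO fine-tuned
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# `dpo_dataset` is assumed to be the filtered preference data from above, already
# mapped to the "prompt" / "chosen" / "rejected" columns that DPOTrainer expects.
training_args = TrainingArguments(
    output_dir="./chikuma-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,          # placeholder value
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,              # TRL creates a frozen reference copy when None
    args=training_args,
    beta=0.1,                    # strength of the implicit KL penalty in the DPO loss
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```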
|
|
|
|
|
## Usage |
|
|
|
```python
import transformers
from transformers import AutoTokenizer

model_name = "sethuiyer/distilabled_Chikuma_10.7B"

# Load the tokenizer, which carries the chat template shown above
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    device="cuda",
)

# Build the prompt from a chat-style message list
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "Who invented LLMs?"},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Generate text
sequences = pipeline(prompt, max_new_tokens=512)
print(sequences[0]["generated_text"])
```
|
|
|
## Acknowledgements |
|
|
|
A heartfelt appreciation goes to the vibrant open-source community, particularly: |
|
|
|
* The Intel team for publishing a great open dataset and showing how well it works.
|
* Teknium and NousResearch for their awesome work and models. |
|
* Maxime for sharing such great resources. |
|
* Argilla for publishing [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).