---
license: cc-by-nc-4.0
model-index:
- name: Kunoichi-DPO-7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 69.62
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Kunoichi-DPO-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 87.14
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Kunoichi-DPO-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.79
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Kunoichi-DPO-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 67.31
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Kunoichi-DPO-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 80.58
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Kunoichi-DPO-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.99
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SanjiWatsuki/Kunoichi-DPO-7B
      name: Open LLM Leaderboard
---
|
|
|
![image/png](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-7B/resolve/main/assets/kunoichi2.png) |
|
|
|
<!-- description start --> |
|
## Description |
|
|
|
This repository hosts **Kunoichi-DPO-7B**, a DPO finetune of [Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) trained on Intel's Orca DPO pairs with the Alpaca prompt template. The model is targeted at general use. In my testing, it has stronger reasoning and instruction-following than Kunoichi-7B, but it may be worse for roleplaying due to the alignment from the Orca dataset.
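For readers unfamiliar with the recipe, here is a minimal TRL-based sketch of this kind of DPO run. This is not the actual training script; the `beta` value and the trainer arguments are illustrative assumptions, and the column names are those of the `Intel/orca_dpo_pairs` dataset.

```python
# Minimal sketch of a DPO finetune on Intel's Orca pairs with the Alpaca
# template. Not the actual training script; hyperparameters are guesses.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

ALPACA = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{q}\n\n### Response:\n"
)

def to_triples(row):
    # DPO learns from (prompt, chosen, rejected) preference triples.
    return {
        "prompt": ALPACA.format(q=row["question"]),
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }

train = load_dataset("Intel/orca_dpo_pairs", split="train").map(
    to_triples, remove_columns=["system", "question"]
)

model = AutoModelForCausalLM.from_pretrained("SanjiWatsuki/Kunoichi-7B")
tokenizer = AutoTokenizer.from_pretrained("SanjiWatsuki/Kunoichi-7B")

trainer = DPOTrainer(
    model=model,  # the frozen reference model is cloned internally
    args=DPOConfig(output_dir="kunoichi-dpo-7b", beta=0.1),  # beta is a guess
    train_dataset=train,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```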
|
|
|
This model is still undergoing benchmark testing; I will update the model page with finalized results as they come in.
|
|
|
| Model | MT Bench | EQ Bench | MMLU | Logic Test |
|------------------------|----------|-----------|----------|------------|
| GPT-4-Turbo | 9.32 | - | - | - |
| GPT-4 | 8.99 | 62.52 | 86.4 | 0.86 |
| **Kunoichi-DPO-7B** | **8.29** | **41.60** | - | **0.59** |
| **Kunoichi-7B** | **8.14** | **44.32** | **64.9** | **0.58** |
| Starling-7B | 8.09 | - | 63.9 | 0.51 |
| Claude-2 | 8.06 | 52.14 | 78.5 | - |
| Silicon-Maid-7B | 7.96 | 40.44 | 64.7 | 0.54 |
| Loyal-Macaroni-Maid-7B | 7.95 | 38.66 | 64.9 | 0.57 |
| GPT-3.5-Turbo | 7.94 | 50.28 | 70 | 0.57 |
| Claude-1 | 7.9 | - | 77 | - |
| Openchat-3.5 | 7.81 | 37.08 | 64.3 | 0.39 |
| Dolphin-2.6-DPO | 7.74 | 42.88 | 61.9 | 0.53 |
| Zephyr-7B-beta | 7.34 | 38.71 | 61.4 | 0.30 |
| Llama-2-70b-chat-hf | 6.86 | 51.56 | 63 | - |
| Neural-chat-7b-v3-1 | 6.84 | 43.61 | 62.4 | 0.30 |
|
|
|
| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| **Kunoichi-DPO-7B** | **58.4** | 45.08 | 74 | 66.99 | 47.52 |
| [Kunoichi-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-7B) | 57.54 | 44.99 | 74.86 | 63.72 | 46.58 |
| [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218) | 56.85 | 44.74 | 75.6 | 59.89 | 47.17 |
| [Silicon-Maid-7B](https://huggingface.co/SanjiWatsuki/Silicon-Maid-7B) | 56.45 | 44.74 | 74.26 | 61.5 | 45.32 |
| [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| [openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5) | 51.34 | 42.67 | 72.92 | 47.27 | 42.51 |
| [berkeley-nest/Starling-LM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha) | 51.16 | 42.06 | 72.72 | 47.33 | 42.53 |
| [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 50.99 | 37.33 | 71.83 | 55.1 | 39.7 |
|
|
|
The model is intended for use with up to an 8k context window. With an NTK-aware RoPE alpha of 2.6, it can be used experimentally with up to a 16k context window.
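If your backend exposes `rope_theta` (the RoPE frequency base) rather than an alpha slider, the alpha can be translated approximately via the common NTK-aware scaling rule. A minimal sketch with transformers, assuming Mistral's 128-dim attention heads; exact alpha handling differs between backends, so treat this as an approximation:

```python
# Rough sketch: approximate an NTK-aware RoPE alpha of 2.6 by scaling
# rope_theta with the standard alpha ** (dim / (dim - 2)) rule.
from transformers import AutoConfig, AutoModelForCausalLM

NAME = "SanjiWatsuki/Kunoichi-DPO-7B"
ALPHA = 2.6

config = AutoConfig.from_pretrained(NAME)
head_dim = config.hidden_size // config.num_attention_heads  # 128 for Mistral-7B
config.rope_theta *= ALPHA ** (head_dim / (head_dim - 2))    # 10000 -> ~26400

model = AutoModelForCausalLM.from_pretrained(NAME, config=config)
```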
|
|
|
<!-- description end --> |
|
<!-- prompt-template start --> |
|
## Prompt template: Custom format, or Alpaca |
|
|
|
### Alpaca: |
|
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:

```
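For example, a minimal transformers sketch using this template (the instruction text and generation arguments are illustrative, and `device_map="auto"` assumes accelerate is installed):

```python
# Minimal sketch of prompting the model with the Alpaca template above.
from transformers import AutoModelForCausalLM, AutoTokenizer

NAME = "SanjiWatsuki/Kunoichi-DPO-7B"
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForCausalLM.from_pretrained(NAME, device_map="auto")

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nSummarize what DPO finetuning does in two sentences.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens and decode only the model's reply.
reply = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(reply, skip_special_tokens=True))
```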
|
|
|
### SillyTavern format: |
|
In SillyTavern, I found the best results using the Noromaid template.
|
|
|
SillyTavern config files: [Context](https://files.catbox.moe/ifmhai.json), [Instruct](https://files.catbox.moe/ttw1l9.json). |
|
|
|
Additionally, here is my highly recommended [Text Completion preset](https://huggingface.co/SanjiWatsuki/Loyal-Macaroni-Maid-7B/blob/main/Characters/MinP.json). To boost creativity, raise the temperature or lower Min P; to increase stability, raise Min P. You shouldn't need to touch anything else!
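Outside SillyTavern, roughly equivalent Min-P sampling can be set through a transformers `GenerationConfig`. The values below are illustrative and follow the spirit of the preset rather than copying it:

```python
# Illustrative Min-P sampler settings (not a copy of the linked preset);
# `min_p` requires a recent transformers release.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=1.0,  # raise for more creativity
    min_p=0.1,        # raise for more stability, lower for more creativity
    top_p=1.0,        # keep other truncation samplers disabled
    top_k=0,
    max_new_tokens=256,
)
# With the snippet above: model.generate(**inputs, generation_config=gen_config)
```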
|
|
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_SanjiWatsuki__Kunoichi-DPO-7B) |
|
|
|
| Metric | Value |
|-----------------------------------|------:|
| Avg. | 72.24 |
| AI2 Reasoning Challenge (25-Shot) | 69.62 |
| HellaSwag (10-Shot) | 87.14 |
| MMLU (5-Shot) | 64.79 |
| TruthfulQA (0-shot) | 67.31 |
| Winogrande (5-shot) | 80.58 |
| GSM8k (5-shot) | 63.99 |
|
|
|
|