---
model_creator: Nekochu
quantized_by: Nekochu
model_name: Llama-3.1 8B German ORPO 
pretty_name: Llama-3.1 8B German ORPO
model_type: llama3.1
prompt_template: >-
  Below is an instruction that describes a task. Write a response that
  appropriately completes the request. ### Instruction: {Instruction} {summary} ### input: {category} ### Response: {prompt}
library_name: peft
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- llama-factory
- lora
datasets:
- mayflowergmbh/intel_orca_dpo_pairs_de
- LeoLM/OpenSchnabeltier
- LeoLM/German_Songs
- LeoLM/German_Poems
- bjoernp/ultrachat_de
- mayflowergmbh/ultra-chat_de
- mayflowergmbh/airoboros-3.0_de
- mayflowergmbh/booksum_de
- mayflowergmbh/dolphin_de
- mayflowergmbh/evol-instruct_de
- mayflowergmbh/openschnabeltier_de
- mayflowergmbh/alpaca-gpt4_de
- mayflowergmbh/dolly-15k_de
- mayflowergmbh/oasst_de
language:
- de
- en
pipeline_tag: text-generation
task_categories:
- question-answering
- text2text-generation
- conversational
inference: True
model-index:
- name: Llama-3.1-8B-German-ORPO
  results: []
---

- Fine-tuning of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on German datasets. Same datasets as used in [Nekochu/Llama-2-13B-German-ORPO](https://huggingface.co/Nekochu/Llama-2-13B-German-ORPO).
- As always, I've kept the LoRA adapter `QLoRA_German-ORPO`, so it can be applied to any *LLaMA-3.1-8B* fine-tuned model, though this may affect performance (a PEFT loading sketch follows below).
- Quants: exl2 [2.4bpw-h6](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/2.4bpw-h6), [4.25bpw-h6](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/4.25bpw-h6), [8.0bpw-h8](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/8.0bpw-h8) | [GGUF](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/gguf) Q4_K_M, IQ4_XS, ...
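
The adapter can be applied with PEFT on top of the base Instruct model. A minimal sketch, assuming the adapter files live in a `QLoRA_German-ORPO` subfolder of this repo (adjust the path to match the actual layout):

```python
# Sketch: apply the QLoRA_German-ORPO LoRA adapter to the base model with PEFT.
# The adapter location (subfolder "QLoRA_German-ORPO" in this repo) is an assumption;
# point it at wherever the adapter files actually live.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

model = PeftModel.from_pretrained(base, "Nekochu/Llama-3.1-8B-German-ORPO", subfolder="QLoRA_German-ORPO")
model = model.merge_and_unload()  # optional: merge the LoRA weights into the base model
```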

Oh, and I am not a German speaker. ^^
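
To run one of the GGUF quants locally, something along these lines should work with `llama-cpp-python`; the `.gguf` filename below is a placeholder, pick an actual file from the `gguf` branch:

```python
# Sketch: download a GGUF quant from the "gguf" branch and run it with llama-cpp-python.
# The filename is hypothetical; replace it with a file actually listed in the gguf branch.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="Nekochu/Llama-3.1-8B-German-ORPO",
    filename="Llama-3.1-8B-German-ORPO-Q4_K_M.gguf",  # placeholder name
    revision="gguf",
)
llm = Llama(model_path=path, n_ctx=8192)
# For best results, wrap the prompt in the Alpaca template shown in the Output Examples section.
out = llm("Schreibe ein kurzes Gedicht über den Herbst.", max_tokens=200)
print(out["choices"][0]["text"])
```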

<details>
  <summary>This training can be replicated using LLaMA-Factory. </summary>

Stage A: SFT
```
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage sft --do_train True --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 1 --finetuning_type lora --template alpaca --rope_scaling linear --flash_attn fa2 --dataset_dir data --dataset ultrachat_de,airoboros_de,booksum_de,dolphin_de,evol_instruct_de,openschnabeltier_de,alpaca-gpt4_de,dolly_15k_de,oasst_de,bjoernp_ultrachat_de,German_Poems,German_Songs,OpenSchnabeltier --cutoff_len 8192 --learning_rate 5e-05 --num_train_epochs 3.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 100 --save_steps 1000 --warmup_steps 1000 --neftune_noise_alpha 5 --optim adamw_8bit --packing True --neat_packing True --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Llama-3.1-8B-German --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.15 --lora_target all --use_adam_mini True --create_new_adapter True
```

Stage B: Continued training with the `orpo` preference loss
```
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage dpo --do_train True --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 1 --finetuning_type lora --template alpaca --rope_scaling linear --flash_attn fa2 --dataset_dir data --dataset fix_orca_dpo_de --cutoff_len 4000 --learning_rate 5e-05 --num_train_epochs 1.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 10 --save_steps 1000 --warmup_steps 0 --neftune_noise_alpha 5 --optim adamw_8bit --packing True --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Llama-3.1-8B-German-ORPO --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.35 --lora_target all --pref_beta 0.1 --pref_ftx 0 --pref_loss orpo --adapter_name_or_path saves\LLaMA3.1-8B-Chat\lora\Llama-3.1-8B-German
```
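
For context, `--pref_loss orpo` with `--pref_beta 0.1` optimizes the usual SFT loss on the chosen answer plus a weighted odds-ratio term over each chosen/rejected pair. A rough, illustrative sketch of that objective (not LLaMA-Factory's actual code; all names are made up):

```python
# Illustrative ORPO objective: SFT loss on the chosen answer + beta * odds-ratio loss.
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logp, rejected_logp, chosen_nll, beta=0.1):
    # chosen_logp / rejected_logp: length-normalized sequence log-probabilities (< 0)
    # chosen_nll: ordinary next-token NLL on the chosen answer
    # log(odds) = log p - log(1 - p), computed in log space for stability
    log_odds_chosen = chosen_logp - torch.log1p(-torch.exp(chosen_logp))
    log_odds_rejected = rejected_logp - torch.log1p(-torch.exp(rejected_logp))
    # penalize the model when the rejected answer's odds approach the chosen answer's odds
    or_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return (chosen_nll + beta * or_loss).mean()
```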


Approximate training time: 5 days for the SFT stage, 6 hours for the ORPO stage.

<details>
  <summary>dataset_info.json</summary>

`dataset_info.json`:
```json
  "oasst_de": {
    "hf_hub_url": "mayflowergmbh/oasst_de"
  },
  "dolly_15k_de": {
    "hf_hub_url": "mayflowergmbh/dolly-15k_de"
  },
  "alpaca-gpt4_de": {
    "hf_hub_url": "mayflowergmbh/alpaca-gpt4_de"
  },
  "openschnabeltier_de": {
    "hf_hub_url": "mayflowergmbh/openschnabeltier_de"
  },
  "evol_instruct_de": {
    "hf_hub_url": "mayflowergmbh/evol-instruct_de"
  },
  "dolphin_de": {
    "hf_hub_url": "mayflowergmbh/dolphin_de"
  },
  "booksum_de": {
    "hf_hub_url": "mayflowergmbh/booksum_de"
  },
  "airoboros_de": {
    "hf_hub_url": "mayflowergmbh/airoboros-3.0_de"
  },
  "ultrachat_de": {
    "hf_hub_url": "mayflowergmbh/ultra-chat_de"
  },
  "German_Songs": {
    "file_name": "German_Songs.json",
    "file_sha1": "3ec36066a19debd1b138020b293e05f21264c352",
    "columns": {
      "prompt": "prompt",
      "query": "analysis_prompt",
      "response": "song",
      "history": "analysis",
      "system": "topic"
    }
  },
  "German_Poems": {
    "file_name": "German_Poems.json",
    "file_sha1": "f0f4bbea3b8cbc378afb640f4ff4dcd11132263c",
    "columns": {
      "prompt": "prompt",
      "query": "topic", 
      "response": "poem"
    }
  },
  "bjoernp_ultrachat_de": {
    "file_name": "ultrachat_de.json",
    "file_sha1": "4e2b6dba1c387b3fa439c33ab35281403c39e973",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations"
    },
    "tags": {
      "role_tag": "from",
      "content_tag": "value",
      "user_tag": "human",
      "assistant_tag": "gpt",
      "system_tag": "system"
    }
  },
  "OpenSchnabeltier": {
    "file_name": "OpenSchnabeltier.json",
    "columns": {
      "prompt": "instruction_de",
      "response": "output_de"
    }
  },
  "fix_orca_dpo_de": {
    "file_name": "fix_intel_orca_dpo_pairs_de.json",
    "ranking": true,
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  }
}
```
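
The `file_sha1` values above can be checked against the local files in the `data` directory before training, e.g.:

```python
# Sketch: verify a local dataset file against the file_sha1 declared in dataset_info.json.
import hashlib

def sha1_of(path: str) -> str:
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

assert sha1_of("data/German_Poems.json") == "f0f4bbea3b8cbc378afb640f4ff4dcd11132263c"
```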

Additionally, the locally converted `.json` datasets are available in the [dataset-reformat](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/dataset-reformate) branch.
</details>

</details>


<details>
  <summary>Output Examples</summary>

```
#Question:
Wie geht es in diesem Absatz weiter? Dann reibt sie eine Nadel auf einem Wattebausch, schiebt ihn dann auf einen Bleistift und wickelt einen Faden darum. Dann hält sie eine Schachtel mit einem Produkt hoch und gießt dann mehrere Flüssigkeiten in eine Schüssel. sie Wählen Sie Ihre Antwort aus: A. Fügt einen Topf hinzu und schüttelt das Produkt in einer Mühle. B. kneift den Faden, um eine Zigarette zu stylen, und geht dann weg. Dann taucht C. die Nadel in Tinte und zeichnet mit dem Bleistift ein Motiv auf ihr Bein, das sie am Ende mit einem Lappen abreibt. D. beginnt, ihre Haare zu stylen und schneidet sie mehrmals, bevor sie die Spitzen scheitelt, um die Frisur zu zeigen, die sie kreiert hat.


#Plain Llama 3.1 Instruct only (wrong) - Llama3 Template: 
Die richtige Antwort ist B.

#Model SFT GER (wrong) - Alpaca Template: 
Es ist unklar, welche Handlung sie als nächstes kommt, da der Absatz zu Ende geht.

#Model SFT+orpo GER (correct) - Alpaca Template, linear RoPE Scaling: 
C. taucht die Nadel in Tinte und zeichnet mit dem Bleistift ein Motiv auf ihr Bein, das sie am Ende mit einem Lappen abreibt.

```

Note: Outputs are from inference with [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) (and the exl2 8bpw quant). Source question: [mayflowergmbh/intel_orca_dpo_pairs_de](https://huggingface.co/datasets/mayflowergmbh/intel_orca_dpo_pairs_de)
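
Since training used `--template alpaca`, prompts should follow the Alpaca format rather than the Llama 3 chat template. A hedged inference sketch with `transformers` (the template wording approximates LLaMA-Factory's `alpaca` template; if the main branch only holds the LoRA adapter, load it via PEFT as shown at the top instead):

```python
# Sketch: greedy generation with an Alpaca-style prompt, matching --template alpaca from training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nekochu/Llama-3.1-8B-German-ORPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

instruction = "Wie geht es in diesem Absatz weiter? ..."  # e.g. the question from the example above
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```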

</details>

<details>
  <summary>Eval English</summary>

[MMLU-Pro](https://github.com/chigkim/Ollama-MMLU-Pro)[*](https://pastebin.com/a8xRqXtg) (en):
| Model                           | Overall Accuracy | biology | business | chemistry | computer science | economics | engineering | health | history | law  | math  | philosophy | physics | psychology | other |
|----------------------------------|----------------------|---------|----------|-----------|------------------|-----------|-------------|--------|---------|------|-------|------------|---------|------------|-------|
| Llama-3.1-8B-German-ORPO-8.0bpw-h8-exl2 | 38.83                | 60.81   | 37.26    | 32.86     | 38.78            | 46.33     | 23.32       | 45.48  | 39.90   | 21.62 | 38.86 | 34.67      | 28.79   | 50.63      | 44.26 |
| Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 46.16                | 63.74   | 49.68    | 36.93     | 48.29            | 55.81     | 28.59       | 52.81  | 45.67   | 30.79 | 45.08 | 40.48      | 39.03   | 60.90      | 48.38 |

Note: The **English** benchmark score is lower; English performance seems to degrade as a trade-off of the German fine-tuning. The output also occasionally repeats sentences (likely due to using the wrong chat template).
</details>