Update README.md
README.md

LLaMA-3-8B-Instruct-TR-DPO is a finetuned version of [Meta-LLaMA-3-8B-Instruct](

- **Training Data**: A synthetically generated preference dataset consisting of 10K samples was used. No proprietary data was utilized.
- **Training Time**: 3 hours on a single RTX 6000 Ada
- **QLoRA Configs**:
  - lora_r: 64
  - lora_alpha: 32
  - lora_dropout: 0.05
  - lora_target_linear: true
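As a hedged illustration (not the authors' actual training config file), the QLoRA hyperparameters above can be held in a plain dictionary, e.g. mirroring axolotl-style key names; the names and comments here are assumptions based on common LoRA conventions:

```python
# Hypothetical sketch of the QLoRA hyperparameters listed above.
qlora_config = {
    "lora_r": 64,                # rank of the low-rank adapter matrices
    "lora_alpha": 32,            # scaling numerator for the adapter update
    "lora_dropout": 0.05,        # dropout applied to adapter inputs
    "lora_target_linear": True,  # adapt all linear layers, not just attention
}

# LoRA scales the adapter update (B @ A) by alpha / r, so with these
# values each adapter's contribution is multiplied by 0.5.
scaling = qlora_config["lora_alpha"] / qlora_config["lora_r"]
print(scaling)  # 0.5
```

With alpha set to half of r, the adapter update is down-weighted relative to the common alpha = r convention, a frequent choice at higher ranks.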
<!-- talk about the aim of the finetuning, use passive voice -->
The model was finetuned with the aim of enhancing the output format and content quality for the Turkish language. It is not necessarily smarter than the base model, but its outputs are more likable and preferable.