Update README.md
README.md

LLaMA-3-8B-Instruct-TR-DPO is a finetuned version of [Meta-LLaMA-3-8B-Instruct](

- **Training Data**: A synthetically generated preference dataset consisting of 10K samples was used. No proprietary data was utilized.
- **Training Time**: 3 hours on a single RTX 6000 Ada
- **QLoRA Configs**:
  - lora_r: 64
  - lora_alpha: 32
  - lora_dropout: 0.05
  - lora_target_linear: true
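As a hedged illustration (not the authors' actual training config file), the QLoRA hyperparameters above can be held in a plain dictionary, e.g. mirroring axolotl-style key names; the names and comments here are assumptions based on common LoRA conventions:

```python
# Hypothetical sketch of the QLoRA hyperparameters listed above.
qlora_config = {
    "lora_r": 64,                # rank of the low-rank adapter matrices
    "lora_alpha": 32,            # scaling numerator for the adapter update
    "lora_dropout": 0.05,        # dropout applied to adapter inputs
    "lora_target_linear": True,  # adapt all linear layers, not just attention
}

# LoRA scales the adapter update (B @ A) by alpha / r, so with these
# values each adapter's contribution is multiplied by 0.5.
scaling = qlora_config["lora_alpha"] / qlora_config["lora_r"]
print(scaling)  # 0.5
```

With alpha set to half of r, the adapter update is down-weighted relative to the common alpha = r convention, a frequent choice at higher ranks.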
<!-- talk about the aim of the finetuning, use passive voice -->
The model was finetuned with the aim of enhancing the output format and content quality for the Turkish language. It is not necessarily smarter than the base model, but its outputs are more likable and preferable.