Text Generation · Transformers · Safetensors · English · mistral · conversational · Eval Results · Inference Endpoints · text-generation-inference
Felladrin committed
Commit fddb2c5
1 Parent(s): aba0753

Update README.md

Files changed (1)
  1. README.md +27 -0
README.md CHANGED
@@ -155,3 +155,30 @@ output = generate(
 
  print(output[0]["generated_text"])
  ```
+
+ ## How it was trained
+
+ This model was trained with [SFT Trainer](https://huggingface.co/docs/trl/main/en/sft_trainer) and [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer), over several sessions, using the following settings:
+
+ For Supervised Fine-Tuning:
+
+ | Hyperparameter          | Value |
+ | :---------------------- | :---- |
+ | learning_rate           | 2e-5  |
+ | total_train_batch_size  | 24    |
+ | max_seq_length          | 2048  |
+ | weight_decay            | 0     |
+ | warmup_ratio            | 0.02  |
+
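As a rough sketch of how the SFT settings above could be wired together (this is not the script behind this commit), the snippet below maps them onto trl's `SFTTrainer`, assuming the older 0.7-style API. The base model name, dataset name, `text` column, and the split of the total batch size of 24 into per-device batch size times gradient accumulation are all assumptions for illustration.

```python
# Hypothetical SFT setup: names and the batch-size split are placeholders;
# only the hyperparameter values taken from the table above come from the model card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "your-org/your-base-model"  # placeholder, not the actual base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder dataset; assumed to expose a plain-text "text" column.
train_dataset = load_dataset("your-org/your-sft-dataset", split="train")

training_args = TrainingArguments(
    output_dir="./sft-output",
    learning_rate=2e-5,             # from the SFT table
    per_device_train_batch_size=8,  # 8 x 3 accumulation steps = total batch size of 24 (assumed split)
    gradient_accumulation_steps=3,
    weight_decay=0,                 # from the SFT table
    warmup_ratio=0.02,              # from the SFT table
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_dataset,
    dataset_text_field="text",      # assumed column name
    max_seq_length=2048,            # from the SFT table
)
trainer.train()
```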
+ For Direct Preference Optimization:
+
+ | Hyperparameter          | Value  |
+ | :---------------------- | :----- |
+ | learning_rate           | 7.5e-7 |
+ | total_train_batch_size  | 6      |
+ | max_length              | 2048   |
+ | max_prompt_length       | 1536   |
+ | max_steps               | 200    |
+ | weight_decay            | 0      |
+ | warmup_ratio            | 0.02   |
+ | beta                    | 0.1    |
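In the same spirit, here is a minimal sketch of how the DPO settings could map onto trl's `DPOTrainer` (again assuming the older 0.7-style API, where `beta`, `max_length`, and `max_prompt_length` are passed directly to the trainer). The checkpoint and dataset names, the per-device/accumulation split of the total batch size of 6, and the preference dataset's `prompt`/`chosen`/`rejected` columns are assumptions, not details from this commit.

```python
# Hypothetical DPO setup: only the hyperparameter values taken from the
# table above come from the model card; everything else is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_checkpoint = "your-org/your-sft-model"  # placeholder: the model produced by the SFT stage
tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)
model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)
ref_model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)  # frozen reference policy

# Placeholder preference dataset with "prompt", "chosen", and "rejected" columns.
preference_dataset = load_dataset("your-org/your-preference-dataset", split="train")

training_args = TrainingArguments(
    output_dir="./dpo-output",
    learning_rate=7.5e-7,           # from the DPO table
    per_device_train_batch_size=2,  # 2 x 3 accumulation steps = total batch size of 6 (assumed split)
    gradient_accumulation_steps=3,
    max_steps=200,                  # from the DPO table
    weight_decay=0,                 # from the DPO table
    warmup_ratio=0.02,              # from the DPO table
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    beta=0.1,                       # from the DPO table
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
    max_length=2048,                # from the DPO table
    max_prompt_length=1536,         # from the DPO table
)
trainer.train()
```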