Upload README.md
README.md
CHANGED
@@ -174,6 +174,8 @@ Half of the samples was generated by this model where prompts contained the adve
 
 [KTO](https://arxiv.org/abs/2402.01306) trainer from [Hugging Face TRL library](https://huggingface.co/docs/trl/en/kto_trainer) was employed for performing preference alignment. The LoRA adapter from the previous training stages was merged into the model, and a new LoRA adapter was created for the KTO training. The quantized base model serves as a reference.
 
+During the alignment, the model was encouraged to respect player's actions and agency, construct a coherent narrative, and use evocative language to describe the world and the outcome of the player's actions.
+
 #### QLoRa adapter configuration
 
 - Rank: 16
@@ -210,7 +212,7 @@ The model's performance in Adventure Mode has improved substantially. The writin
[Hunk content not captured: one line removed at old line 213 and one line added at new line 215.]
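
For readers unfamiliar with KTO, the sketch below shows the unpaired preference record shape that the TRL KTO trainer documentation describes: each example has a `prompt`, a `completion`, and a boolean `label` marking the completion as desirable or undesirable. It is an illustration only, not the author's data pipeline. The `KTOExample` class, the `to_kto_records` helper, and the sample texts are hypothetical and were invented here to mirror the alignment goals named in the README (respecting player agency, evocative description).

```python
from dataclasses import dataclass, asdict


@dataclass
class KTOExample:
    """One unpaired preference example (hypothetical helper type)."""
    prompt: str
    completion: str
    label: bool  # True = desirable completion, False = undesirable


def to_kto_records(samples):
    """Convert (prompt, completion, is_desirable) tuples into dict records."""
    return [asdict(KTOExample(p, c, bool(ok))) for p, c, ok in samples]


# Hypothetical samples, invented for illustration; not taken from the real dataset.
samples = [
    # Respects the player's action and describes its outcome.
    ("> open the door",
     "The hinges groan as you push the door open, revealing a moonlit hall.",
     True),
    # Overrides the player's action, so it is marked undesirable.
    ("> open the door",
     "You decide to walk away from the door instead.",
     False),
]

records = to_kto_records(samples)
```

Records of this shape could then be loaded into a dataset for preference training. The actual column names and labels used for this model are not stated in the diff.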