Update README.md
README.md CHANGED
@@ -24,4 +24,14 @@ Fine-tuning datasets for this model are based on [Stack Exchange Paired](https://huggingface.co/datasets/lvwerra/stack-exchange-paired)
 
 **Traditional Fine-tuning:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/finetune](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/finetune)
 
-**DPO Training:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl)
+**DPO Training:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl)
+
+### Training Procedure
+The model was first fine-tuned on the Stack Exchange question and answer pairs and then fine-tuned via the DPO training procedure using a Stack Exchange Reward Model.
+It is trained to respond to prompts with the following template:
+
+```
+Question: <Query>
+
+Answer: <Response>
+```
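For illustration, here is a minimal inference sketch that fills the template above using 🤗 Transformers. The `model_id` is a hypothetical placeholder (this card does not name the repo id) and the generation settings are arbitrary:

```python
# Minimal sketch: query the model with the "Question: ... Answer:" template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-dpo-model"  # placeholder -- substitute this model's actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build the prompt in the template documented above.
query = "How do I sort a list of dictionaries by a key in Python?"
prompt = f"Question: {query}\n\nAnswer: "

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```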
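The commit does not include training code, so the following is only a hedged sketch of what the DPO stage could look like with TRL's `DPOTrainer`, assuming the `data/rl` split exposes `question`, `response_j` (preferred), and `response_k` (rejected) columns. The constructor shown matches older TRL releases (newer ones move `beta` into a `DPOConfig`); the SFT checkpoint id is a placeholder and all hyperparameters are illustrative:

```python
# Hedged sketch of the DPO stage (not the authors' script).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_model_id = "your-org/your-sft-checkpoint"  # placeholder for the SFT model
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)
model = AutoModelForCausalLM.from_pretrained(sft_model_id)
ref_model = AutoModelForCausalLM.from_pretrained(sft_model_id)  # frozen reference copy

# Preference pairs: response_j is assumed to be preferred over response_k.
dataset = load_dataset("lvwerra/stack-exchange-paired", data_dir="data/rl", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": f"Question: {row['question']}\n\nAnswer: ",  # the card's template
        "chosen": row["response_j"],
        "rejected": row["response_k"],
    },
    remove_columns=dataset.column_names,
)

trainer = DPOTrainer(
    model,
    ref_model,
    beta=0.1,  # illustrative strength of the implicit KL penalty toward the reference
    args=TrainingArguments(output_dir="dpo-out", remove_unused_columns=False),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```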