kashif (HF staff) committed
Commit b25009a
1 Parent(s): 08cf7ac

Update README.md

Files changed (1):
  1. README.md +11 -1
README.md CHANGED
@@ -24,4 +24,14 @@ Fine-tuning datasets for this model are based on [Stack Exchange Paired](https:/
 
  **Traditional Fine-tuning:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/finetune](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/finetune)
 
- **DPO Training:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl)
+ **DPO Training:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl)
+
+ ### Training Procedure
+ The model was first fine-tuned on the Stack Exchange question and answer pairs and then fine-tuned via the DPO training procedure using a Stack Exchange Reward Model.
+ It is trained to respond to prompts with the following template:
+
+ ```
+ Question: <Query>
+
+ Answer: <Response>
+ ```
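
The template added in the diff above can be applied programmatically before sending a query to the model. A minimal sketch, assuming only the `Question:`/`Answer:` layout shown in the README (the `build_prompt` helper is hypothetical, not part of this repository):

```python
def build_prompt(query: str) -> str:
    """Wrap a user query in the Question/Answer template the model was
    trained on: the question, a blank line, then the 'Answer:' cue the
    model is expected to complete."""
    return f"Question: {query}\n\nAnswer: "


# Example: format a query for generation.
prompt = build_prompt("How do I reverse a list in Python?")
print(prompt)
```

Leaving the response slot empty after `Answer: ` lets the model generate the completion in the position it saw responses during fine-tuning.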