chansurgeplus committed
Commit ef3b1e7 (1 parent: 4e4e66b)
Update README.md
README.md CHANGED

@@ -14,7 +14,7 @@ The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferen
 
 ## Model Details
 
-- Base Model: [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT)
+- Base Model: [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT)
 - Dataset used for SFT: First 100K examples of the [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset
 - Alignment Method: [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290)
 - Epochs: 1
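The bullets in the changed section name the full alignment recipe (SFT base checkpoint, first 100K HH-RLHF examples, DPO, 1 epoch). The sketch below is not code from this repository or commit: it is a minimal reconstruction assuming a recent Hugging Face TRL release (`DPOTrainer`/`DPOConfig`), and every detail the README does not state (prompt splitting, batch size, the DPO `beta`) is an illustrative placeholder; the authors' actual script, prompt template, and hyperparameters may differ.

```python
# Minimal sketch of the DPO step described above (not the authors' code).
# Assumes a recent TRL release; anything not stated in the README bullets
# (prompt splitting, batch size, beta) is an illustrative assumption.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "SurgeGlobal/OpenBezoar-HH-RLHF-SFT"  # base model named in the README
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

def to_preference_format(example):
    # HH-RLHF stores whole dialogues; split off the final assistant turn so
    # "prompt" is the shared context and "chosen"/"rejected" are the two
    # competing completions (chosen/rejected share the same prompt prefix).
    marker = "\n\nAssistant:"
    cut = example["chosen"].rfind(marker) + len(marker)
    return {
        "prompt": example["chosen"][:cut],
        "chosen": example["chosen"][cut:],
        "rejected": example["rejected"][cut:],
    }

# First 100K examples, as stated in the README.
train_dataset = load_dataset("Anthropic/hh-rlhf", split="train[:100000]").map(to_preference_format)

args = DPOConfig(
    output_dir="openbezoar-hh-rlhf-dpo",
    num_train_epochs=1,              # the only training detail the README states
    per_device_train_batch_size=2,   # placeholder value
    beta=0.1,                        # placeholder value
)
trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```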