chansurgeplus committed
Commit ef3b1e7
1 Parent(s): 4e4e66b

Update README.md

Files changed (1)
README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine tuned for human preferen
 
 ## Model Details
 
- - Base Model: [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model on a subset of [Anthropic's HH-RLHF Dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+ - Base Model: [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT)
  - Dataset used for SFT: First 100K examples of the [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset
  - Alignment Method: [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290)
  - Epochs: 1
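
The list edited above amounts to a compact training recipe: start from the SFT checkpoint, run DPO over a preference subset of HH-RLHF for one epoch. As orientation for readers, here is a minimal sketch of what such a run could look like with Hugging Face's trl library. It is not the authors' training script: the subset size, the prompt-splitting heuristic, all hyperparameters, and the trl API details (`DPOConfig`, `processing_class`) are assumptions and vary by library version.

```python
# Hypothetical DPO pass starting from the SFT checkpoint named in the diff.
# Every hyperparameter below is illustrative, not taken from this repo.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "SurgeGlobal/OpenBezoar-HH-RLHF-SFT"  # base model from the list above

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

def to_dpo_format(example):
    # HH-RLHF rows are full "Human:/Assistant:" transcripts. The shared prefix
    # up to the final "Assistant:" turn serves as the prompt; the remainders
    # are the chosen/rejected completions. This split is a simplification.
    marker = "\n\nAssistant:"
    idx = example["chosen"].rfind(marker) + len(marker)
    return {
        "prompt": example["chosen"][:idx],
        "chosen": example["chosen"][idx:],
        "rejected": example["rejected"][idx:],
    }

train = (
    load_dataset("Anthropic/hh-rlhf", split="train")
    .select(range(10_000))            # the README says "a subset"; size is a guess
    .map(to_dpo_format)
)

config = DPOConfig(
    output_dir="openbezoar-hh-rlhf-dpo",
    num_train_epochs=1,               # "Epochs: 1" per the model details
    beta=0.1,                         # illustrative DPO temperature
    per_device_train_batch_size=2,    # illustrative
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl clones the policy as the frozen reference when None
    args=config,
    train_dataset=train,
    processing_class=tokenizer,       # named `tokenizer=` in older trl versions
)
trainer.train()
```

Passing `ref_model=None` lets trl clone the policy as the frozen reference model, which is the usual choice when the policy being optimized is itself the SFT checkpoint.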