natolambert committed • Commit 0709c67 • Parent: c7eaf1c • Update README.md

README.md CHANGED
@@ -69,7 +69,8 @@
 ## Intended uses & limitations
 
 The model was initially fine-tuned on a filtered and preprocessed version of the Tulu V2 mix dataset (TODO add link), which contains a diverse range of human-created instructions and synthetic dialogues generated primarily by other LLMs.
-We then further aligned the model with a [Jax DPO trainer](https://github.com/hamishivi/EasyLM/blob/main/EasyLM/models/llama/llama_train_dpo.py) built on [EasyLM](https://github.com/young-geng/EasyLM) on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4.
+We then further aligned the model with a [Jax DPO trainer](https://github.com/hamishivi/EasyLM/blob/main/EasyLM/models/llama/llama_train_dpo.py) built on [EasyLM](https://github.com/young-geng/EasyLM) on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4.
+
 
 <!-- You can find the datasets used for training Tulu V2 [here]() -->
 
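The DPO alignment step described in the changed paragraph trains the policy to prefer the GPT-4-ranked "chosen" completion over the "rejected" one, relative to a frozen reference model. As a rough illustration only (not the EasyLM trainer's actual code; the function name and the example log-probabilities below are hypothetical), the per-pair DPO loss can be sketched as:

```python
import math

def softplus(x):
    # Numerically stable log(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected completion under the policy or the frozen reference
    model; beta scales how strongly the policy may deviate from
    the reference.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)

# Hypothetical log-probabilities: the loss shrinks as the policy
# assigns relatively more mass to the chosen completion.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
```

The loss falls as the policy's log-ratio for the chosen completion grows relative to the rejected one, which is what drives the preference alignment; in the actual trainer these log-probabilities come from batched forward passes over the UltraFeedback pairs.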