Update README.md
README.md CHANGED

@@ -4,6 +4,7 @@ model-index:
   results: []
 datasets:
 - anon8231489123/ShareGPT_Vicuna_unfiltered
+- HuggingFaceH4/ultrafeedback_binarized
 language:
 - en
 base_model: meta-llama/Llama-2-7b-hf
@@ -15,7 +16,8 @@ base_model: meta-llama/Llama-2-7b-hf
 # Model Card for Open Instruct ShareGPT DPO Llama2 7B
 
 This model belongs to the Tulu series of models, which is a series of language models that are trained to act as helpful assistants.
-Open Instruct ShareGPT Llama2 7B is
+Open Instruct ShareGPT Llama2 7B is a fine-tuned version of Llama 2, initially trained on the [ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered).
+The model was then further trained on the UltraFeedback dataset using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).
 Please check out our paper [TODO] for more!
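The DPO step referenced in the added lines optimizes a simple pairwise objective from the linked paper: the policy is pushed to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal sketch of that per-pair loss follows; the function name and the scalar log-probability inputs are illustrative, not taken from this repository's training code.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference.
    """
    # Implicit reward of each response: how much more the policy likes it
    # than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # -log sigmoid(beta * (chosen margin - rejected margin))
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Policy favors the chosen response more than the reference does,
# so the loss drops below -log(0.5) ~= 0.6931.
print(round(dpo_loss(-10.0, -14.0, -11.0, -13.0, beta=0.5), 4))  # → 0.3133
```

When the policy and reference agree on both responses, the margins cancel and the loss sits at -log(0.5); the gradient then pushes the chosen response's likelihood up and the rejected one's down, scaled by `beta`.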