Update README.md
README.md CHANGED

@@ -4,6 +4,7 @@ model-index:
   results: []
 datasets:
 - anon8231489123/ShareGPT_Vicuna_unfiltered
+- HuggingFaceH4/ultrafeedback_binarized
 language:
 - en
 base_model: meta-llama/Llama-2-7b-hf
@@ -15,7 +16,8 @@ base_model: meta-llama/Llama-2-7b-hf
 # Model Card for Open Instruct ShareGPT DPO Llama2 7B
 
 This model belongs to the Tulu series of models, which is a series of language models that are trained to act as helpful assistants.
-Open Instruct ShareGPT Llama2 7B is
+Open Instruct ShareGPT Llama2 7B is a fine-tuned version of Llama 2, initially trained on the [ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered).
+The model was then further trained on the UltraFeedback dataset using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).
 Please check out our paper [TODO] for more!
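The DPO step referenced in the added lines optimizes a simple pairwise objective from the linked paper: the policy is pushed to prefer the chosen response over the rejected one relative to a frozen reference model. A minimal sketch of that per-pair loss follows; the function name and the scalar log-probability inputs are illustrative, not taken from this repository's training code.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference.
    """
    # Implicit reward of each response: how much more the policy likes it
    # than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # -log sigmoid(beta * (chosen margin - rejected margin))
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Policy favors the chosen response more than the reference does,
# so the loss drops below -log(0.5) ~= 0.6931.
print(round(dpo_loss(-10.0, -14.0, -11.0, -13.0, beta=0.5), 4))  # → 0.3133
```

When the policy and reference agree on both responses, the margins cancel and the loss sits at -log(0.5); the gradient then pushes the chosen response's likelihood up and the rejected one's down, scaled by `beta`.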