This model is NousResearch/Nous-Hermes-llama-2-7b fine-tuned with Direct Preference Optimization (DPO) on preference data built from UltraFeedback. Only preference pairs in which the chosen score exceeds the rejected score by at least 5 were kept.
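The score-gap filter described above could look roughly like the sketch below. This is an illustration, not the code used to build the training set: the function name `filter_by_score_gap`, the field names `chosen_score` and `rejected_score`, and the example rows are all hypothetical. Only the threshold of 5 comes from the model description.

```python
from typing import Dict, List, Union

# One preference pair; the field names are hypothetical, not the dataset's schema.
Pair = Dict[str, Union[str, float]]


def filter_by_score_gap(pairs: List[Pair], min_gap: float = 5.0) -> List[Pair]:
    """Keep pairs whose chosen score exceeds the rejected score by at least min_gap."""
    return [
        pair
        for pair in pairs
        if pair["chosen_score"] - pair["rejected_score"] >= min_gap
    ]


# Made-up rows for illustration only.
example_pairs: List[Pair] = [
    {"prompt": "p1", "chosen": "a", "rejected": "b", "chosen_score": 9.0, "rejected_score": 3.0},
    {"prompt": "p2", "chosen": "c", "rejected": "d", "chosen_score": 8.0, "rejected_score": 6.5},
    {"prompt": "p3", "chosen": "e", "rejected": "f", "chosen_score": 7.0, "rejected_score": 2.0},
]

kept = filter_by_score_gap(example_pairs)
print([pair["prompt"] for pair in kept])
```

A larger `min_gap` keeps fewer but more clearly separated pairs, which is the trade-off a threshold like this controls.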

Format: Safetensors
Model size: 6.74B params
Tensor types: F32, BF16

Model tree for gupta-tanish/llama-7b-dpo-baseline

Base model: NousResearch/Nous-Hermes-llama-2-7b
Finetuned (2): this model
