metadata

license: apache-2.0
datasets:
  - Intel/orca_dpo_pairs
tags:
  - mistral
  - dpo
  - una
  - finetune
  - chatml
  - instruct

Neural-una-cybertron-7b

Neural-una-cybertron-7b is fblgit/una-cybertron-7b-v2-bf16 further fine-tuned with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset.

This model was created after studying the procedure used for the mlabonne/NeuralHermes-2.5-Mistral-7B model. Special thanks to @mlabonne.

Additional Information

This model was fine-tuned on a single NVIDIA A100-SXM4-40GB GPU.

The total training time was 1 hour and 10 minutes.

<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
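
The template above follows the ChatML layout (the card is tagged chatml). As a minimal sketch of how a prompt could be assembled from it, the helper below fills in the system and user turns and leaves the assistant turn open so the model can complete it. The function name `format_chatml` is hypothetical, and ending the prompt at the open assistant turn is an assumption about inference-time use, not something the card states.

```python
def format_chatml(system: str, user: str) -> str:
    """Build a ChatML prompt that stops at the open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        # Assumption: the assistant turn is left open for generation.
        "<|im_start|>assistant\n"
    )


prompt = format_chatml("You are a helpful assistant.", "What is DPO?")
print(prompt)
```

If the tokenizer shipped with the model defines a chat template, using that is likely the safer route, since it avoids drift between a hand-written string and the format seen during training.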

LoRA:

r=16
lora_alpha=16
lora_dropout=0.05
bias="none"
task_type="CAUSAL_LM"
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
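
The parameter names above match those of `LoraConfig` in the peft library, so the settings could plausibly be expressed as in the sketch below. That the author used peft in exactly this way is an assumption; `LORA_KWARGS` and `build_lora_config` are hypothetical names introduced for illustration.

```python
# Values copied from the LoRA settings listed above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "k_proj", "gate_proj", "v_proj", "up_proj",
        "q_proj", "o_proj", "down_proj",
    ],
}

# With standard LoRA scaling, the adapter update is multiplied by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Create a peft LoraConfig from the settings above.

    peft is imported lazily so the plain dict can be inspected without it.
    """
    from peft import LoraConfig  # requires `pip install peft`
    return LoraConfig(**LORA_KWARGS)
```

With `r` and `lora_alpha` both set to 16 the scaling factor works out to 1, and the target list covers all four attention projections plus the three MLP projections of each block.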

Training arguments:

DPOTrainer: