chchen/Vicuna-7B-v1.5-ORPO

Vicuna-7B-v1.5-ORPO

This model is a fine-tuned version of lmsys/vicuna-7b-v1.5 on the dpo_mix_en dataset. It achieves the following results on the evaluation set:

Loss: 1.0073
Rewards/chosen: -0.0940
Rewards/rejected: -0.1081
Rewards/accuracies: 0.5160
Rewards/margins: 0.0141
Logps/rejected: -1.0807
Logps/chosen: -0.9399
Logits/rejected: -0.2988
Logits/chosen: -0.3321
Sft Loss: 0.9399
Odds Ratio Loss: 0.6739
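
Several of these metrics are tied together arithmetically. As a rough sanity check, the sketch below assumes the usual ORPO formulation: each reward is the average log-probability scaled by a weight, and the total loss is the SFT loss plus the same weight times the odds-ratio loss. The weight `beta = 0.1` is not stated anywhere in this card. It is inferred here from the ratio of Rewards/chosen to Logps/chosen, so treat it as an assumption.

```python
# Evaluation metrics copied from the list above.
loss = 1.0073
rewards_chosen = -0.0940
rewards_rejected = -0.1081
rewards_margins = 0.0141
logps_chosen = -0.9399
logps_rejected = -1.0807
sft_loss = 0.9399
odds_ratio_loss = 0.6739

# ASSUMPTION: beta is not given in the card; 0.1 is inferred from
# rewards_chosen / logps_chosen and may not be the value actually used.
beta = 0.1

# Tolerance chosen to absorb the 4-decimal rounding of the reported numbers.
tol = 5e-4

checks = {
    "margin = chosen - rejected": abs((rewards_chosen - rewards_rejected) - rewards_margins) < tol,
    "reward_chosen = beta * logp_chosen": abs(beta * logps_chosen - rewards_chosen) < tol,
    "reward_rejected = beta * logp_rejected": abs(beta * logps_rejected - rewards_rejected) < tol,
    "sft_loss = -logp_chosen": abs(-logps_chosen - sft_loss) < tol,
    "loss = sft_loss + beta * odds_ratio_loss": abs(sft_loss + beta * odds_ratio_loss - loss) < tol,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

Under that assumed weight, all five relations hold to within rounding. The odds-ratio loss itself cannot be recomputed from the averaged log-probabilities, because it is averaged per example before reporting.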

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 0.1
num_epochs: 3.0
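
The total batch size follows from the per-device batch size and the gradient accumulation steps. The short sketch below works out how many devices the listed values imply. The device count is not stated in this card, so the result is an inference and not a reported fact.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2                # per-device batch size
gradient_accumulation_steps = 8
total_train_batch_size = 16

# Examples contributing to one optimizer step on a single device.
per_device_effective = train_batch_size * gradient_accumulation_steps

# Device count implied by the card's numbers (an inference, not stated in the card).
implied_devices = total_train_batch_size // per_device_effective

print(f"per-device effective batch: {per_device_effective}")
print(f"implied number of devices: {implied_devices}")
```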

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Sft Loss	Odds Ratio Loss
1.0913	0.8891	500	1.0354	-0.0968	-0.1107	0.5180	0.0140	-1.1075	-0.9676	-0.3176	-0.3490	0.9676	0.6776
1.0328	1.7782	1000	1.0126	-0.0945	-0.1086	0.5160	0.0141	-1.0856	-0.9451	-0.2979	-0.3308	0.9451	0.6748
0.9998	2.6673	1500	1.0073	-0.0940	-0.1081	0.5160	0.0141	-1.0807	-0.9399	-0.2988	-0.3321	0.9399	0.6739
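
The Epoch and Step columns give a rough idea of the size of the run. The sketch below estimates optimizer steps per epoch, the approximate number of training examples, and the approximate total step count. It assumes every optimizer step consumes a full batch of 16 examples and that the logged epoch fractions are accurate to four decimals. None of these derived figures appear in the card itself.

```python
# (step, epoch) pairs copied from the training results table above.
logged = [(500, 0.8891), (1000, 1.7782), (1500, 2.6673)]

total_train_batch_size = 16   # from the hyperparameter list
num_epochs = 3.0              # from the hyperparameter list

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for step, epoch in logged]
steps_per_epoch = sum(estimates) / len(estimates)

# ASSUMPTION: every optimizer step sees a full batch of 16 examples.
approx_train_examples = steps_per_epoch * total_train_batch_size
approx_total_steps = steps_per_epoch * num_epochs

print(f"steps per epoch (estimate): {steps_per_epoch:.1f}")
print(f"training examples (estimate): {approx_train_examples:.0f}")
print(f"total optimizer steps (estimate): {approx_total_steps:.0f}")
```

On these assumptions the run works out to about 562 optimizer steps per epoch, or roughly 9,000 training examples, so the last logged row at step 1500 falls a little short of the end of training.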

Framework versions

PEFT 0.10.0
Transformers 4.40.1
Pytorch 2.3.0
Datasets 2.19.0
Tokenizers 0.19.1
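
Because PEFT is listed among the framework versions, the repository presumably holds adapter weights and not a full checkpoint. A minimal loading sketch under that assumption follows. The helper name `load_model_with_adapter` is hypothetical, and the repository ID is taken from this page and has not been verified to load.

```python
BASE_MODEL_ID = "lmsys/vicuna-7b-v1.5"      # base model named in this card
ADAPTER_ID = "chchen/Vicuna-7B-v1.5-ORPO"   # ASSUMPTION: this repo is a PEFT adapter


def load_model_with_adapter(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model.eval(), tokenizer
```

Calling `load_model_with_adapter()` downloads the 7B base model, so it needs network access and enough memory for the full weights.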
