RikkiXu/zephyr-7b-dpo-full

	---
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: zephyr-7b-dpo-full
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-dpo-full

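	This model was trained with Direct Preference Optimization (DPO) using TRL, as indicated by the card's tags. The auto-generated card records neither a base checkpoint nor a training dataset.
	The model achieves the following results on the evaluation set (identical to the final logged checkpoint at step 800):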
	- Loss: 0.4292
	- Rewards/chosen: -1.8869
	- Rewards/rejected: -2.7914
	- Rewards/accuracies: 0.8242
	- Rewards/margins: 0.9045
	- Logps/rejected: -612.2493
	- Logps/chosen: -524.2042
	- Logits/rejected: -0.4436
	- Logits/chosen: -0.8025
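
The metric names above follow the usual DPO convention, where the implicit reward of a response is the scaled log-probability ratio between the trained policy and a frozen reference model. The sketch below shows how such metrics could be derived from summed sequence log-probabilities. It is an illustration under stated assumptions, not the exact TRL implementation: the function name `dpo_metrics` is hypothetical, `beta=0.1` is an assumed value (the card does not state the one used), and only the standard sigmoid loss is shown.

```python
import math


def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute DPO-style batch metrics from summed sequence log-probs.

    beta=0.1 is an assumption; the model card does not record it.
    """
    # Implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x)).
    chosen = [beta * (p - r)
              for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected = [beta * (p - r)
                for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    margins = [c - r for c, r in zip(chosen, rejected)]

    # Sigmoid DPO loss per pair: -log(sigmoid(margin)) = softplus(-margin),
    # written in a numerically stable form.
    losses = [max(-m, 0.0) + math.log1p(math.exp(-abs(m))) for m in margins]

    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
    }


# Hypothetical log-probabilities for two preference pairs.
metrics = dpo_metrics(
    policy_chosen_logps=[-10.0, -12.0],
    policy_rejected_logps=[-11.0, -11.0],
    ref_chosen_logps=[-10.5, -11.0],
    ref_rejected_logps=[-10.5, -11.5],
)
```

Because the loss is averaged per pair, the reported loss is generally not equal to the loss evaluated at the mean margin. The mean margin, however, is exactly the mean chosen reward minus the mean rejected reward, which is why the reported `Rewards/margins` of 0.9045 equals -1.8869 minus -2.7914.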

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-07
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 128
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
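
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates the shape of a cosine schedule with linear warmup at the listed peak learning rate and warmup ratio. The helper names are hypothetical, and `total_steps=1000` is an illustrative assumption: the card does not state the total number of optimizer steps.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Number of examples contributing to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


train_total = effective_batch_size(8, 8, grad_accum_steps=2)  # 8 * 8 * 2
eval_total = effective_batch_size(8, 8)                       # 8 * 8

# Assumed total step count, for illustration only.
total_steps = 1000
schedule = [cosine_lr(s, total_steps) for s in (0, 50, 100, 550, 1000)]
```

With these inputs, `train_total` is 128 and `eval_total` is 64, matching the listed values. Gradient accumulation does not apply to evaluation, which is why the eval total omits that factor.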

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.5405 | 0.12 | 100 | 0.6086 | -0.8599 | -1.1867 | 0.6953 | 0.3268 | -451.7755 | -421.5048 | -1.6547 | -1.7462 |
	| 0.4371 | 0.23 | 200 | 0.5454 | -2.0208 | -2.5842 | 0.7422 | 0.5634 | -591.5291 | -537.5920 | -0.7151 | -0.8867 |
	| 0.4348 | 0.35 | 300 | 0.5012 | -2.0998 | -2.8410 | 0.7734 | 0.7413 | -617.2101 | -545.4883 | -0.3499 | -0.5939 |
	| 0.3733 | 0.46 | 400 | 0.4721 | -2.1506 | -2.9308 | 0.7773 | 0.7802 | -626.1902 | -550.5717 | -0.2280 | -0.5456 |
	| 0.3689 | 0.58 | 500 | 0.4484 | -2.0467 | -2.9485 | 0.7969 | 0.9018 | -627.9595 | -540.1826 | -0.1091 | -0.4774 |
	| 0.3829 | 0.69 | 600 | 0.4419 | -2.0265 | -2.9075 | 0.8086 | 0.8810 | -623.8541 | -538.1624 | -0.1412 | -0.5099 |
	| 0.3725 | 0.81 | 700 | 0.4329 | -1.9184 | -2.8079 | 0.8242 | 0.8895 | -613.8932 | -527.3496 | -0.3224 | -0.6920 |
	| 0.4052 | 0.92 | 800 | 0.4292 | -1.8869 | -2.7914 | 0.8242 | 0.9045 | -612.2493 | -524.2042 | -0.4436 | -0.8025 |
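
As a quick sanity check on the logged results, the sketch below transcribes a subset of the columns from the table above, verifies that each row's margin equals chosen minus rejected reward, and picks the checkpoint with the lowest validation loss. The variable names are illustrative, and the `2e-4` tolerance is an assumption chosen to absorb rounding of the logged values to four decimals.

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, accuracy, margin)
rows = [
    (100, 0.6086, -0.8599, -1.1867, 0.6953, 0.3268),
    (200, 0.5454, -2.0208, -2.5842, 0.7422, 0.5634),
    (300, 0.5012, -2.0998, -2.8410, 0.7734, 0.7413),
    (400, 0.4721, -2.1506, -2.9308, 0.7773, 0.7802),
    (500, 0.4484, -2.0467, -2.9485, 0.7969, 0.9018),
    (600, 0.4419, -2.0265, -2.9075, 0.8086, 0.8810),
    (700, 0.4329, -1.9184, -2.8079, 0.8242, 0.8895),
    (800, 0.4292, -1.8869, -2.7914, 0.8242, 0.9045),
]

# Each logged margin should equal chosen - rejected, up to rounding.
inconsistent = [
    step for step, _, chosen, rejected, _, margin in rows
    if abs((chosen - rejected) - margin) > 2e-4
]

# Checkpoint with the lowest validation loss.
best_step, best_loss = min(((r[0], r[1]) for r in rows), key=lambda t: t[1])

# True if validation loss decreases at every logged step.
monotonic = all(a[1] > b[1] for a, b in zip(rows, rows[1:]))
```

In this table the validation loss falls at every logged step, so the final checkpoint (step 800) is also the best one by that measure, while the chosen and rejected rewards are both negative throughout and only their gap widens.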


	### Framework versions

	- Transformers 4.38.2
	- PyTorch 2.1.2+cu118
	- Datasets 2.16.1
	- Tokenizers 0.15.2
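
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch using only the standard library: the helper name `version_mismatches` is hypothetical, and it assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`. Whether the installed PyTorch build reports the `+cu118` local tag depends on how it was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the list above; keys are assumed distribution names.
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.1.2+cu118",
    "datasets": "2.16.1",
    "tokenizers": "0.15.2",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed)} for every differing package.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: expected {wanted}, found {installed}")
```

An exact string comparison is deliberately strict. Nearby patch releases may well work, but the card only documents the versions listed.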