---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-ipo_annealing
results: []
---
# gpt-imdb-ipo_annealing
This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 125.6974
- Rewards/chosen: -0.0343
- Rewards/rejected: -0.1277
- Rewards/accuracies: 0.875
- Rewards/margins: 0.0934
- Logps/rejected: -267.1282
- Logps/chosen: -236.1897
- Logits/rejected: -31.3501
- Logits/chosen: -31.5916
## Model description
Judging by the model name and the logged metrics (`Rewards/chosen`, `Rewards/rejected`, `Logps/*`), which match the logging of TRL's `DPOTrainer`, this model appears to be [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) further trained on preference pairs with the IPO objective (Azar et al., 2023). The `_annealing` suffix suggests the IPO regularization coefficient β was annealed over the course of training, though the card does not confirm this.
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- training_steps: 7197
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 16.3187 | 0.21 | 500 | 34.0876 | 0.1161 | -0.1126 | 0.5292 | 0.2287 | -263.8062 | -235.1407 | -33.1877 | -33.4371 |
| 5.5155 | 0.42 | 1000 | 13.0423 | -0.1485 | -0.3812 | 0.5042 | 0.2327 | -264.1273 | -235.4375 | -35.2608 | -35.4541 |
| 10.2532 | 0.63 | 1500 | 18.5157 | -0.4407 | -0.5471 | 0.5458 | 0.1064 | -264.3746 | -235.8205 | -34.2230 | -34.4246 |
| 6.7550        | 0.83  | 2000 | 28.1593         | -0.7791        | -0.8052          | 0.5917             | 0.0261          | -264.7961      | -236.3400    | -33.6119        | -33.8069      |
| 9.4126 | 1.04 | 2500 | 9.2406 | -0.8733 | -1.2564 | 0.6229 | 0.3831 | -265.6003 | -236.5962 | -31.9471 | -32.0700 |
| 8.5908 | 1.25 | 3000 | 12.4967 | -0.6700 | -1.0163 | 0.6167 | 0.3462 | -265.4156 | -236.4061 | -31.6914 | -31.8443 |
| 19.5217 | 1.46 | 3500 | 6.8889 | -0.0720 | -0.4689 | 0.6854 | 0.3969 | -264.5895 | -235.4041 | -32.1300 | -32.2692 |
| 6.9195 | 1.67 | 4000 | 4.2435 | -0.5324 | -0.9335 | 0.7021 | 0.4012 | -265.7609 | -236.4489 | -31.8342 | -31.9606 |
| 4.6993 | 1.88 | 4500 | 5.0987 | -0.2002 | -0.6179 | 0.7521 | 0.4177 | -265.3070 | -235.7907 | -31.6301 | -31.7617 |
| 2.7896 | 2.08 | 5000 | 2.7344 | -0.2390 | -0.5589 | 0.7500 | 0.3199 | -265.4754 | -236.0307 | -31.9650 | -32.1009 |
| 3.2262 | 2.29 | 5500 | 3.0584 | -0.1936 | -0.5168 | 0.8083 | 0.3231 | -265.8080 | -236.0606 | -31.6585 | -31.8243 |
| 4.1965 | 2.5 | 6000 | 4.2350 | -0.1555 | -0.4440 | 0.8417 | 0.2884 | -266.2272 | -236.1557 | -31.6484 | -31.8344 |
| 15.1482 | 2.71 | 6500 | 10.8174 | -0.0932 | -0.3244 | 0.8667 | 0.2312 | -266.7491 | -236.1454 | -31.4600 | -31.6800 |
| 145.9251      | 2.92  | 7000 | 125.6974        | -0.0343        | -0.1277          | 0.8750             | 0.0934          | -267.1282      | -236.1897    | -31.3501        | -31.5916      |
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0