metadata

license: mit
base_model: microsoft/DialoGPT-large
tags:
  - generated_from_trainer
model-index:
  - name: DialoGPT-large-faqs-block-size-128-bs-16-lr-7e-6
    results: []

DialoGPT-large-faqs-block-size-128-bs-16-lr-7e-6

This model is a fine-tuned version of microsoft/DialoGPT-large. The training dataset was not recorded. It achieves the following results on the evaluation set:

Loss: 2.4362

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7e-06
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 20
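
The linear schedule listed above can be sketched in plain Python. This is a minimal illustration, not the trainer's actual code. It assumes zero warmup steps (the card lists none) and takes the total of 800 optimizer steps from the last row of the training results. `linear_lr` is a hypothetical helper name introduced here.

```python
BASE_LR = 7e-6       # learning_rate from the hyperparameter list
TOTAL_STEPS = 800    # final step reported in the training results
WARMUP_STEPS = 0     # assumption: the card does not list any warmup


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Learning rate at an optimizer step under linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 200, 400, 800):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 7e-06 and reaches zero at step 800, so the late epochs update the weights with a small fraction of the initial rate.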

Training results

Training Loss	Epoch	Step	Validation Loss
No log	1.0	40	4.4791
No log	2.0	80	3.7462
No log	3.0	120	3.2760
No log	4.0	160	3.0066
No log	5.0	200	2.8421
No log	6.0	240	2.7291
No log	7.0	280	2.6535
No log	8.0	320	2.5975
No log	9.0	360	2.5532
No log	10.0	400	2.5265
No log	11.0	440	2.4987
No log	12.0	480	2.4778
2.9559	13.0	520	2.4655
2.9559	14.0	560	2.4553
2.9559	15.0	600	2.4449
2.9559	16.0	640	2.4456
2.9559	17.0	680	2.4389
2.9559	18.0	720	2.4384
2.9559	19.0	760	2.4372
2.9559	20.0	800	2.4362
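
One way to read the table: assuming the validation loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card), perplexity is its exponential. The sketch below copies the validation losses from the table and derives the best epoch, its perplexity, and the epochs where the loss rose. All names are illustrative.

```python
import math

# Validation loss per epoch, copied from the table above (epochs 1 to 20).
VAL_LOSS = [
    4.4791, 3.7462, 3.2760, 3.0066, 2.8421,
    2.7291, 2.6535, 2.5975, 2.5532, 2.5265,
    2.4987, 2.4778, 2.4655, 2.4553, 2.4449,
    2.4456, 2.4389, 2.4384, 2.4372, 2.4362,
]


def perplexity(loss):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(loss)


# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__) + 1
best_ppl = perplexity(VAL_LOSS[best_epoch - 1])

# Epochs (1-based) whose validation loss is higher than the previous epoch's.
regressions = [
    i + 1 for i in range(1, len(VAL_LOSS)) if VAL_LOSS[i] > VAL_LOSS[i - 1]
]

if __name__ == "__main__":
    print(best_epoch, round(best_ppl, 2), regressions)
```

Under that assumption the final loss of 2.4362 corresponds to a perplexity of roughly 11.4. The only increase in the table is at epoch 16, and the loss falls by less than 0.01 between epochs 15 and 20, so the curve has largely flattened by then.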

Framework versions

Transformers 4.33.0.dev0
PyTorch 2.0.1+cu118
Datasets 2.14.4.dev0
Tokenizers 0.13.3