
bert-dp-4

This model was fine-tuned on the generator dataset (the base model is not named in this card). It achieves the following results on the evaluation set:

  • Loss: 2.4611
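
For reference, if this is a standard cross-entropy language-modelling loss, it corresponds to a perplexity of exp(2.4611) ≈ 11.7. Below is a minimal sketch of loading the checkpoint and scoring text; it assumes the model is a BERT-style masked LM and that `bert-dp-4` resolves to the correct hub path (both assumptions based only on the model name):

```python
import math

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Hypothetical repo id: the card does not give the full hub path, and the
# name only suggests a BERT-style masked language model.
repo_id = "bert-dp-4"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

# Note: this scores every token (labels = input ids), whereas the evaluation
# loss above was presumably computed with the training-time masking scheme.
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss.item()
print(f"loss = {loss:.4f}, perplexity = {math.exp(loss):.2f}")
```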

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction as transformers TrainingArguments follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 180
  • mixed_precision_training: Native AMP
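
As a rough reconstruction, these settings map onto transformers `TrainingArguments` as sketched below; the actual training script is not provided, and the `output_dir` is an assumption:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above; the real
# training script for bert-dp-4 is not part of this card.
training_args = TrainingArguments(
    output_dir="bert-dp-4",            # assumed output directory
    learning_rate=5e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=180,
    fp16=True,                         # "Native AMP" mixed precision
)
```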

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:---:|:---:|:---:|:---:|
| 6.3492 | 1.89 | 1000 | 5.9327 |
| 5.8333 | 3.78 | 2000 | 5.8515 |
| 5.7604 | 5.67 | 3000 | 5.8483 |
| 5.7137 | 7.56 | 4000 | 5.7914 |
| 5.6597 | 9.45 | 5000 | 5.7672 |
| 5.6213 | 11.34 | 6000 | 5.7594 |
| 5.5798 | 13.23 | 7000 | 5.7352 |
| 5.5482 | 15.12 | 8000 | 5.7275 |
| 5.513 | 17.01 | 9000 | 5.7203 |
| 5.485 | 18.9 | 10000 | 5.7211 |
| 5.4498 | 20.79 | 11000 | 5.6947 |
| 5.4175 | 22.68 | 12000 | 5.6923 |
| 5.3877 | 24.57 | 13000 | 5.6879 |
| 5.3635 | 26.47 | 14000 | 5.6776 |
| 5.3389 | 28.36 | 15000 | 5.6757 |
| 5.3166 | 30.25 | 16000 | 5.6758 |
| 5.2951 | 32.14 | 17000 | 5.6676 |
| 5.2793 | 34.03 | 18000 | 5.6711 |
| 5.2684 | 35.92 | 19000 | 5.6687 |
| 5.2609 | 37.81 | 20000 | 5.6684 |
| 5.2606 | 39.7 | 21000 | 5.6719 |
| 5.2624 | 41.59 | 22000 | 5.6697 |
| 5.2551 | 43.48 | 23000 | 5.6718 |
| 5.2461 | 45.37 | 24000 | 5.6699 |
| 5.2431 | 47.26 | 25000 | 5.6692 |
| 5.2414 | 49.15 | 26000 | 5.6691 |
| 5.2856 | 51.04 | 27000 | 5.6823 |
| 5.2753 | 52.93 | 28000 | 5.6860 |
| 5.2549 | 54.82 | 29000 | 5.6877 |
| 5.2276 | 56.71 | 30000 | 5.6285 |
| 5.1674 | 58.6 | 31000 | 5.5439 |
| 5.0894 | 60.49 | 32000 | 5.4082 |
| 4.9508 | 62.38 | 33000 | 5.1598 |
| 4.7453 | 64.27 | 34000 | 4.9274 |
| 4.5898 | 66.16 | 35000 | 4.7884 |
| 4.4656 | 68.05 | 36000 | 4.6531 |
| 4.35 | 69.94 | 37000 | 4.5123 |
| 4.2378 | 71.83 | 38000 | 4.4012 |
| 4.1496 | 73.72 | 39000 | 4.3240 |
| 4.0891 | 75.61 | 40000 | 4.2763 |
| 4.0538 | 77.5 | 41000 | 4.2520 |
| 4.0448 | 79.4 | 42000 | 4.2485 |
| 3.9724 | 81.29 | 43000 | 3.9940 |
| 3.6527 | 83.18 | 44000 | 3.7442 |
| 3.4172 | 85.07 | 45000 | 3.5713 |
| 3.2446 | 86.96 | 46000 | 3.4403 |
| 3.4764 | 88.85 | 47000 | 3.3796 |
| 3.0543 | 90.74 | 48000 | 3.2884 |
| 2.9549 | 92.63 | 49000 | 3.2107 |
| 2.8785 | 94.52 | 50000 | 3.1466 |
| 2.8143 | 96.41 | 51000 | 3.0788 |
| 2.7605 | 98.3 | 52000 | 3.0230 |
| 2.7111 | 100.19 | 53000 | 2.9802 |
| 2.6727 | 102.08 | 54000 | 2.9414 |
| 2.6417 | 103.97 | 55000 | 2.9167 |
| 2.612 | 105.86 | 56000 | 2.8927 |
| 2.5918 | 107.75 | 57000 | 2.8769 |
| 2.5769 | 109.64 | 58000 | 2.8637 |
| 2.566 | 111.53 | 59000 | 2.8551 |
| 2.556 | 113.42 | 60000 | 2.8458 |
| 2.548 | 115.31 | 61000 | 2.8488 |
| 2.5468 | 117.2 | 62000 | 2.8412 |
| 2.5453 | 119.09 | 63000 | 2.8383 |
| 2.7567 | 120.98 | 64000 | 2.8857 |
| 2.6017 | 122.87 | 65000 | 2.8382 |
| 2.5416 | 124.76 | 66000 | 2.7862 |
| 2.484 | 126.65 | 67000 | 2.7415 |
| 2.4361 | 128.54 | 68000 | 2.7079 |
| 2.3925 | 130.43 | 69000 | 2.6771 |
| 2.3512 | 132.33 | 70000 | 2.6542 |
| 2.3146 | 134.22 | 71000 | 2.6327 |
| 2.2805 | 136.11 | 72000 | 2.6119 |
| 2.2494 | 138.0 | 73000 | 2.5903 |
| 2.2218 | 139.89 | 74000 | 2.5734 |
| 2.1955 | 141.78 | 75000 | 2.5584 |
| 2.1739 | 143.67 | 76000 | 2.5459 |
| 2.154 | 145.56 | 77000 | 2.5337 |
| 2.1324 | 147.45 | 78000 | 2.5260 |
| 2.1149 | 149.34 | 79000 | 2.5169 |
| 2.096 | 151.23 | 80000 | 2.5095 |
| 2.083 | 153.12 | 81000 | 2.5045 |
| 2.0666 | 155.01 | 82000 | 2.4911 |
| 2.0562 | 156.9 | 83000 | 2.4907 |
| 2.0437 | 158.79 | 84000 | 2.4808 |
| 2.0356 | 160.68 | 85000 | 2.4816 |
| 2.0317 | 162.57 | 86000 | 2.4758 |
| 2.0201 | 164.46 | 87000 | 2.4724 |
| 2.0138 | 166.35 | 88000 | 2.4723 |
| 2.0095 | 168.24 | 89000 | 2.4651 |
| 2.0056 | 170.13 | 90000 | 2.4651 |
| 2.0021 | 172.02 | 91000 | 2.4616 |
| 1.9974 | 173.91 | 92000 | 2.4611 |
| 1.9985 | 175.8 | 93000 | 2.4613 |
| 1.9954 | 177.69 | 94000 | 2.4579 |
| 1.9979 | 179.58 | 95000 | 2.4611 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 1.11.0+cu113
  • Datasets 2.13.0
  • Tokenizers 0.13.3
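
For exact reproduction of the training run, it may help to confirm that the local environment matches these pins; a quick check:

```python
# Confirm the local environment matches the versions listed above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected: 4.26.1
print(torch.__version__)         # expected: 1.11.0+cu113
print(datasets.__version__)      # expected: 2.13.0
print(tokenizers.__version__)    # expected: 0.13.3
```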