2024-07-29 08:02:54,900 ----------------------------------------------------------------------------------------------------
2024-07-29 08:02:54,900 Training Model
2024-07-29 08:02:54,900 ----------------------------------------------------------------------------------------------------
2024-07-29 08:02:54,900 Translator(
(encoder): EncoderLSTM(
(embedding): Embedding(114, 300, padding_idx=0)
(dropout): Dropout(p=0.1, inplace=False)
(lstm): LSTM(300, 512, batch_first=True, bidirectional=True)
)
(decoder): DecoderLSTM(
(embedding): Embedding(112, 300, padding_idx=0)
(dropout): Dropout(p=0.1, inplace=False)
(lstm): LSTM(300, 1024, batch_first=True)
(hidden2vocab): Linear(in_features=1024, out_features=112, bias=True)
(log_softmax): LogSoftmax(dim=-1)
)
)
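For reference, a minimal PyTorch sketch that reproduces the module structure printed above. Only the layer configuration is taken from the log; the forward logic (how the bidirectional encoder states are merged into the 1024-dimensional decoder state, and how teacher forcing is applied) is an assumption, not the author's actual implementation.

```python
import torch
import torch.nn as nn


class EncoderLSTM(nn.Module):
    """Bidirectional LSTM encoder matching the printed configuration."""

    def __init__(self, vocab_size=114, embedding_dim=300, hidden_size=512, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.dropout = nn.Dropout(p=dropout)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True, bidirectional=True)

    def forward(self, src):
        embedded = self.dropout(self.embedding(src))
        outputs, (h, c) = self.lstm(embedded)
        # Assumption: the final states of both directions are concatenated so that
        # the decoder can start from a 1024-dimensional state (2 x 512).
        h = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        c = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
        return outputs, (h, c)


class DecoderLSTM(nn.Module):
    """Unidirectional LSTM decoder with a projection onto the target vocabulary."""

    def __init__(self, vocab_size=112, embedding_dim=300, hidden_size=1024, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.dropout = nn.Dropout(p=dropout)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        self.hidden2vocab = nn.Linear(hidden_size, vocab_size)
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, tgt, state):
        embedded = self.dropout(self.embedding(tgt))
        outputs, state = self.lstm(embedded, state)
        return self.log_softmax(self.hidden2vocab(outputs)), state


class Translator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = EncoderLSTM()
        self.decoder = DecoderLSTM()

    def forward(self, src, tgt):
        # Simplification: the gold target is always fed to the decoder (full teacher
        # forcing); the logged teacher_forcing_ratio of 0.5 implies per-step sampling
        # between gold and predicted tokens, which is not shown here.
        _, state = self.encoder(src)
        log_probs, _ = self.decoder(tgt, state)
        return log_probs
```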
2024-07-29 08:02:54,900 ----------------------------------------------------------------------------------------------------
2024-07-29 08:02:54,900 Training Hyperparameters:
2024-07-29 08:02:54,900 - max_epochs: 10
2024-07-29 08:02:54,900 - learning_rate: 0.001
2024-07-29 08:02:54,900 - batch_size: 128
2024-07-29 08:02:54,900 - patience: 5
2024-07-29 08:02:54,900 - scheduler_patience: 3
2024-07-29 08:02:54,900 - teacher_forcing_ratio: 0.5
2024-07-29 08:02:54,900 ----------------------------------------------------------------------------------------------------
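A sketch of how these hyperparameters are commonly wired together. The log lists only the values; the optimizer (Adam), the scheduler class (ReduceLROnPlateau), and the early-stopping bookkeeping are assumptions, and train_one_epoch / evaluate are hypothetical helpers standing in for the actual training and validation loops.

```python
import torch

model = Translator()  # as sketched after the architecture printout above

# Assumed optimizer and scheduler; only lr, patience values, and max_epochs come from the log.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)          # learning_rate
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3                  # scheduler_patience
)

best_dev_loss = float("inf")
epochs_without_improvement = 0

for epoch in range(1, 10 + 1):                                      # max_epochs
    train_one_epoch(model, optimizer, teacher_forcing_ratio=0.5)    # hypothetical helper
    dev_loss = evaluate(model)                                       # hypothetical helper
    scheduler.step(dev_loss)

    if dev_loss < best_dev_loss:
        best_dev_loss = dev_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= 5:                          # patience
            break                                                    # early stopping
```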
2024-07-29 08:02:54,900 Computational Parameters:
2024-07-29 08:02:54,900 - num_workers: 4
2024-07-29 08:02:54,900 - device: device(type='cuda', index=0)
2024-07-29 08:02:54,900 ----------------------------------------------------------------------------------------------------
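The computational parameters map onto a DataLoader and a device move along the following lines; train_dataset and collate_fn are placeholders for whatever the pipeline actually provides.

```python
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda", 0)                 # device from the log
model = Translator().to(device)                  # as sketched above

# train_dataset and collate_fn are placeholders, not names from the log.
train_loader = DataLoader(
    train_dataset,
    batch_size=128,                              # batch_size from the log
    shuffle=True,
    num_workers=4,                               # num_workers from the log
    collate_fn=collate_fn,
)
```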
2024-07-29 08:02:54,900 Dataset Splits:
2024-07-29 08:02:54,900 - train: 133623 data points
2024-07-29 08:02:54,900 - dev: 19090 data points
2024-07-29 08:02:54,900 - test: 38179 data points
2024-07-29 08:02:54,900 ----------------------------------------------------------------------------------------------------
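The three splits sum to 190,892 sentence pairs and correspond to roughly a 70/10/20 train/dev/test split (133,623 / 190,892 ≈ 0.70, 19,090 / 190,892 ≈ 0.10, 38,179 / 190,892 ≈ 0.20). Whether the split was drawn randomly or taken from predefined files is not stated in the log.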
2024-07-29 08:02:54,900 EPOCH 1
2024-07-29 08:03:53,216 batch 104/1044 - loss 2.73587249 - lr 0.0010 - time 58.32s
2024-07-29 08:04:49,000 batch 208/1044 - loss 2.60424047 - lr 0.0010 - time 114.10s
2024-07-29 08:05:41,466 batch 312/1044 - loss 2.53508188 - lr 0.0010 - time 166.57s
2024-07-29 08:06:40,078 batch 416/1044 - loss 2.48822718 - lr 0.0010 - time 225.18s
2024-07-29 08:07:39,192 batch 520/1044 - loss 2.45172488 - lr 0.0010 - time 284.29s
2024-07-29 08:08:34,794 batch 624/1044 - loss 2.42237450 - lr 0.0010 - time 339.89s
2024-07-29 08:09:31,742 batch 728/1044 - loss 2.39725696 - lr 0.0010 - time 396.84s
2024-07-29 08:10:29,289 batch 832/1044 - loss 2.37658248 - lr 0.0010 - time 454.39s
2024-07-29 08:11:26,901 batch 936/1044 - loss 2.35841719 - lr 0.0010 - time 512.00s
2024-07-29 08:12:23,384 batch 1040/1044 - loss 2.34286929 - lr 0.0010 - time 568.48s
2024-07-29 08:12:25,463 ----------------------------------------------------------------------------------------------------
2024-07-29 08:12:25,466 EPOCH 1 DONE
2024-07-29 08:12:53,761 TRAIN Loss: 2.3424
2024-07-29 08:12:53,761 DEV Loss: 3.0900
2024-07-29 08:12:53,761 DEV Perplexity: 21.9763
2024-07-29 08:12:53,761 New best score!
2024-07-29 08:12:53,762 ----------------------------------------------------------------------------------------------------
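The reported perplexity appears to be the exponential of the dev cross-entropy loss: exp(3.0900) ≈ 21.98, which matches the reported 21.9763 once the unrounded loss is used (the same relation holds for the later epochs, e.g. exp(3.1892) ≈ 24.27 in epoch 2).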
2024-07-29 08:12:53,762 EPOCH 2
2024-07-29 08:13:52,690 batch 104/1044 - loss 2.18471333 - lr 0.0010 - time 58.93s
2024-07-29 08:14:49,182 batch 208/1044 - loss 2.17075370 - lr 0.0010 - time 115.42s
2024-07-29 08:15:46,935 batch 312/1044 - loss 2.16476290 - lr 0.0010 - time 173.17s
2024-07-29 08:16:43,859 batch 416/1044 - loss 2.15894632 - lr 0.0010 - time 230.10s
2024-07-29 08:17:40,100 batch 520/1044 - loss 2.15486173 - lr 0.0010 - time 286.34s
2024-07-29 08:18:36,802 batch 624/1044 - loss 2.15043282 - lr 0.0010 - time 343.04s
2024-07-29 08:19:30,048 batch 728/1044 - loss 2.14694826 - lr 0.0010 - time 396.29s
2024-07-29 08:20:28,026 batch 832/1044 - loss 2.14419594 - lr 0.0010 - time 454.26s
2024-07-29 08:21:25,600 batch 936/1044 - loss 2.14010675 - lr 0.0010 - time 511.84s
2024-07-29 08:22:25,420 batch 1040/1044 - loss 2.13670866 - lr 0.0010 - time 571.66s
2024-07-29 08:22:27,758 ----------------------------------------------------------------------------------------------------
2024-07-29 08:22:27,762 EPOCH 2 DONE
2024-07-29 08:22:55,713 TRAIN Loss: 2.1365
2024-07-29 08:22:55,714 DEV Loss: 3.1892
2024-07-29 08:22:55,714 DEV Perplexity: 24.2695
2024-07-29 08:22:55,714 No improvement for 1 epoch(s)
2024-07-29 08:22:55,714 ----------------------------------------------------------------------------------------------------
2024-07-29 08:22:55,714 EPOCH 3
2024-07-29 08:23:53,091 batch 104/1044 - loss 2.08751571 - lr 0.0010 - time 57.38s
2024-07-29 08:24:50,619 batch 208/1044 - loss 2.08733297 - lr 0.0010 - time 114.90s
2024-07-29 08:25:48,704 batch 312/1044 - loss 2.08495532 - lr 0.0010 - time 172.99s
2024-07-29 08:26:46,137 batch 416/1044 - loss 2.08294034 - lr 0.0010 - time 230.42s
2024-07-29 08:27:41,812 batch 520/1044 - loss 2.08286387 - lr 0.0010 - time 286.10s
2024-07-29 08:28:37,415 batch 624/1044 - loss 2.07837076 - lr 0.0010 - time 341.70s
2024-07-29 08:29:35,773 batch 728/1044 - loss 2.07550259 - lr 0.0010 - time 400.06s
2024-07-29 08:30:32,773 batch 832/1044 - loss 2.07277058 - lr 0.0010 - time 457.06s
2024-07-29 08:31:31,914 batch 936/1044 - loss 2.06922043 - lr 0.0010 - time 516.20s
2024-07-29 08:32:27,776 batch 1040/1044 - loss 2.06737263 - lr 0.0010 - time 572.06s
2024-07-29 08:32:29,985 ----------------------------------------------------------------------------------------------------
2024-07-29 08:32:29,987 EPOCH 3 DONE
2024-07-29 08:32:58,150 TRAIN Loss: 2.0673
2024-07-29 08:32:58,150 DEV Loss: 3.2107
2024-07-29 08:32:58,150 DEV Perplexity: 24.7975
2024-07-29 08:32:58,150 No improvement for 2 epoch(s)
2024-07-29 08:32:58,150 ----------------------------------------------------------------------------------------------------
2024-07-29 08:32:58,150 EPOCH 4
2024-07-29 08:33:55,013 batch 104/1044 - loss 2.04089623 - lr 0.0010 - time 56.86s
2024-07-29 08:34:52,898 batch 208/1044 - loss 2.03903778 - lr 0.0010 - time 114.75s
2024-07-29 08:35:51,119 batch 312/1044 - loss 2.03777666 - lr 0.0010 - time 172.97s
2024-07-29 08:36:48,063 batch 416/1044 - loss 2.03265216 - lr 0.0010 - time 229.91s
2024-07-29 08:37:43,123 batch 520/1044 - loss 2.03068389 - lr 0.0010 - time 284.97s
2024-07-29 08:38:42,281 batch 624/1044 - loss 2.02925459 - lr 0.0010 - time 344.13s
2024-07-29 08:39:38,619 batch 728/1044 - loss 2.02635143 - lr 0.0010 - time 400.47s
2024-07-29 08:40:34,110 batch 832/1044 - loss 2.02490569 - lr 0.0010 - time 455.96s
2024-07-29 08:41:30,332 batch 936/1044 - loss 2.02244815 - lr 0.0010 - time 512.18s
2024-07-29 08:42:26,605 batch 1040/1044 - loss 2.02155263 - lr 0.0010 - time 568.45s
2024-07-29 08:42:28,905 ----------------------------------------------------------------------------------------------------
2024-07-29 08:42:28,908 EPOCH 4 DONE
2024-07-29 08:42:56,907 TRAIN Loss: 2.0215
2024-07-29 08:42:56,908 DEV Loss: 3.3884
2024-07-29 08:42:56,908 DEV Perplexity: 29.6186
2024-07-29 08:42:56,908 No improvement for 3 epoch(s)
2024-07-29 08:42:56,908 ----------------------------------------------------------------------------------------------------
2024-07-29 08:42:56,908 EPOCH 5
2024-07-29 08:43:54,221 batch 104/1044 - loss 1.99417387 - lr 0.0010 - time 57.31s
2024-07-29 08:44:52,997 batch 208/1044 - loss 1.98792041 - lr 0.0010 - time 116.09s
2024-07-29 08:45:48,150 batch 312/1044 - loss 1.99154850 - lr 0.0010 - time 171.24s
2024-07-29 08:46:45,419 batch 416/1044 - loss 1.99533101 - lr 0.0010 - time 228.51s
2024-07-29 08:47:44,326 batch 520/1044 - loss 1.99671145 - lr 0.0010 - time 287.42s
2024-07-29 08:48:42,269 batch 624/1044 - loss 1.99625001 - lr 0.0010 - time 345.36s
2024-07-29 08:49:37,222 batch 728/1044 - loss 1.99431187 - lr 0.0010 - time 400.31s
2024-07-29 08:50:32,593 batch 832/1044 - loss 1.99355745 - lr 0.0010 - time 455.68s
2024-07-29 08:51:28,854 batch 936/1044 - loss 1.99387271 - lr 0.0010 - time 511.95s
2024-07-29 08:52:26,219 batch 1040/1044 - loss 1.99341333 - lr 0.0010 - time 569.31s
2024-07-29 08:52:28,341 ----------------------------------------------------------------------------------------------------
2024-07-29 08:52:28,343 EPOCH 5 DONE
2024-07-29 08:52:56,407 TRAIN Loss: 1.9933
2024-07-29 08:52:56,407 DEV Loss: 3.4417
2024-07-29 08:52:56,407 DEV Perplexity: 31.2411
2024-07-29 08:52:56,408 No improvement for 4 epoch(s)
2024-07-29 08:52:56,408 ----------------------------------------------------------------------------------------------------
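Entering epoch 6 the learning rate drops from 0.0010 to 0.0001, i.e. by a factor of 10 (0.0010 × 0.1 = 0.0001), after more than scheduler_patience = 3 epochs without a dev-loss improvement. This is consistent with a plateau-based scheduler using a reduction factor of 0.1, although the log names neither the scheduler nor its factor.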
2024-07-29 08:52:56,408 EPOCH 6
2024-07-29 08:53:53,603 batch 104/1044 - loss 1.94269003 - lr 0.0001 - time 57.20s
2024-07-29 08:54:47,873 batch 208/1044 - loss 1.94458400 - lr 0.0001 - time 111.47s
2024-07-29 08:55:46,532 batch 312/1044 - loss 1.94421510 - lr 0.0001 - time 170.12s
2024-07-29 08:56:43,219 batch 416/1044 - loss 1.94502676 - lr 0.0001 - time 226.81s
2024-07-29 08:57:40,699 batch 520/1044 - loss 1.94408302 - lr 0.0001 - time 284.29s
2024-07-29 08:58:39,775 batch 624/1044 - loss 1.94322916 - lr 0.0001 - time 343.37s
2024-07-29 08:59:37,572 batch 728/1044 - loss 1.94293834 - lr 0.0001 - time 401.16s
2024-07-29 09:00:35,569 batch 832/1044 - loss 1.94358721 - lr 0.0001 - time 459.16s
2024-07-29 09:01:31,771 batch 936/1044 - loss 1.94235871 - lr 0.0001 - time 515.36s
2024-07-29 09:02:29,148 batch 1040/1044 - loss 1.94160419 - lr 0.0001 - time 572.74s
2024-07-29 09:02:31,339 ----------------------------------------------------------------------------------------------------
2024-07-29 09:02:31,341 EPOCH 6 DONE
2024-07-29 09:02:59,432 TRAIN Loss: 1.9417
2024-07-29 09:02:59,433 DEV Loss: 3.2860
2024-07-29 09:02:59,433 DEV Perplexity: 26.7353
2024-07-29 09:02:59,433 No improvement for 5 epoch(s)
2024-07-29 09:02:59,433 Patience reached: Terminating model training due to early stopping
2024-07-29 09:02:59,433 ----------------------------------------------------------------------------------------------------
2024-07-29 09:02:59,433 Finished Training
2024-07-29 09:03:55,129 TEST Perplexity: 21.9740
2024-07-29 09:14:15,480 TEST BLEU = 4.83 42.3/11.5/2.1/0.5 (BP = 1.000 ratio = 1.000 hyp_len = 97 ref_len = 97)
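The BLEU line appears to follow the sacreBLEU reporting format: the overall score (4.83) is followed by the 1- to 4-gram precisions (42.3/11.5/2.1/0.5), the brevity penalty (BP = 1.000, since hypothesis and reference lengths match), and the length statistics. A minimal sketch of producing such a report, assuming sacrebleu is the metric implementation (the log does not say which library was used); hypotheses and references are placeholder lists of detokenized sentence strings.

```python
import sacrebleu

# hypotheses: list of system translations (one string per sentence)
# references: list of reference translations, aligned with hypotheses
bleu = sacrebleu.corpus_bleu(hypotheses, [references])

print(bleu.score)       # overall corpus BLEU, e.g. 4.83
print(bleu.precisions)  # 1- to 4-gram precisions, e.g. [42.3, 11.5, 2.1, 0.5]
print(bleu.bp)          # brevity penalty
print(bleu)             # full report in a format like the log line above
```

Note also that the test perplexity (21.9740) is almost identical to the epoch-1 dev perplexity (21.9763), which is consistent with the final evaluation being run on the checkpoint from the only epoch that improved the dev score; the log itself does not state which checkpoint was evaluated.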