Loading model from path bert-base-uncased
Task: ner
Model path: bert-base-uncased
Data path: ./data/ud/
Tokenizer: bert-base-uncased
Batch size: 32
Epochs: 10
Learning rate: 2e-05
LR Decay: 0.3
LR Decay End Epoch: 5
Sequence length: 96
Training: True
Num Threads: 24
Num Sentences: 0
Max Norm: 0.0
Use GNN: False
Use label weights: False
PID: 3523179, PGID: 3523174
ATen/Parallel:
    at::get_num_threads() : 24
    at::get_num_interop_threads() : 36
OpenMP 201511 (a.k.a. OpenMP 4.5)
    omp_get_max_threads() : 24
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    mkl_get_max_threads() : 24
Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
std::thread::hardware_concurrency() : 72
Environment variables:
    OMP_NUM_THREADS : 24
    MKL_NUM_THREADS : 24
ATen parallel backend: OpenMP
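The block above is PyTorch's standard parallelism report. A minimal sketch of how the thread counts are typically pinned and how that report is printed; the exact call sites in this project are an assumption:

```python
import os
import torch

# OMP_NUM_THREADS / MKL_NUM_THREADS are normally exported before the process starts;
# setting them here only has an effect if no OpenMP/MKL work has run yet.
os.environ.setdefault("OMP_NUM_THREADS", "24")
os.environ.setdefault("MKL_NUM_THREADS", "24")

torch.set_num_threads(24)            # intra-op pool -> "at::get_num_threads() : 24"
# torch.set_num_interop_threads(36)  # inter-op pool; must be called before that pool is first used

# Prints the "ATen/Parallel" report seen in the log
# (OpenMP/MKL versions, thread counts, parallel backend).
print(torch.__config__.parallel_info())
```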
Training model
Loading Training Data
Loading NER labels from ./data/ud/**/*-train-orig.ner
en_atis-ud-train-orig.ner num sentences: 4274
en_cesl-ud-train-orig.ner num sentences: 4124
en_ewt-ud-train-orig.ner num sentences: 11649
en_gum-ud-train-orig.ner num sentences: 5344
en_lines-ud-train-orig.ner num sentences: 3010
en_partut-ud-train-orig.ner num sentences: 1739
Example of NER labels: [[['what', 'O'], ['is', 'O'], ['the', 'O'], ['cost', 'O'], ['of', 'O'], ['a', 'O'], ['round', 'O'], ['trip', 'O'], ['flight', 'O'], ['from', 'O'], ['pittsburgh', 'S-GPE'], ['to', 'O'], ['atlanta', 'S-GPE'], ['beginning', 'O'], ['on', 'O'], ['april', 'B-DATE'], ['twenty', 'I-DATE'], ['fifth', 'E-DATE'], ['and', 'O'], ['returning', 'O'], ['on', 'O'], ['may', 'B-DATE'], ['sixth', 'E-DATE']], [['now', 'O'], ['i', 'O'], ['need', 'O'], ['a', 'O'], ['flight', 'O'], ['leaving', 'O'], ['fort', 'B-GPE'], ['worth', 'E-GPE'], ['and', 'O'], ['arriving', 'O'], ['in', 'O'], ['denver', 'S-GPE'], ['no', 'O'], ['later', 'O'], ['than', 'O'], ['2', 'B-TIME'], ['pm', 'E-TIME'], ['next', 'B-DATE'], ['monday', 'E-DATE']]]
30140 sentences, 942 batches of size 32
Control example of InputFeatures
Input Ids: [101, 2085, 1045, 2342, 1037, 3462, 2975, 3481, 4276, 1998, 7194, 1999, 7573, 2053, 2101, 2084, 1016, 7610, 2279, 6928, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Input Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Label Ids: [77, 1, 1, 1, 1, 1, 1, 31, 32, 1, 1, 1, 16, 1, 1, 1, 28, 30, 17, 19, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Valid Ids: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Label Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Segment Ids: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Loading Validation Data
Loading NER labels from ./data/ud/**/*-dev-orig.ner
en_atis-ud-dev-orig.ner num sentences: 572
en_cesl-ud-dev-orig.ner num sentences: 500
en_ewt-ud-dev-orig.ner num sentences: 1875
en_gum-ud-dev-orig.ner num sentences: 788
en_lines-ud-dev-orig.ner num sentences: 986
en_partut-ud-dev-orig.ner num sentences: 149
Example of NER labels: [[['i', 'O'], ['would', 'O'], ['like', 'O'], ['the', 'O'], ['cheapest', 'O'], ['flight', 'O'], ['from', 'O'], ['pittsburgh', 'S-GPE'], ['to', 'O'], ['atlanta', 'S-GPE'], ['leaving', 'O'], ['april', 'B-DATE'], ['twenty', 'I-DATE'], ['fifth', 'E-DATE'], ['and', 'O'], ['returning', 'O'], ['may', 'B-DATE'], ['sixth', 'E-DATE']], [['i', 'O'], ['want', 'O'], ['a', 'O'], ['flight', 'O'], ['from', 'O'], ['memphis', 'S-LOC'], ['to', 'O'], ['seattle', 'S-FAC'], ['that', 'O'], ['arrives', 'O'], ['no', 'O'], ['later', 'O'], ['than', 'O'], ['3', 'B-TIME'], ['pm', 'E-TIME']]]
4870 sentences, 153 batches of size 32
Control example of InputFeatures
Input Ids: [101, 1045, 2215, 1037, 3462, 2013, 9774, 2000, 5862, 2008, 8480, 2053, 2101, 2084, 1017, 7610, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Input Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Label Ids: [77, 1, 1, 1, 1, 1, 59, 1, 60, 1, 1, 1, 1, 1, 28, 30, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Valid Ids: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Label Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Segment Ids: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
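The control examples above are padded to the configured sequence length of 96, with label id 0 reserved for padding, 77/78 for [CLS]/[SEP], and 1 for O. A minimal sketch of how InputFeatures like these are commonly built in BERT-NER pipelines; convert_example and the label map handling are illustrative, not this project's actual code:

```python
from dataclasses import dataclass
from typing import List
from transformers import BertTokenizer

@dataclass
class InputFeatures:
    input_ids: List[int]
    input_mask: List[int]
    label_ids: List[int]
    valid_ids: List[int]
    label_mask: List[int]
    segment_ids: List[int]

def convert_example(words, labels, label2id, tokenizer, max_len=96):
    # [CLS] and [SEP] carry their own label ids (77/78 in this run's label map).
    tokens, valid, label_ids = ["[CLS]"], [1], [label2id["[CLS]"]]
    for word, label in zip(words, labels):
        pieces = tokenizer.tokenize(word) or ["[UNK]"]
        tokens.extend(pieces)
        # Only the first word-piece of each word is "valid" and carries the word's label.
        valid.extend([1] + [0] * (len(pieces) - 1))
        label_ids.append(label2id[label])
    tokens = tokens[: max_len - 1] + ["[SEP]"]
    valid = valid[: max_len - 1] + [1]
    label_ids = label_ids[: max_len - 1] + [label2id["[SEP]"]]

    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    pad = max_len - len(input_ids)
    lpad = max_len - len(label_ids)
    return InputFeatures(
        input_ids=input_ids + [0] * pad,
        input_mask=[1] * len(input_ids) + [0] * pad,
        label_ids=label_ids + [0] * lpad,
        valid_ids=valid + [1] * pad,       # padding marked valid=1, matching the dumps above
        label_mask=[1] * len(label_ids) + [0] * lpad,
        segment_ids=[0] * max_len,         # single-segment input, hence all zeros
    )

# tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# feats = convert_example(["fort", "worth"], ["B-GPE", "E-GPE"], label2id, tokenizer)
```

Because every word in the control sentences maps to a single word-piece, the Valid Ids rows come out as all ones, exactly as in the dumps above.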
Test Data
Loading NER labels from ./data/ud/**/*-test-orig.ner
en_atis-ud-test-orig.ner num sentences: 586
en_cesl-ud-test-orig.ner num sentences: 500
en_ewt-ud-test-orig.ner num sentences: 1955
en_gum-ud-test-orig.ner num sentences: 851
en_lines-ud-test-orig.ner num sentences: 988
en_pud-ud-test-orig.ner num sentences: 973
en_partut-ud-test-orig.ner num sentences: 149
en_pronouns-ud-test-orig.ner num sentences: 265
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForNer: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForNer from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNer from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForNer were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Example of NER labels: [[['what', 'O'], ['are', 'O'], ['the', 'O'], ['coach', 'O'], ['flights', 'O'], ['between', 'O'], ['dallas', 'S-GPE'], ['and', 'O'], ['baltimore', 'S-GPE'], ['leaving', 'O'], ['august', 'B-DATE'], ['tenth', 'E-DATE'], ['and', 'O'], ['returning', 'O'], ['august', 'B-DATE'], ['twelve', 'E-DATE']], [['i', 'O'], ['want', 'O'], ['a', 'O'], ['flight', 'O'], ['from', 'O'], ['nashville', 'S-GPE'], ['to', 'O'], ['seattle', 'S-GPE'], ['that', 'O'], ['arrives', 'O'], ['no', 'O'], ['later', 'O'], ['than', 'O'], ['3', 'B-TIME'], ['pm', 'E-TIME']]]
6267 sentences, 196 batches of size 32
Control example of InputFeatures
Input Ids: [101, 1045, 2215, 1037, 3462, 2013, 8423, 2000, 5862, 2008, 8480, 2053, 2101, 2084, 1017, 7610, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Input Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Label Ids: [77, 1, 1, 1, 1, 1, 16, 1, 16, 1, 1, 1, 1, 1, 28, 30, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Valid Ids: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Label Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Segment Ids: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
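The transformers warning interleaved above is expected: bert-base-uncased ships only the masked-LM/NSP heads, so the token-classification head of BertForNer ('classifier.weight', 'classifier.bias') is freshly initialised and has to be trained. The project's own BertForNer is not shown in this log; a minimal sketch of the usual pattern implied by the warning:

```python
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel

class BertForNer(BertPreTrainedModel):
    """Pre-trained BERT encoder plus a new linear head; the newly-initialised
    'classifier.weight'/'classifier.bias' in the warning refer to that head."""

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config, add_pooling_layer=False)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.init_weights()  # post_init() in newer transformers releases

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        sequence_output = self.dropout(outputs[0])  # (batch, seq_len, hidden)
        return self.classifier(sequence_output)     # per-token label logits

# model = BertForNer.from_pretrained("bert-base-uncased", num_labels=num_labels)
```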
Adjusting learning rate of group 0 to 2.0000e-05.
0%| | 0/942 [00:00
seems not to be NE tag.
  warnings.warn('{} seems not to be NE tag.'.format(chunk))
/home/9_QuAnTuM_6/cdaniel/venv_syntrans/lib/python3.8/site-packages/seqeval/metrics/sequence_labeling.py:171: UserWarning: seems not to be NE tag.
  warnings.warn('{} seems not to be NE tag.'.format(chunk))
/home/9_QuAnTuM_6/cdaniel/venv_syntrans/lib/python3.8/site-packages/seqeval/metrics/sequence_labeling.py:171: UserWarning: X seems not to be NE tag.
  warnings.warn('{} seems not to be NE tag.'.format(chunk))
/home/9_QuAnTuM_6/cdaniel/venv_syntrans/lib/python3.8/site-packages/seqeval/metrics/sequence_labeling.py:171: UserWarning: [SEP] seems not to be NE tag.
  warnings.warn('{} seems not to be NE tag.'.format(chunk))
/home/9_QuAnTuM_6/cdaniel/venv_syntrans/lib/python3.8/site-packages/seqeval/metrics/sequence_labeling.py:171: UserWarning: seems not to be NE tag.
  warnings.warn('{} seems not to be NE tag.'.format(chunk))
O Token Predictions: 471383, NER token predictions: 32921
loss: 0.4831624427086608 w prec: 0.48333733331137896 w recall: 0.32263332619404583 w f1: 0.3743943894083346
0%| | 0/153 [00:00
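The seqeval warnings are emitted because placeholder tags (e.g. [SEP], the sub-token marker X, and padding) reach the metric alongside the real IOBES labels, and the 'w prec' / 'w recall' / 'w f1' figures are presumably weighted averages. A minimal sketch, assuming seqeval and average="weighted" (the log does not state this explicitly), of computing such scores after mapping the placeholders to O:

```python
from seqeval.metrics import precision_score, recall_score, f1_score

# Placeholder tags that trigger "... seems not to be NE tag." when passed to seqeval.
NON_LABELS = {"[CLS]", "[SEP]", "X", ""}

def clean(sequences):
    # Map bookkeeping tags to 'O' so seqeval only sees valid IOBES/O labels.
    return [[tag if tag not in NON_LABELS else "O" for tag in seq] for seq in sequences]

def weighted_scores(y_true, y_pred):
    y_true, y_pred = clean(y_true), clean(y_pred)
    return (precision_score(y_true, y_pred, average="weighted"),
            recall_score(y_true, y_pred, average="weighted"),
            f1_score(y_true, y_pred, average="weighted"))

# Tiny example with the tag set used in this run:
p, r, f1 = weighted_scores(
    [["O", "S-GPE", "O", "B-DATE", "E-DATE", "[SEP]"]],
    [["O", "S-GPE", "O", "B-DATE", "O", "[SEP]"]],
)
```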