tlemberger committed on
Commit 0b28f00
1 Parent(s): 9853dfd

update with new training params and perf
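The commit changes the training settings in three ways: more training examples, 0.6 instead of 3.5 epochs, and a per-device batch size of 16 instead of 32. As a rough back-of-the-envelope comparison of the two runs' training budgets, the sketch below estimates the number of optimizer steps and the number of training examples processed. It assumes that all four V100 GPUs mentioned in the README were used with plain data parallelism and no gradient accumulation; the diff states neither. The helper `optimizer_steps` is hypothetical, and the exact step count reported by a particular trainer may differ slightly.

```python
import math


def optimizer_steps(n_examples: int, per_device_batch: int, n_devices: int, epochs: float) -> int:
    """Hypothetical estimate of optimizer steps.

    Assumes plain data parallelism across n_devices and no gradient accumulation.
    """
    global_batch = per_device_batch * n_devices
    # A final partial batch still counts as one step.
    steps_per_epoch = math.ceil(n_examples / global_batch)
    return math.ceil(epochs * steps_per_epoch)


# Values from the old and new README parameter lists; 4 devices assumed from "4XTesla V100 GPUs".
old = dict(n_examples=31410, per_device_batch=32, n_devices=4, epochs=3.5)
new = dict(n_examples=48771, per_device_batch=16, n_devices=4, epochs=0.6)

for name, cfg in (("old", old), ("new", new)):
    steps = optimizer_steps(**cfg)
    seen = cfg["epochs"] * cfg["n_examples"]  # training examples processed, counting repeats
    print(f"{name}: ~{steps} steps, ~{seen:,.0f} examples processed")
```

Under these assumptions the new run takes roughly half as many optimizer steps (about 458 vs 861). It also processes far fewer examples (about 29,263 vs 109,935), even though the training set grew.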

Files changed (1)
  1. README.md +19 -18
README.md CHANGED
@@ -51,15 +51,14 @@ The training was run on a NVIDIA DGX Station with 4XTesla V100 GPUs.

  Training code is available at https://github.com/source-data/soda-roberta

- - Command: `python -m tokcl.train NER --num_train_epochs=3.5`
  - Tokenizer vocab size: 50265
  - Training data: EMBO/sd-nlp NER
- - Training with 31410 examples.
- - Evaluating on 8861 examples.
+ - Training with 48771 examples.
+ - Evaluating on 13801 examples.
  - Training on 15 features: O, I-SMALL_MOLECULE, B-SMALL_MOLECULE, I-GENEPROD, B-GENEPROD, I-SUBCELLULAR, B-SUBCELLULAR, I-CELL, B-CELL, I-TISSUE, B-TISSUE, I-ORGANISM, B-ORGANISM, I-EXP_ASSAY, B-EXP_ASSAY
- - Epochs: 3.5
- - `per_device_train_batch_size`: 32
- - `per_device_eval_batch_size`: 32
+ - Epochs: 0.6
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
  - `learning_rate`: 0.0001
  - `weight_decay`: 0.0
  - `adam_beta1`: 0.9
@@ -69,20 +68,22 @@ Training code is available at https://github.com/source-data/soda-roberta

  ## Eval results

- Testing on 4224 examples of test set with `sklearn.metrics`:
+ Testing on 7178 examples of test set with `sklearn.metrics`:

  ```
                  precision    recall  f1-score   support

-           CELL       0.77      0.81      0.79      3477
-      EXP_ASSAY       0.71      0.70      0.71      7049
-       GENEPROD       0.86      0.90      0.88     16140
-       ORGANISM       0.80      0.82      0.81      2759
- SMALL_MOLECULE       0.78      0.82      0.80      4446
-    SUBCELLULAR       0.71      0.75      0.73      2125
-         TISSUE       0.70      0.75      0.73      1971
-
-      micro avg       0.79      0.82      0.81     37967
-      macro avg       0.76      0.79      0.78     37967
-   weighted avg       0.79      0.82      0.81     37967
+           CELL       0.69      0.81      0.74      5245
+      EXP_ASSAY       0.56      0.57      0.56     10067
+       GENEPROD       0.77      0.89      0.82     23587
+       ORGANISM       0.72      0.82      0.77      3623
+ SMALL_MOLECULE       0.70      0.80      0.75      6187
+    SUBCELLULAR       0.65      0.72      0.69      3700
+         TISSUE       0.62      0.73      0.67      3207
+
+      micro avg       0.70      0.79      0.74     55616
+      macro avg       0.67      0.77      0.72     55616
+   weighted avg       0.70      0.79      0.74     55616
+
+ {'test_loss': 0.1830928772687912, 'test_accuracy_score': 0.9334821000160841, 'test_precision': 0.6987463009514112, 'test_recall': 0.789682825086306, 'test_f1': 0.7414366506288511, 'test_runtime': 61.0547, 'test_samples_per_second': 117.567, 'test_steps_per_second': 1.851}
  ```
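The updated evaluation report can be sanity-checked against its own numbers. The Python sketch below is illustrative only; the README ships no such script. It recomputes four things:

- the total support;
- the macro average, an unweighted mean over the seven entity classes;
- the support-weighted average from the per-class rows;
- whether the logged `test_f1` equals the harmonic mean of `test_precision` and `test_recall`.

The per-class figures are rounded to two decimals in the report, so the recomputed averages may differ from the reported ones by up to about 0.01.

```python
# Per-class rows of the updated report: (precision, recall, f1, support).
# Copied from the README diff; the metric values are rounded to two decimals.
rows = {
    "CELL": (0.69, 0.81, 0.74, 5245),
    "EXP_ASSAY": (0.56, 0.57, 0.56, 10067),
    "GENEPROD": (0.77, 0.89, 0.82, 23587),
    "ORGANISM": (0.72, 0.82, 0.77, 3623),
    "SMALL_MOLECULE": (0.70, 0.80, 0.75, 6187),
    "SUBCELLULAR": (0.65, 0.72, 0.69, 3700),
    "TISSUE": (0.62, 0.73, 0.67, 3207),
}

total_support = sum(r[3] for r in rows.values())


def macro(i: int) -> float:
    """Unweighted mean of metric column i (0=precision, 1=recall, 2=f1) over classes."""
    return sum(r[i] for r in rows.values()) / len(rows)


def weighted(i: int) -> float:
    """Support-weighted mean of metric column i over classes."""
    return sum(r[i] * r[3] for r in rows.values()) / total_support


# Micro-averaged values as logged in the evaluation output dict.
micro_p, micro_r, micro_f1 = 0.6987463009514112, 0.789682825086306, 0.7414366506288511

print("total support:", total_support)
print("macro P/R/F1:", [round(macro(i), 3) for i in range(3)])
print("weighted P/R/F1:", [round(weighted(i), 3) for i in range(3)])
print("harmonic mean of micro P and R:", round(2 * micro_p * micro_r / (micro_p + micro_r), 4))
```

Micro-averaged recall and support-weighted recall are equal by construction, because both reduce to total true positives divided by total support. That is why the micro and weighted rows share the same recall (0.79 here, 0.82 in the old report).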