EMBO/sd-ner

Token Classification Transformers PyTorch JAX

roberta token classification Inference Endpoints

tlemberger committed on Mar 26, 2022

Commit 0b28f00 (1 parent: 9853dfd)

update with new training params and perf

README.md +19 -18

@@ -51,15 +51,14 @@ The training was run on a NVIDIA DGX Station with 4XTesla V100 GPUs.
 Training code is available at https://github.com/source-data/soda-roberta
-- Command: `python -m tokcl.train NER  --num_train_epochs=3.5`
 - Tokenizer vocab size: 50265
 - Training data: EMBO/sd-nlp NER
-- Training with 31410 examples.
-- Evaluating on 8861 examples.
 - Training on 15 features: O, I-SMALL_MOLECULE, B-SMALL_MOLECULE, I-GENEPROD, B-GENEPROD, I-SUBCELLULAR, B-SUBCELLULAR, I-CELL, B-CELL, I-TISSUE, B-TISSUE, I-ORGANISM, B-ORGANISM, I-EXP_ASSAY, B-EXP_ASSAY
-- Epochs: 3.5
-- `per_device_train_batch_size`: 32
-- `per_device_eval_batch_size`: 32
 - `learning_rate`: 0.0001
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
@@ -69,20 +68,22 @@ Training code is available at https://github.com/source-data/soda-roberta
 ## Eval results
-Testing on 4224 examples of test set with `sklearn.metrics`:
 ```
                 precision    recall  f1-score   support
-          CELL       0.77      0.81      0.79      3477
-     EXP_ASSAY       0.71      0.70      0.71      7049
-      GENEPROD       0.86      0.90      0.88     16140
-      ORGANISM       0.80      0.82      0.81      2759
-SMALL_MOLECULE       0.78      0.82      0.80      4446
-   SUBCELLULAR       0.71      0.75      0.73      2125
-        TISSUE       0.70      0.75      0.73      1971
-     micro avg       0.79      0.82      0.81     37967
-     macro avg       0.76      0.79      0.78     37967
-  weighted avg       0.79      0.82      0.81     37967
 ```

 Training code is available at https://github.com/source-data/soda-roberta
 - Tokenizer vocab size: 50265
 - Training data: EMBO/sd-nlp NER
+- Training with 48771 examples.
+- Evaluating on 13801 examples.
 - Training on 15 features: O, I-SMALL_MOLECULE, B-SMALL_MOLECULE, I-GENEPROD, B-GENEPROD, I-SUBCELLULAR, B-SUBCELLULAR, I-CELL, B-CELL, I-TISSUE, B-TISSUE, I-ORGANISM, B-ORGANISM, I-EXP_ASSAY, B-EXP_ASSAY
+- Epochs: 0.6
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
 - `learning_rate`: 0.0001
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
 ## Eval results
+Testing on 7178 examples of test set with `sklearn.metrics`:
 ```
                 precision    recall  f1-score   support
+          CELL       0.69      0.81      0.74      5245
+     EXP_ASSAY       0.56      0.57      0.56     10067
+      GENEPROD       0.77      0.89      0.82     23587
+      ORGANISM       0.72      0.82      0.77      3623
+SMALL_MOLECULE       0.70      0.80      0.75      6187
+   SUBCELLULAR       0.65      0.72      0.69      3700
+        TISSUE       0.62      0.73      0.67      3207
+     micro avg       0.70      0.79      0.74     55616
+     macro avg       0.67      0.77      0.72     55616
+  weighted avg       0.70      0.79      0.74     55616
+{'test_loss': 0.1830928772687912, 'test_accuracy_score': 0.9334821000160841, 'test_precision': 0.6987463009514112, 'test_recall': 0.789682825086306, 'test_f1': 0.7414366506288511, 'test_runtime': 61.0547, 'test_samples_per_second': 117.567, 'test_steps_per_second': 1.851}
 ```
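
The logged test metrics in the new README can be cross-checked against each other. The sketch below is only a sanity check, not part of the training code: it recomputes F1 as the harmonic mean of the logged precision and recall, and recomputes the two throughput figures from the test set size and runtime. The effective batch size of 64 is an assumption (4 GPUs, as stated in the README, times the new `per_device_eval_batch_size` of 16), and all variable names are illustrative.

```python
import math

# Values copied from the logged test metrics in the new README.
precision = 0.6987463009514112
recall = 0.789682825086306
reported_f1 = 0.7414366506288511

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"recomputed F1: {f1:.6f} (reported: {reported_f1:.6f})")

# Throughput: test examples divided by the logged runtime in seconds.
n_examples = 7178
runtime_s = 61.0547
samples_per_second = n_examples / runtime_s
print(f"samples/s: {samples_per_second:.3f} (reported: 117.567)")

# Assumption: evaluation ran on 4 GPUs with 16 examples per device,
# so each step consumes 64 examples and the last step is partial.
effective_batch = 4 * 16
steps = math.ceil(n_examples / effective_batch)
steps_per_second = steps / runtime_s
print(f"steps/s: {steps_per_second:.3f} (reported: 1.851)")
```

Under these assumptions the recomputed values agree with the logged ones to the printed precision, which suggests the reported `test_f1` is the micro-averaged score shown in the `micro avg` row.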