Thomas Lemberger committed
Commit fe92f13
Parent: 15ca687
Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -30,7 +30,7 @@ To have a quick check of the model:

  ```python
  from transformers import pipeline, RobertaTokenizerFast, RobertaForTokenClassification
- example = """<s> Figure 1. Genetic screens reveal that ZBTB7A and ZBTB7B regulate ATXN1, and ZBTB7B shows more pronounced effect. (A) Western blot analysis of Atxn1 levels in cerebellar granule neurons (CGNs) after knockdown of Zbtb7a and/or Zbtb7b (n = 3). (B) Genetic interaction of wild‐type human ATXN1(30Q) with either ZBTB7A or ZBTB7B expressed in the Drosophila eyes. Co‐expression of ATXN1(30Q) with ZBTB7A or ZBTB7B severely disrupted the external Drosophila eye structure and increased ATXN1 levels. Scale bar: 100 μm. </s>"""
+ example = """Fig 4. a, Volume density of early (Avi) and late (Avd) autophagic vacuoles. a, Volume density of early (Avi) and late (Avd) autophagic vacuoles from four independent cultures. Examples of Avi and Avd are shown in b and c, respectively. Bars represent 0.4 μm. d, Labelling density of cathepsin-D as estimated in two independent experiments. e, Labelling density of LAMP-1."""
  tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base', max_len=512)
  model = RobertaForTokenClassification.from_pretrained('EMBO/sd-panels')
  ner = pipeline('ner', model, tokenizer=tokenizer)
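The hunk above ends at pipeline construction, so for context here is a minimal sketch of how the quick check continues: call the pipeline on a caption and print the predicted `B-PANEL_START` tokens. The short caption string and the print format are illustrative assumptions, not part of the model card.

```python
from transformers import pipeline, RobertaTokenizerFast, RobertaForTokenClassification

# Setup as in the README snippet above.
tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base', max_len=512)
model = RobertaForTokenClassification.from_pretrained('EMBO/sd-panels')
ner = pipeline('ner', model, tokenizer=tokenizer)

# Hypothetical short caption; any multi-panel figure legend works here.
example = "Fig 2. a, Quantification of LC3-II levels. b, Representative images."

for token in ner(example):
    # Each prediction is a dict with the token text, its label, and a score.
    print(f"{token['word']!r}\t{token['entity']}\t{token['score']:.2f}")
```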
@@ -52,13 +52,13 @@ The training was run on an NVIDIA DGX Station with 4× Tesla V100 GPUs.

  Training code is available at https://github.com/source-data/soda-roberta

- - Command: `python -m tokcl.train PANELIZATION --num_train_epochs=6`
+ - Command: `python -m tokcl.train PANELIZATION --num_train_epochs=10`
  - Tokenizer vocab size: 50265
  - Training data: EMBO/sd-nlp NER
- - Training with 8234 examples.
- - Evaluating on 2338 examples.
+ - Training with 2175 examples.
+ - Evaluating on 622 examples.
  - Training on 2 features: `O`, `B-PANEL_START`
- - Epochs: 6.0
+ - Epochs: 10.0
  - `per_device_train_batch_size`: 32
  - `per_device_eval_batch_size`: 32
  - `learning_rate`: 0.0001
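For readers who want to map the listed hyperparameters onto the Hugging Face API, here is an illustrative `TrainingArguments` configuration that mirrors them. The actual entry point is the `tokcl.train` command shown above, so this sketch (including the `output_dir` name) is an assumption, not the repository's code.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed in the hunk above; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="sd-panels",
    num_train_epochs=10.0,            # Epochs: 10.0
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=1e-4,               # 0.0001
)
```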
@@ -70,14 +70,14 @@ Training code is available at https://github.com/source-data/soda-roberta

  ## Eval results

- Testing on 1131 examples from the test set with `sklearn.metrics`:
+ Testing on 337 examples from the test set with `sklearn.metrics`:

  ```
                precision    recall  f1-score   support

- PANEL_START       0.92      0.95      0.94      3285
+ PANEL_START       0.88      0.97      0.92       785

-    micro avg      0.92      0.95      0.94      3285
-    macro avg      0.92      0.95      0.94      3285
- weighted avg      0.92      0.95      0.94      3285
+    micro avg      0.88      0.97      0.92       785
+    macro avg      0.88      0.97      0.92       785
+ weighted avg      0.88      0.97      0.92       785
  ```
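A report in the format above can be produced with `sklearn.metrics.classification_report` on flattened token labels. The tiny label lists below are made-up stand-ins, since the card does not ship the test-set predictions.

```python
from sklearn.metrics import classification_report

# Hypothetical flattened gold and predicted token labels.
y_true = ["O", "B-PANEL_START", "O", "O", "B-PANEL_START", "O"]
y_pred = ["O", "B-PANEL_START", "O", "B-PANEL_START", "B-PANEL_START", "O"]

# Restricting `labels` to B-PANEL_START yields a single-row report like the one above.
print(classification_report(y_true, y_pred, labels=["B-PANEL_START"]))
```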
 