Thomas Lemberger committed
Commit fe92f13
Parent: 15ca687
Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -30,7 +30,7 @@ To have a quick check of the model:

  ```python
  from transformers import pipeline, RobertaTokenizerFast, RobertaForTokenClassification
- example = """<s> Figure 1. Genetic screens reveal that ZBTB7A and ZBTB7B regulate ATXN1, and ZBTB7B shows more pronounced effect. (A) Western blot analysis of Atxn1 levels in cerebellar granule neurons (CGNs) after knockdown of Zbtb7a and/or Zbtb7b (n = 3). (B) Genetic interaction of wild‐type human ATXN1(30Q) with either ZBTB7A or ZBTB7B expressed in the Drosophila eyes. Co‐expression of ATXN1(30Q) with ZBTB7A or ZBTB7B severely disrupted the external Drosophila eye structure and increased ATXN1 levels. Scale bar: 100 μm. </s>"""
+ example = """Fig 4. a, Volume density of early (Avi) and late (Avd) autophagic vacuoles. a, Volume density of early (Avi) and late (Avd) autophagic vacuoles from four independent cultures. Examples of Avi and Avd are shown in b and c, respectively. Bars represent 0.4 μm. d, Labelling density of cathepsin-D as estimated in two independent experiments. e, Labelling density of LAMP-1."""
  tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base', max_len=512)
  model = RobertaForTokenClassification.from_pretrained('EMBO/sd-panels')
  ner = pipeline('ner', model, tokenizer=tokenizer)
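The hunk above ends at pipeline construction, so for context here is a minimal sketch of how the quick check continues: call the pipeline on a caption and print the predicted `B-PANEL_START` tokens. The short caption string and the print format are illustrative assumptions, not part of the model card.

```python
from transformers import pipeline, RobertaTokenizerFast, RobertaForTokenClassification

# Setup as in the README snippet above.
tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base', max_len=512)
model = RobertaForTokenClassification.from_pretrained('EMBO/sd-panels')
ner = pipeline('ner', model, tokenizer=tokenizer)

# Hypothetical short caption; any multi-panel figure legend works here.
example = "Fig 2. a, Quantification of LC3-II levels. b, Representative images."

for token in ner(example):
    # Each prediction is a dict with the token text, its label, and a score.
    print(f"{token['word']!r}\t{token['entity']}\t{token['score']:.2f}")
```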
@@ -52,13 +52,13 @@ The training was run on an NVIDIA DGX Station with 4× Tesla V100 GPUs.

  Training code is available at https://github.com/source-data/soda-roberta

- - Command: `python -m tokcl.train PANELIZATION --num_train_epochs=6`
+ - Command: `python -m tokcl.train PANELIZATION --num_train_epochs=10`
  - Tokenizer vocab size: 50265
  - Training data: EMBO/sd-nlp NER
- - Training with 8234 examples.
- - Evaluating on 2338 examples.
+ - Training with 2175 examples.
+ - Evaluating on 622 examples.
  - Training on 2 features: `O`, `B-PANEL_START`
- - Epochs: 6.0
+ - Epochs: 10.0
  - `per_device_train_batch_size`: 32
  - `per_device_eval_batch_size`: 32
  - `learning_rate`: 0.0001
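For readers who want to map the listed hyperparameters onto the Hugging Face API, here is an illustrative `TrainingArguments` configuration that mirrors them. The actual entry point is the `tokcl.train` command shown above, so this sketch (including the `output_dir` name) is an assumption, not the repository's code.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed in the hunk above; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="sd-panels",
    num_train_epochs=10.0,            # Epochs: 10.0
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=1e-4,               # 0.0001
)
```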
@@ -70,14 +70,14 @@ Training code is available at https://github.com/source-data/soda-roberta

  ## Eval results

- Testing on 1131 examples from the test set with `sklearn.metrics`:
+ Testing on 337 examples from the test set with `sklearn.metrics`:

  ```
                precision    recall  f1-score   support

- PANEL_START       0.92      0.95      0.94      3285
+ PANEL_START       0.88      0.97      0.92       785

-    micro avg      0.92      0.95      0.94      3285
-    macro avg      0.92      0.95      0.94      3285
- weighted avg      0.92      0.95      0.94      3285
+    micro avg      0.88      0.97      0.92       785
+    macro avg      0.88      0.97      0.92       785
+ weighted avg      0.88      0.97      0.92       785
  ```
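A report in the format above can be produced with `sklearn.metrics.classification_report` on flattened token labels. The tiny label lists below are made-up stand-ins, since the card does not ship the test-set predictions.

```python
from sklearn.metrics import classification_report

# Hypothetical flattened gold and predicted token labels.
y_true = ["O", "B-PANEL_START", "O", "O", "B-PANEL_START", "O"]
y_pred = ["O", "B-PANEL_START", "O", "B-PANEL_START", "B-PANEL_START", "O"]

# Restricting `labels` to B-PANEL_START yields a single-row report like the one above.
print(classification_report(y_true, y_pred, labels=["B-PANEL_START"]))
```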
 