Dr. Jorge Abreu Vicente committed on
Commit 3febf7d
1 Parent(s): cbb69b4

update model card README.md

Files changed (1)
  1. README.md +16 -48
README.md CHANGED
@@ -21,16 +21,13 @@ model-index:
  metrics:
  - name: Precision
  type: precision
- value: 0.9243747400938632
+ value: 0.9218777784363701
  - name: Recall
  type: recall
- value: 0.9284563518109672
+ value: 0.9280386657915151
  - name: F1
  type: f1
- value: 0.9264110502500595
+ value: 0.9249479631281595
- widget:
- - text: "XPT of siRNA treated [MASK] cells after 48 hours of knockdown. Treated cells were fed with the indicated amounts of C8L peptid conjugated to iron oxide beads via a disulfide bond. The cells were then exposed to RF33. 70-Luc Reporter [MASK] T cells overnight. Error bars show SD of >3 replicate wells. * p<0.05 for siRNA vs control [MASK] using two-way ANOVA. Representative plot of 3 independent experiments."
- - text: "The [MASK] intensity along the line across a lipid droplet in (A) was measured by ImageJ.The lipid droplet localization of [MASK]-[MASK], represented by two peaks, is clearly visible in fat cells from ppl > [MASK] larvae , but it is lost in fat cells from ppl > [MASK] larvae with [MASK] RNAi or overexpression of [MASK]/[MASK]. More than 30 lipid droplets of each genotype were measured. One typical image curve is shown for each genotype."
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -40,36 +37,23 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [michiyasunaga/BioLinkBERT-large](https://huggingface.co/michiyasunaga/BioLinkBERT-large) on the source_data_nlp dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0118
- - Accuracy Score: 0.9959
- - Precision: 0.9244
- - Recall: 0.9285
- - F1: 0.9264
+ - Loss: 0.0141
+ - Accuracy Score: 0.9950
+ - Precision: 0.9219
+ - Recall: 0.9280
+ - F1: 0.9249

  ## Model description

- The generation of this model is explained in more detail in Abreu-Vicente & Lemberger (in prep).
- The model is fine-tuned from [michiyasunaga/BioLinkBERT-large](https://huggingface.co/michiyasunaga/BioLinkBERT-large).
- The use of [michiyasunaga/BioLinkBERT-large](https://huggingface.co/michiyasunaga/BioLinkBERT-large) was decided after analyzing 14 different models
- on the [SourceData](https://huggingface.co/datasets/EMBO/sd-nlp-non-tokenized) dataset.
-
- ### The SourceData dataset
-
- This dataset is based on the content of the SourceData (https://sourcedata.embo.org) database, which contains manually annotated figure legends written in English and extracted from scientific papers in the domain of cell and molecular biology (Liechti et al, Nature Methods, 2017, https://doi.org/10.1038/nmeth.4471). Unlike the sd-nlp dataset, which is pre-tokenized with the roberta-base tokenizer, this dataset is not pre-tokenized but only split into words. Users can therefore use it to fine-tune other models. Additional details at https://github.com/source-data/soda-roberta
-
- The dataset in the 🤗 Hub is a processed version of the entire annotated dataset, which is also presented in Abreu-Vicente & Lemberger (in prep).
- Further details on the entire dataset can be found in the associated [BCVI BIO-ID track](https://biocreative.bioinformatics.udel.edu/resources/corpora/bcvi-bio-id-track/) task.
-
- This model is fine-tuned on the biological `GENEPROD_ROLES` task, in which gene products are masked and the model classifies them as `CONTROLLED_VAR` or `MEASURED_VAR`.
- The performance of the model is similar for both classes.
+ More information needed

  ## Intended uses & limitations

- The intended use of this model is to infer the semantic role of gene products (genes and proteins) with regard to the causal hypotheses tested in experiments reported in scientific papers. Although the model could be trained without masking the entities, its performance is considerably better (F1 roughly 0.2 higher) when they are masked. This requires a prior step of running a NER model to identify gene products.
+ More information needed

  ## Training and evaluation data

- The training, evaluation, and test splits of the data used can be found in the [SourceData dataset](https://huggingface.co/datasets/EMBO/sd-nlp-non-tokenized).
+ More information needed

  ## Training procedure

@@ -82,34 +66,18 @@ The following hyperparameters were used during training:
  - seed: 42
  - optimizer: Adafactor
  - lr_scheduler_type: linear
- - num_epochs: 2.0
+ - num_epochs: 1.0

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Accuracy Score | Precision | Recall | F1 |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:---------:|:------:|:------:|
- | 0.0115 | 1.0 | 2066 | 0.0126 | 0.9955 | 0.9130 | 0.9216 | 0.9173 |
- | 0.0074 | 2.0 | 4132 | 0.0118 | 0.9959 | 0.9244 | 0.9285 | 0.9264 |
-
- ### Test results
-
- ```
-                  precision    recall  f1-score   support
-
-  CONTROLLED_VAR       0.91      0.93      0.92      7241
-    MEASURED_VAR       0.94      0.93      0.93      8720
-
-       micro avg       0.93      0.93      0.93     15961
-       macro avg       0.92      0.93      0.93     15961
-    weighted avg       0.93      0.93      0.93     15961
-
- {'test_loss': 0.011081044562160969, 'test_accuracy_score': 0.9962086330220685, 'test_precision': 0.925242960378769, 'test_recall': 0.9305181379612806, 'test_f1': 0.9278730515727985, 'test_runtime': 87.5388, 'test_samples_per_second': 93.958, 'test_steps_per_second': 0.377}
- ```
+ | 0.0129 | 1.0 | 1569 | 0.0141 | 0.9950 | 0.9219 | 0.9280 | 0.9249 |
+

  ### Framework versions

- - Transformers 4.15.0
+ - Transformers 4.20.0
  - Pytorch 1.11.0a0+bfe5ad2
  - Datasets 1.17.0
- - Tokenizers 0.10.3
+ - Tokenizers 0.12.1
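For context on the `GENEPROD_ROLES` usage described in the card text removed above (gene products are first detected by a NER step, masked, and then classified as `CONTROLLED_VAR` or `MEASURED_VAR`), here is a minimal, illustrative sketch of querying such a checkpoint with the 🤗 Transformers token-classification pipeline. The checkpoint path and the example sentence are placeholders, not taken from this repository.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Placeholder path: substitute the actual fine-tuned GENEPROD_ROLES checkpoint.
checkpoint = "path/to/geneprod-roles-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)

role_tagger = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# Gene products found by a prior NER step are replaced with the mask token,
# as in the widget examples; the model then labels each masked position
# as a controlled or measured variable.
text = "Western blot of [MASK] levels in HeLa cells after 48 hours of [MASK] knockdown by siRNA."
for entity in role_tagger(text):
    print(entity["entity_group"], round(entity["score"], 3), entity["word"])
```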