Li committed on
Commit
cbbb66d
1 Parent(s): fead6e5

Upload README.md

Files changed (1)
  1. README.md +84 -0
README.md CHANGED
@@ -20,6 +20,90 @@ Bioformer-8L was pre-trained from scratch on the same corpus as the vocabulary (

  Pre-training of Bioformer-8L was performed on a single Cloud TPU device (TPUv2, 8 cores, 8GB memory per core). The maximum input sequence length was fixed to 512, and the batch size was set to 256. We pre-trained Bioformer-8L for 2 million steps, which took about 8.3 days.

+ ## Usage
+
+ Prerequisites: Python 3, PyTorch, Transformers, and Datasets.
+
+ We have tested the following commands with Python v3.9.16, PyTorch v1.13.1+cu117, Datasets v2.9.0, and Transformers v4.26.
+
+ To install PyTorch, please refer to the instructions [here](https://pytorch.org/get-started/locally).
+
+ To install the `transformers` and `datasets` libraries:
+ ```
+ pip install transformers
+ pip install datasets
+ ```
+
+ ### Fill-mask
+
+ ```python
+ from transformers import pipeline
+
+ # Bioformer-8L: returns the top-5 predictions for [MASK] as a list of dicts
+ # (displayed when run in an interactive session)
+ unmasker8L = pipeline('fill-mask', model='bioformers/bioformer-8L')
+ unmasker8L("[MASK] refers to a group of diseases that affect how the body uses blood sugar (glucose)")
+
+ # Bioformer-16L: the same query with the 16-layer model
+ unmasker16L = pipeline('fill-mask', model='bioformers/bioformer-16L')
+ unmasker16L("[MASK] refers to a group of diseases that affect how the body uses blood sugar (glucose)")
+ ```
+
+ Output of `bioformer-8L`:
+
+ ```
+ [{'score': 0.3207533359527588,
+ 'token': 13473,
+ 'token_str': 'Diabetes',
+ 'sequence': 'Diabetes refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.19234347343444824,
+ 'token': 17740,
+ 'token_str': 'Obesity',
+ 'sequence': 'Obesity refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.09200277179479599,
+ 'token': 10778,
+ 'token_str': 'T2DM',
+ 'sequence': 'T2DM refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.08494312316179276,
+ 'token': 2228,
+ 'token_str': 'It',
+ 'sequence': 'It refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.0412776917219162,
+ 'token': 22263,
+ 'token_str': 'Hypertension',
+ 'sequence': 'Hypertension refers to a group of diseases that affect how the body uses blood sugar ( glucose )'}]
+ ```
+
+ Output of `bioformer-16L`:
+
+ ```
+ [{'score': 0.7262957692146301,
+ 'token': 13473,
+ 'token_str': 'Diabetes',
+ 'sequence': 'Diabetes refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.124954953789711,
+ 'token': 10778,
+ 'token_str': 'T2DM',
+ 'sequence': 'T2DM refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.04062706232070923,
+ 'token': 2228,
+ 'token_str': 'It',
+ 'sequence': 'It refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.022694870829582214,
+ 'token': 17740,
+ 'token_str': 'Obesity',
+ 'sequence': 'Obesity refers to a group of diseases that affect how the body uses blood sugar ( glucose )'},
+
+ {'score': 0.009743048809468746,
+ 'token': 13960,
+ 'token_str': 'T2D',
+ 'sequence': 'T2D refers to a group of diseases that affect how the body uses blood sugar ( glucose )'}]
+ ```

  ## Awards
  Bioformer-8L achieved top performance (highest micro-F1 score) in the BioCreative VII COVID-19 multi-label topic classification challenge (https://doi.org/10.1093/database/baac069).
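As we understand the Transformers `fill-mask` pipeline, each `score` in the outputs of this diff is the softmax probability of one vocabulary token at the `[MASK]` position. By default the pipeline returns the five most probable tokens, and passing `top_k` when calling it (for example `unmasker8L(text, top_k=10)`) changes that number. The sketch below reproduces only this ranking step in plain Python. It uses a hypothetical six-token vocabulary with made-up logits rather than real Bioformer outputs, and `fill_mask_scores` and `fill_sentence` are illustrative helpers, not part of the `transformers` API.

```python
import math


def fill_mask_scores(logits, vocab, top_k=5):
    """Rank candidate tokens for one [MASK] position (illustrative sketch).

    logits: raw model outputs for the masked position, one value per vocab entry.
    vocab:  token strings aligned with `logits` (a hypothetical toy vocabulary).
    Returns up to `top_k` dicts with 'score', 'token' and 'token_str' keys,
    sorted by descending probability, loosely mirroring the pipeline output.
    """
    if len(logits) != len(vocab):
        raise ValueError("logits and vocab must have the same length")
    # Numerically stable softmax: shift by the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the top_k most probable tokens, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [{"score": probs[i], "token": i, "token_str": vocab[i]} for i in ranked]


def fill_sentence(template, token_str, mask="[MASK]"):
    """Substitute a predicted token for the first mask in the template."""
    return template.replace(mask, token_str, 1)


if __name__ == "__main__":
    # Made-up logits for a six-token toy vocabulary; not real Bioformer outputs.
    vocab = ["Diabetes", "Obesity", "T2DM", "It", "Hypertension", "the"]
    logits = [3.0, 2.5, 1.8, 1.7, 1.0, -2.0]
    template = "[MASK] refers to a group of diseases that affect how the body uses blood sugar (glucose)"
    for pred in fill_mask_scores(logits, vocab, top_k=3):
        print(f"{pred['score']:.3f}  {fill_sentence(template, pred['token_str'])}")
```

Because the softmax runs over the entire vocabulary, the top-5 scores need not sum to 1. This matches the outputs above: they sum to about 0.73 for Bioformer-8L and about 0.92 for Bioformer-16L.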