raynardj committed on
Commit 58d6399
1 parent: 862fec9

Update README.md

Files changed (1): README.md +25 -0
@@ -6,6 +6,7 @@ tags:
  - cancer
  - gene
  - clinical trial
+ - bioinformatic
  license: apache-2.0
  datasets:
  - pubmed
@@ -13,3 +14,27 @@ widget:
  - text: "The <mask> effects of hyperatomarin"
  ---

+ # RoBERTa-base fine-tuned on [PubMed](https://pubmed.ncbi.nlm.nih.gov/) abstracts
+ > We limit the training text to the following [MeSH](https://www.ncbi.nlm.nih.gov/mesh/) terms:
+ * All child terms of `Biomarkers, Tumor (D014408)`, such as `Carcinoembryonic Antigen (D002272)`
+ * All child terms of `Carcinoma (D002277)`, covering around 80 kinds of carcinoma, such as `Carcinoma, Lewis Lung (D018827)`
+ * All child terms of `Clinical Trial (D016439)`
+ * The resulting training text file is 531 MB
+ ## Training
+ * Trained on the masked language modeling task with `mlm_probability=0.15`, on 2 Tesla V100 32GB GPUs
+ ```python
+ training_args = TrainingArguments(  # from transformers import TrainingArguments
+     output_dir=config.save,  # checkpoint directory; `config` is defined elsewhere in the author's script
+     overwrite_output_dir=True,
+     num_train_epochs=3,
+     per_device_train_batch_size=30,
+     per_device_eval_batch_size=60,
+     evaluation_strategy="steps",  # renamed `eval_strategy` in newer transformers releases
+     save_total_limit=2,  # keep at most 2 checkpoints on disk
+     eval_steps=250,
+     metric_for_best_model="eval_loss",
+     greater_is_better=False,  # lower eval loss is better
+     load_best_model_at_end=True,
+     prediction_loss_only=True,
+     report_to="none")  # disable logging integrations such as W&B and TensorBoard
+ ```
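The data selection in the card relies on the MeSH hierarchy: a term's children are the descriptors whose tree numbers extend one of its own tree numbers. The sketch below shows one way to collect a term plus all of its descendants and keep only abstracts tagged with at least one of them. It is a minimal illustration under stated assumptions, not the author's preprocessing script: the descriptor IDs and tree numbers in `TOY_TREE` are made up, the helper names are hypothetical, and counting the parent term itself as "included" is an assumption.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy slice of the MeSH hierarchy: descriptor ID -> tree numbers.
# Real tree numbers come from NLM's MeSH descriptor files; these are made up.
TOY_TREE: Dict[str, List[str]] = {
    "D_PARENT": ["X01"],
    "D_CHILD": ["X01.100"],
    "D_GRANDCHILD": ["X01.100.200"],
    "D_LOOKALIKE": ["X010"],  # shares a string prefix but is not a descendant
    "D_OTHER": ["X02"],
}


def descendants(tree: Dict[str, List[str]], parent_id: str) -> Set[str]:
    """Return parent_id plus every descriptor filed under one of its tree numbers.

    Including the parent itself is an assumption about what "all child terms" means.
    """
    prefixes = tree[parent_id]
    found: Set[str] = set()
    for descriptor_id, numbers in tree.items():
        for number in numbers:
            # A descendant's tree number equals the parent's or extends it after a dot.
            if any(number == p or number.startswith(p + ".") for p in prefixes):
                found.add(descriptor_id)
                break
    return found


def keep_abstract(abstract_mesh: Iterable[str], allowed: Set[str]) -> bool:
    """Keep an abstract if at least one of its MeSH descriptors is in the allowed set."""
    return any(descriptor_id in allowed for descriptor_id in abstract_mesh)


allowed = descendants(TOY_TREE, "D_PARENT")
records = [
    {"pmid": "1", "mesh": ["D_GRANDCHILD"]},
    {"pmid": "2", "mesh": ["D_LOOKALIKE", "D_OTHER"]},
]
print([r["pmid"] for r in records if keep_abstract(r["mesh"], allowed)])  # ['1']
```

Matching on `p + "."` rather than a bare string prefix is what keeps `X010` from being treated as a child of `X01`.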
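The card's `mlm_probability=0.15` matches the argument name of Hugging Face's `DataCollatorForLanguageModeling`, so we assume that collator produced the training batches. In that scheme each non-special token is selected for prediction with probability 0.15; a selected token is replaced by `<mask>` 80% of the time, by a random token 10% of the time, and left unchanged otherwise, and unselected positions get the label -100 so the loss ignores them. The pure-Python sketch below mirrors that scheme for a single sequence as an illustration only; the function `mask_tokens` is hypothetical, and the IDs in the usage line are chosen to resemble roberta-base's `<mask>` ID and vocabulary size (check `tokenizer.mask_token_id` and `len(tokenizer)` for the real values).

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

IGNORE_INDEX = -100  # label that PyTorch's CrossEntropyLoss skips by default


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    mlm_probability: float = 0.15,
    special_ids: Optional[Set[int]] = None,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking of one sequence: returns (model inputs, labels)."""
    rng = random.Random(seed)
    special = special_ids or set()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, token in enumerate(token_ids):
        # Each non-special position is selected independently with probability mlm_probability.
        if token in special or rng.random() >= mlm_probability:
            continue
        labels[i] = token  # the model is trained to recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with <mask>
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


# Hypothetical IDs: 0 and 2 stand in for the <s> and </s> special tokens.
ids = [0, 713, 16, 10, 1296, 3645, 2]
masked_inputs, masked_labels = mask_tokens(ids, mask_id=50264, vocab_size=50265, special_ids={0, 2}, seed=0)
print(masked_inputs, masked_labels)
```

With a real tokenizer, `special_ids=set(tokenizer.all_special_ids)` keeps `<s>`, `</s>` and padding out of the prediction targets, as the collator does.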
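A quick sanity check of what the `TrainingArguments` in the card imply, assuming both GPUs are used by the `Trainer` and `gradient_accumulation_steps` is left at its default of 1: each optimizer step consumes 30 × 2 = 60 sequences, so evaluating every 250 steps means one evaluation per 15,000 training sequences. `save_steps` is not set, so it stays at the library default of 500; with `load_best_model_at_end=True` and step-based strategies, transformers requires the save interval to be a round multiple of the evaluation interval, which 500 and 250 satisfy. The helper names below are hypothetical.

```python
# Back-of-the-envelope checks for the TrainingArguments in the card.
# Assumes 2 GPUs and no gradient accumulation (gradient_accumulation_steps defaults to 1).

PER_DEVICE_TRAIN_BATCH = 30
N_GPUS = 2
EVAL_STEPS = 250
DEFAULT_SAVE_STEPS = 500  # transformers' default save_steps; the card does not set it


def effective_train_batch(per_device: int, n_gpus: int, grad_accum: int = 1) -> int:
    """Number of training sequences consumed per optimizer step."""
    return per_device * n_gpus * grad_accum


def sequences_between_evals(eval_steps: int, batch: int) -> int:
    """Training sequences seen between two consecutive evaluations."""
    return eval_steps * batch


def save_steps_compatible(save_steps: int, eval_steps: int) -> bool:
    """load_best_model_at_end with step-based strategies needs save_steps % eval_steps == 0."""
    return save_steps % eval_steps == 0


batch = effective_train_batch(PER_DEVICE_TRAIN_BATCH, N_GPUS)
print(batch)  # 60
print(sequences_between_evals(EVAL_STEPS, batch))  # 15000
print(save_steps_compatible(DEFAULT_SAVE_STEPS, EVAL_STEPS))  # True
```

If `eval_steps` were changed to a value that does not divide 500, `save_steps` would need to be set explicitly to keep `load_best_model_at_end=True` valid.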