raynardj committed on
Commit 862fec9
1 Parent(s): 0d18133

Update README.md

Files changed (1)
  1. README.md +2 -27
README.md CHANGED
@@ -1,30 +1,3 @@
- # Roberta-Base fine-tuned on [PubMed](https://pubmed.ncbi.nlm.nih.gov/) Abstract
- > We limit the training textual data to the following [MeSH](https://www.ncbi.nlm.nih.gov/mesh/)
- * All the child MeSH of ```Biomarkers, Tumor(D014408)```, including things like ```Carcinoembryonic Antigen(D002272)```
- * All the child MeSH of ```Carcinoma(D002277)```, including things like all kinds of carcinoma: like ```Carcinoma, Lewis Lung(D018827)``` etc. around 80 kinds of carcinoma
- * All the child MeSH of ```Clinical Trial(D016439)```
- * The training text file amounts to 531Mb
- ## Training
- * Trained on language modeling task, with ```mlm_probability=0.15```, on 2 Tesla V100 32G
- ```python
- training_args = TrainingArguments(
- output_dir=config.save, #select model path for checkpoint
- overwrite_output_dir=True,
- num_train_epochs=3,
- per_device_train_batch_size=30,
- per_device_eval_batch_size=60,
- evaluation_strategy= 'steps',
- save_total_limit=2,
- eval_steps=250,
- metric_for_best_model='eval_loss',
- greater_is_better=False,
- load_best_model_at_end =True,
- prediction_loss_only=True,
- report_to = "none")
- ```
-
-
-
  ---
  language:
  - en
@@ -36,5 +9,7 @@ tags:
  license: apache-2.0
  datasets:
  - pubmed
+ widget:
+ - text: "The <mask> effects of hyperatomarin"
  ---
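The removed README restricted the training abstracts to descendants of three MeSH descriptors. The selection can be sketched as a tree walk followed by a filter, assuming each abstract carries a list of MeSH descriptor IDs and that "all the child MeSH" means every descendant, with the root descriptor itself included (both are assumptions). `TOY_MESH_CHILDREN`, `descendants`, `filter_abstracts` and the record fields are hypothetical names. The hierarchy holds only the example pairs the README names, not the real MeSH tree.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical toy fragment of the MeSH hierarchy (parent ID -> descendant IDs).
# Only the example pairs named in the removed README are included, each shown as
# a direct child for simplicity; a real pipeline would load the full NLM MeSH data.
TOY_MESH_CHILDREN: Dict[str, List[str]] = {
    "D014408": ["D002272"],  # Biomarkers, Tumor -> Carcinoembryonic Antigen
    "D002277": ["D018827"],  # Carcinoma -> Carcinoma, Lewis Lung
    "D016439": [],           # Clinical Trial (no children in this toy map)
}


def descendants(roots: Iterable[str], children: Dict[str, List[str]]) -> Set[str]:
    """Return the root IDs plus every descendant reachable via `children` (BFS)."""
    roots = list(roots)
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_abstracts(records: List[dict], allowed: Set[str]) -> List[dict]:
    """Keep records tagged with at least one allowed MeSH descriptor ID."""
    return [r for r in records if allowed & set(r["mesh_ids"])]


# Usage with hypothetical records (IDs and field names are placeholders).
allowed = descendants(["D014408", "D002277", "D016439"], TOY_MESH_CHILDREN)
records = [
    {"pmid": "hypothetical-1", "mesh_ids": ["D018827"]},
    {"pmid": "hypothetical-2", "mesh_ids": ["X_UNRELATED"]},
]
print([r["pmid"] for r in filter_abstracts(records, allowed)])  # ['hypothetical-1']
```

Expanding the roots once into a set keeps the per-abstract check to a single set intersection, which matters when filtering millions of PubMed records.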
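The removed README only says the model was trained on a masked language modeling task with `mlm_probability=0.15`, and the widget added in this commit asks the model to fill a `<mask>` token. The masking step can be sketched under the assumption that training used the BERT-style 80/10/10 replacement scheme, which Hugging Face's `DataCollatorForLanguageModeling` applies by default. The README does not name the collator. This is a simplified stand-in: `mask_tokens` is a hypothetical helper, and it works on word strings instead of token IDs. It also skips special-token handling and uses `None` where the library would use the ignore label `-100`.

```python
import random
from typing import List, Optional, Tuple

MASK = "<mask>"  # RoBERTa's mask token, the same one the widget text uses


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mlm_probability: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[Optional[str]]]:
    """Simplified BERT-style masking over word strings (hypothetical helper).

    Each token is selected with probability `mlm_probability`. A selected token
    becomes MASK 80% of the time, a random vocab word 10% of the time, and stays
    unchanged 10% of the time. `labels` keeps the original word at selected
    positions and None elsewhere (positions the loss would ignore).
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mlm_probability:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


tokens = "The effects of hyperatomarin".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, mlm_probability=0.15)
print(inputs)
print(labels)
```

The model only receives a loss at selected positions. Leaving some selected tokens unchanged or randomized keeps it from assuming that every non-`<mask>` token is correct, since no `<mask>` tokens appear at inference time outside prompts like the widget's.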