IDEA-CCNL/Yuyuan-Bart-139M

roygan committed on Apr 24, 2022

Commit aecbc0e • 1 Parent(s): 6435958

Update README.md

Files changed (1)

README.md +1 -1

@@ -24,7 +24,7 @@ Paper: [BioBART: Pretraining and Evaluation of A Biomedical Generative Language
 We use PubMed abstracts as the pretraining corpora. The corpora contain about 41 GB of biomedical research paper abstracts on PubMed.
 ## Pretraining Setup
-We continuously pretrain both base versions of BART for 120k steps with a batch size of 2560. We use the same vocabulary as BART to tokenize the texts. Although the input length limitation of BART is 1024, the tokenized PubMed abstracts rarely exceed 512. Therefore, for the sake of training efficiency, we truncate all the input texts to 512 maximum length. We mask 30% of the input tokens and the masked span length is determined by sampling from a Poisson distribution (λ = 3) as used in BART. We use a learning rate scheduler of 0.02 warm-up ratio and linear decay. The learning rate is set to 1e-4. We train the base version of BioBART(139M parameters) on 2 DGX with 16 40GB A100 GPUs for about 100 hours with the help of the open-resource framework DeepSpeed.

+We continuously pretrain base versions of BART for 120k steps with a batch size of 2560. We use the same vocabulary as BART to tokenize the texts. Although the input length limitation of BART is 1024, the tokenized PubMed abstracts rarely exceed 512. Therefore, for the sake of training efficiency, we truncate all the input texts to 512 maximum length. We mask 30% of the input tokens and the masked span length is determined by sampling from a Poisson distribution (λ = 3) as used in BART. We use a learning rate scheduler of 0.02 warm-up ratio and linear decay. The learning rate is set to 1e-4. We train the base version of BioBART(139M parameters) on 2 DGX with 16 40GB A100 GPUs for about 100 hours with the help of the open-resource framework DeepSpeed.
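
The schedule described in the changed paragraph (120k steps, 0.02 warm-up ratio, linear decay, learning rate 1e-4) can be sketched as follows. This is a minimal illustration, not the authors' training code: the function name `learning_rate` is hypothetical, and it assumes the warm-up is linear from zero and that the decay runs linearly to zero at the final step, neither of which the README states.

```python
def learning_rate(step, total_steps=120_000, peak_lr=1e-4, warmup_ratio=0.02):
    """Learning rate at a given optimizer step.

    Assumptions (not stated in the README): warm-up is linear from 0,
    and the linear decay reaches 0 at total_steps.
    """
    # 0.02 * 120,000 = 2,400 warm-up steps
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr down to 0 over the remaining steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 1_200, 2_400, 60_000, 120_000):
        print(s, learning_rate(s))
```

Under these assumptions, the peak rate of 1e-4 is reached at step 2,400 and the rate falls to zero at step 120,000.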