gxy committed
Commit d3876ab
1 Parent(s): e4f529d

FEAT: first commit

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -10,7 +10,7 @@ widget:
 
 
 ---
-# Randeng-BART-759M-BertTokenizer model (Chinese), one model of [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
+# Randeng-BART-759M-Chinese-BertTokenizer model (Chinese), one model of [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
 
 The 759M parameter Randeng-BART large model was trained on 180G of Chinese data with 8 A100 (40G) GPUs for 7 days; it uses an encoder-decoder transformer structure.
 
@@ -18,7 +18,7 @@ We use bert vocab as our tokenizer.
 
 ## Task Description
 
-Randeng-BART-759M-BertTokenizer is pre-trained with the text-infilling task from the BART [paper](https://readpaper.com/pdf-annotate/note?noteId=675945911766249472&pdfId=550970997159968917).
+Randeng-BART-759M-Chinese-BertTokenizer is pre-trained with the text-infilling task from the BART [paper](https://readpaper.com/pdf-annotate/note?noteId=675945911766249472&pdfId=550970997159968917).
 
 You can find our pretraining code in [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/pretrain_randeng_bart)
 
@@ -28,8 +28,8 @@ You can find our pretraining code in [Fengshenbang-LM](https://github.com/IDEA-C
 from transformers import BartForConditionalGeneration, AutoTokenizer, Text2TextGenerationPipeline
 import torch
 
-tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer', use_fast=False)
-model = BartForConditionalGeneration.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer')
+tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Randeng-BART-759M-Chinese-BertTokenizer', use_fast=False)
+model = BartForConditionalGeneration.from_pretrained('IDEA-CCNL/Randeng-BART-759M-Chinese-BertTokenizer')
 text = '桂林是著名的[MASK],它有很多[MASK]。'
 text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
 print(text2text_generator(text, max_length=50, do_sample=False))
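
For reference, below is a minimal sketch of running the renamed snippet end to end, assuming the checkpoint `IDEA-CCNL/Randeng-BART-759M-Chinese-BertTokenizer` is published on the Hugging Face Hub. It swaps the `Text2TextGenerationPipeline` for a direct `model.generate()` call, which is roughly what the pipeline does internally; the example sentence comes from the card, but the decoded output is not guaranteed.

```python
# A minimal sketch of the usage snippet above, assuming the renamed checkpoint
# 'IDEA-CCNL/Randeng-BART-759M-Chinese-BertTokenizer' is published on the Hub.
from transformers import BartForConditionalGeneration, AutoTokenizer
import torch

model_id = 'IDEA-CCNL/Randeng-BART-759M-Chinese-BertTokenizer'
# use_fast=False keeps the slow tokenizer, as in the snippet; the checkpoint
# pairs BART weights with a BERT vocabulary.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = BartForConditionalGeneration.from_pretrained(model_id)
model.eval()

# Text-infilling input: each [MASK] marks a span for the model to fill in.
text = '桂林是著名的[MASK],它有很多[MASK]。'
input_ids = tokenizer(text, return_tensors='pt').input_ids

# Greedy decoding, mirroring max_length=50, do_sample=False in the pipeline call.
with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=50, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```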