---
language:
- zh
license: apache-2.0
tags:
- bart
widget:
- text: "桂林是著名的[MASK],它有很多[MASK]。"
---

# Randeng-BART-759M-BertTokenizer model (Chinese), one model of [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)

The 759 million parameter Randeng-BART large model was pre-trained on 180GB of Chinese data for 7 days on 8 A100 (40GB) GPUs. It uses an encoder-decoder transformer structure, and the BERT vocabulary as its tokenizer.

## Task Description

Randeng-BART-759M-BertTokenizer is pre-trained with the text-infilling task from the BART [paper](https://readpaper.com/pdf-annotate/note?noteId=675945911766249472&pdfId=550970997159968917).

You can find our pre-training code in [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/pretrain_randeng_bart).

## Usage

```python
from transformers import BartForConditionalGeneration, AutoTokenizer, Text2TextGenerationPipeline
import torch

tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer', use_fast=False)
model = BartForConditionalGeneration.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer')
text = '桂林是著名的[MASK],它有很多[MASK]。'
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
print(text2text_generator(text, max_length=50, do_sample=False))
```

## Citation

If you find this resource useful, please cite the following website in your paper.

```
@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2022},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```
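
## Text-Infilling Illustration

For intuition, the sketch below illustrates the text-infilling corruption used in BART pre-training: random contiguous spans of the input are each collapsed into a single `[MASK]` token, and the model learns to reconstruct the original text. This is a hedged, assumption-laden illustration (character-level spans, Poisson(3) span lengths, roughly 30% of characters masked), not the actual Fengshenbang pre-training code; see the repository linked above for that.

```python
import numpy as np

def text_infilling(text, mask_token="[MASK]", mask_ratio=0.3, poisson_lambda=3, seed=0):
    """Corrupt `text` by replacing random spans with a single mask token.

    A rough, character-level approximation of the text-infilling objective
    from the BART paper (span lengths ~ Poisson(lambda=3), ~30% masked).
    Parameter names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    chars = list(text)
    budget = int(len(chars) * mask_ratio)  # total number of characters to mask
    out, i = [], 0
    while i < len(chars):
        if budget > 0 and rng.random() < mask_ratio:
            # Sample a span length, clamped to the remaining budget and text.
            span = max(1, int(rng.poisson(poisson_lambda)))
            span = min(span, budget, len(chars) - i)
            out.append(mask_token)  # the whole span collapses to one [MASK]
            i += span
            budget -= span
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

corrupted = text_infilling("桂林是著名的旅游城市,它有很多风景名胜。")
print(corrupted)  # e.g. something like "桂林是著名的[MASK],它有很多[MASK]。"
```

During pre-training, the corrupted sentence is fed to the encoder and the decoder is trained to regenerate the original, uncorrupted sentence, which is why the model can fill in `[MASK]` spans at inference time as shown in the Usage example.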