---
language:
- zh
license: apache-2.0
tags:
- bart
widget:
- text: 桂林是著名的[MASK],它有很多[MASK]。
---
# Randeng-BART-759M-BertTokenizer model (Chinese), one model of Fengshenbang-LM
The 759M-parameter Randeng-BART large model was trained on 180G of Chinese data with 8 A100 (40G) GPUs for 7 days. It is a standard encoder-decoder Transformer, and it uses the BERT vocabulary for its tokenizer.
## Task Description
Randeng-BART-759M-BertTokenizer is pre-trained with the text-infilling task from the BART paper.

You can find our pre-training code in [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM).
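For intuition, text infilling corrupts the input by replacing contiguous token spans with a single `[MASK]` token and trains the model to reconstruct the original sequence. Below is a minimal, illustrative sketch of that corruption step only; the function, parameters, and toy sentence are our own assumptions, and the actual recipe in the BART paper samples span lengths from a Poisson distribution:

```python
import random

def text_infilling(tokens, mask_token="[MASK]", mask_ratio=0.3, avg_span_len=3):
    """Toy span corruption: replace random spans with a single mask token.
    Illustrative only; not the actual pre-training code."""
    out, i = [], 0
    while i < len(tokens):
        if random.random() < mask_ratio / avg_span_len:
            # Sample a span length and collapse the whole span to one mask token
            span = max(1, int(random.expovariate(1.0 / avg_span_len)))
            out.append(mask_token)
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("桂林是著名的旅游城市,它有很多名胜古迹。")
print("".join(text_infilling(tokens)))
# The model is trained to generate the original sequence from the corrupted one.
```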
## Usage
```python
from transformers import BartForConditionalGeneration, AutoTokenizer, Text2TextGenerationPipeline

# The slow tokenizer is required here because the model uses the BERT vocabulary
tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer', use_fast=False)
model = BartForConditionalGeneration.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer')

# Fill in the [MASK] spans with generated text
text = '桂林是著名的[MASK],它有很多[MASK]。'
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
print(text2text_generator(text, max_length=50, do_sample=False))
```
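Equivalently, you can skip the pipeline and call `model.generate` directly; this short sketch assumes the same `model`, `tokenizer`, and `text` from above:

```python
# Tokenize, generate greedily, and decode (equivalent to the pipeline call above)
inputs = tokenizer(text, return_tensors='pt')
output_ids = model.generate(inputs['input_ids'], max_length=50, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```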
## Citation
If you find this resource useful, please cite the following website in your paper:
```
@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2022},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```