---
language:
- zh
license: apache-2.0
tags:
- bart
widget:
- text: "桂林是著名的[MASK],它有很多[MASK]。"
---

# Randeng-BART-759M-BertTokenizer model (Chinese), one model of [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)

The 759 million parameter Randeng-BART large model was pre-trained on 180GB of Chinese data for 7 days on 8 A100 (40GB) GPUs. It uses an encoder-decoder transformer structure, and the BERT vocabulary as its tokenizer.

## Task Description

Randeng-BART-759M-BertTokenizer is pre-trained with the text-infilling task from the BART [paper](https://readpaper.com/pdf-annotate/note?noteId=675945911766249472&pdfId=550970997159968917).

You can find our pre-training code in [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/pretrain_randeng_bart).

## Usage

```python
from transformers import BartForConditionalGeneration, AutoTokenizer, Text2TextGenerationPipeline
import torch

tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer', use_fast=False)
model = BartForConditionalGeneration.from_pretrained('IDEA-CCNL/Randeng-BART-759M-BertTokenizer')
text = '桂林是著名的[MASK],它有很多[MASK]。'
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
print(text2text_generator(text, max_length=50, do_sample=False))
```

## Citation

If you find this resource useful, please cite the following website in your paper.

```
@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2022},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```
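
## Text-Infilling Illustration

For intuition, the sketch below illustrates the text-infilling corruption used in BART pre-training: random contiguous spans of the input are each collapsed into a single `[MASK]` token, and the model learns to reconstruct the original text. This is a hedged, assumption-laden illustration (character-level spans, Poisson(3) span lengths, roughly 30% of characters masked), not the actual Fengshenbang pre-training code; see the repository linked above for that.

```python
import numpy as np

def text_infilling(text, mask_token="[MASK]", mask_ratio=0.3, poisson_lambda=3, seed=0):
    """Corrupt `text` by replacing random spans with a single mask token.

    A rough, character-level approximation of the text-infilling objective
    from the BART paper (span lengths ~ Poisson(lambda=3), ~30% masked).
    Parameter names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    chars = list(text)
    budget = int(len(chars) * mask_ratio)  # total number of characters to mask
    out, i = [], 0
    while i < len(chars):
        if budget > 0 and rng.random() < mask_ratio:
            # Sample a span length, clamped to the remaining budget and text.
            span = max(1, int(rng.poisson(poisson_lambda)))
            span = min(span, budget, len(chars) - i)
            out.append(mask_token)  # the whole span collapses to one [MASK]
            i += span
            budget -= span
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

corrupted = text_infilling("桂林是著名的旅游城市,它有很多风景名胜。")
print(corrupted)  # e.g. something like "桂林是著名的[MASK],它有很多[MASK]。"
```

During pre-training, the corrupted sentence is fed to the encoder and the decoder is trained to regenerate the original, uncorrupted sentence, which is why the model can fill in `[MASK]` spans at inference time as shown in the Usage example.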