---
datasets:
- botp/yentinglin-zh_TW_c4
language:
- zh
pipeline_tag: fill-mask
---

| Dataset \ BERT Pretrain | bert-base-chinese | ckiplab | GufoLab |
| ------------- |:-------------:|:-------------:|:-------------:|
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | **0.8081** |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | **0.8025** |
| All Datasets | 0.7694 | 0.7678 | **0.8038** |

### Model Sources

- **Paper:** [BERT](https://arxiv.org/abs/1810.04805)

## Uses

#### Direct Use

This model can be used for masked language modeling on Traditional Chinese text.

## Risks, Limitations and Biases

**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).

## Training

#### Training Procedure

* **type_vocab_size:** 2
* **vocab_size:** 21128
* **num_hidden_layers:** 12

These configuration values can be read back from the published checkpoint; see the sanity-check sketch at the end of this card.

#### Training Data

[botp/yentinglin-zh_TW_c4](https://huggingface.co/datasets/botp/yentinglin-zh_TW_c4)

## Evaluation

#### Results

See the comparison table at the top of this card.

## How to Get Started With the Model

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and masked-language-model head from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
```
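Once the checkpoint loads, it can be queried through the `fill-mask` pipeline, which matches the `pipeline_tag` declared above. A minimal sketch; the example sentence is illustrative and not taken from the model card:

```python
from transformers import pipeline

# Build a fill-mask pipeline directly from the hub checkpoint.
fill_mask = pipeline(
    "fill-mask",
    model="EZlee/bert-based-chinese",
    use_auth_token=True,
)

# Predict the masked character in an illustrative Traditional Chinese sentence.
for prediction in fill_mask("台北是台灣的[MASK]都。"):
    print(prediction["token_str"], round(prediction["score"], 4))
```

Each prediction is a dict containing the filled-in token (`token_str`), its probability (`score`), and the completed sentence (`sequence`).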
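As a sanity check, the architecture values listed under Training Procedure can be read back from the hub configuration. A minimal sketch, assuming the checkpoint ships a standard BERT config:

```python
from transformers import AutoConfig

# Fetch the configuration and compare against the values reported in this card.
config = AutoConfig.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
print(config.vocab_size)         # expected: 21128
print(config.type_vocab_size)    # expected: 2
print(config.num_hidden_layers)  # expected: 12
```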