---
datasets:
- botp/yentinglin-zh_TW_c4
language:
- zh
pipeline_tag: fill-mask
---

| Dataset \ BERT Pretrain | bert-base-chinese | ckiplab | GufoLab |
| ------------- |:-------------:|:-------------:|:-------------:|
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | **0.8081** |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | **0.8025** |
| All Datasets | 0.7694 | 0.7678 | **0.8038** |

### Model Sources

- **Paper:** [BERT](https://arxiv.org/abs/1810.04805)

## Uses

#### Direct Use

This model can be used for masked language modeling on Traditional Chinese text.

## Risks, Limitations and Biases

**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).

## Training

#### Training Procedure

* **type_vocab_size:** 2
* **vocab_size:** 21128
* **num_hidden_layers:** 12

These configuration values can be read back from the published checkpoint; see the sanity-check sketch at the end of this card.

#### Training Data

[botp/yentinglin-zh_TW_c4](https://huggingface.co/datasets/botp/yentinglin-zh_TW_c4)

## Evaluation

#### Results

See the comparison table at the top of this card.

## How to Get Started With the Model

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and masked-language-model head from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
```
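Once the checkpoint loads, it can be queried through the `fill-mask` pipeline, which matches the `pipeline_tag` declared above. A minimal sketch; the example sentence is illustrative and not taken from the model card:

```python
from transformers import pipeline

# Build a fill-mask pipeline directly from the hub checkpoint.
fill_mask = pipeline(
    "fill-mask",
    model="EZlee/bert-based-chinese",
    use_auth_token=True,
)

# Predict the masked character in an illustrative Traditional Chinese sentence.
for prediction in fill_mask("台北是台灣的[MASK]都。"):
    print(prediction["token_str"], round(prediction["score"], 4))
```

Each prediction is a dict containing the filled-in token (`token_str`), its probability (`score`), and the completed sentence (`sequence`).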
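As a sanity check, the architecture values listed under Training Procedure can be read back from the hub configuration. A minimal sketch, assuming the checkpoint ships a standard BERT config:

```python
from transformers import AutoConfig

# Fetch the configuration and compare against the values reported in this card.
config = AutoConfig.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
print(config.vocab_size)         # expected: 21128
print(config.type_vocab_size)    # expected: 2
print(config.num_hidden_layers)  # expected: 12
```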