---
datasets:
- botp/yentinglin-zh_TW_c4
language:
- zh
pipeline_tag: fill-mask
---

### Model Sources

- **Paper:** [BERT](https://arxiv.org/abs/1810.04805)

## Uses

#### Direct Use

This model can be used for masked language modeling.

## Training

#### Training Procedure

* **type_vocab_size:** 2
* **vocab_size:** 21128
* **num_hidden_layers:** 12

#### Training Data

botp/yentinglin-zh_TW_c4

## Evaluation

| Dataset \ BERT Pretrain | bert-based-chinese | ckiplab | GufoLab |
| ------------- |:-------------:|:-------------:|:-------------:|
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | **0.8081** |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | **0.8025** |
| All Datasets | 0.7694 | 0.7678 | **0.8038** |

#### Results

| Test ID \ Results | [MASK] Input | Result Output |
| ------------- | ------------- | ------------- |
| 1 | 今天禮拜[MASK]?我[MASK]是很想[MASK]班。 | 今天禮拜六?我不是很想上班。 |
| 2 | [MASK]灣並[MASK]是[MASK]國不可分割的一部分。 | 臺灣並不是中國不可分割的一部分。 |
| 3 | 如果可以是韋[MASK]安的最新歌[MASK]。 | 如果可以是韋禮安的最新歌曲。 |
| 4 | [MASK]水老[MASK]有賣很多鐵蛋的攤販。 | 淡水老街有賣很多鐵蛋的攤販。 |

## How to Get Started With the Model

#### Private Model Download

**Installation**

```
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs
$ git lfs install
$ pip install huggingface_hub
```

**Log in to Hugging Face**

```
$ huggingface-cli login
Token: your own 'write' token.
```

**Python Code**

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
```
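Once the tokenizer and model are loaded, a `[MASK]` prediction is read by taking the argmax of the masked-LM head's logits at the masked position. The sketch below illustrates that readout step with a synthetic vocabulary and logits tensor (both are stand-ins so the example runs offline and without authentication); with the real model you would use the tokenizer's vocabulary and `model(**inputs).logits` instead.

```python
import torch

# Synthetic stand-in vocabulary; the real model's vocab has 21128 entries.
vocab = ["[PAD]", "[MASK]", "上", "下", "六", "不"]

# Synthetic logits: one sequence of 4 tokens over a vocab of 6.
# Position 2 represents the [MASK] slot, where "上" (index 2) scores highest.
logits = torch.tensor([[
    [0.1, 0.0, 0.2, 0.1, 0.3, 0.1],
    [0.2, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.0, 0.0, 3.0, 0.5, 0.1, 0.2],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]])

# Argmax over the vocabulary dimension at the masked position.
mask_index = 2
predicted_id = logits[0, mask_index].argmax().item()
predicted_token = vocab[predicted_id]
print(predicted_token)  # 上
```

With the real checkpoint, `mask_index` comes from locating `tokenizer.mask_token_id` in the encoded input rather than being hard-coded.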