---
datasets:
- botp/yentinglin-zh_TW_c4
language:
- zh
pipeline_tag: fill-mask
---
### Model Sources
- **Paper:** [BERT](https://arxiv.org/abs/1810.04805)
## Uses
#### Direct Use
This model can be used for masked language modeling: given text containing `[MASK]` tokens, it predicts the masked tokens.
## Training
#### Training Procedure
* **type_vocab_size:** 2
* **vocab_size:** 21128
* **num_hidden_layers:** 12
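
The three values above map onto fields of the `transformers` `BertConfig` class. The following is a minimal sketch, not the author's training setup: `HPARAMS` and `build_config` are hypothetical names introduced here, and every field not listed above is left at the library default, which is an assumption.

```python
# Hyperparameters listed in this README; nothing else is taken from the source.
HPARAMS = {
    "type_vocab_size": 2,
    "vocab_size": 21128,
    "num_hidden_layers": 12,
}


def build_config():
    """Return a BertConfig carrying the listed values (hypothetical helper).

    All other fields keep the transformers defaults, which may differ
    from what was actually used to train this model.
    """
    # Imported lazily so HPARAMS can be inspected without transformers installed.
    from transformers import BertConfig

    return BertConfig(**HPARAMS)
```

Calling `build_config()` requires `transformers` to be installed; the returned object can be compared against the `config.json` shipped with the model.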
#### Training Data
botp/yentinglin-zh_TW_c4
## Evaluation
| Dataset \ Pretrained BERT model | bert-based-chinese | ckiplab | GufoLab |
| ------------- |:-------------:|:-------------:|:-------------:|
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | **0.8081** |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | **0.8025** |
| All datasets | 0.7694 | 0.7678 | **0.8038** |
#### Results
| Test ID | [MASK] Input | Result Output |
| -------------|-------------|-------------|
| 1|今天禮拜[MASK]?我[MASK]是很想[MASK]班。|今天禮拜六?我不是很想上班。 |
| 2|[MASK]灣並[MASK]是[MASK]國不可分割的一部分。|臺灣並不是中國不可分割的一部分。 |
| 3|如果可以是韋[MASK]安的最新歌[MASK]。|如果可以是韋禮安的最新歌曲。 |
| 4|[MASK]水老[MASK]有賣很多鐵蛋的攤販。|淡水老街有賣很多鐵蛋的攤販。 |
**git-lfs Installation**
```
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs
$ git lfs install
$ pip install huggingface_hub
```
## How to Get Started With the Model
#### Log in to Hugging Face from the Terminal
```
$ huggingface-cli login
Token: <your own Hugging Face token>
```
#### Log in to Hugging Face from a Jupyter Notebook
```
from huggingface_hub import notebook_login
notebook_login()
# When prompted for a token, enter your own Hugging Face token.
```
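#### Python Code
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# use_auth_token=True reuses the token saved by the login step above.
# Recent transformers releases deprecate it in favor of token=True.
tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
```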
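
Once the tokenizer and model load, masked positions can be filled in. The sketch below shows one possible way to do that and is not necessarily how the Results table was produced: it picks the highest-scoring token for each `[MASK]` independently (greedy decoding, an assumption), and `fill_masks` and `predict_masks` are hypothetical helper names introduced for illustration.

```python
from typing import List

MASK = "[MASK]"


def fill_masks(text: str, tokens: List[str]) -> str:
    """Replace each [MASK] in `text`, left to right, with the matching token."""
    parts = text.split(MASK)
    if len(parts) - 1 != len(tokens):
        raise ValueError("number of tokens must equal number of [MASK] slots")
    out = parts[0]
    for token, rest in zip(tokens, parts[1:]):
        out += token + rest
    return out


def predict_masks(text: str, model_id: str = "Azion/bert-based-chinese") -> str:
    """Fill every [MASK] with the model's top-1 token (greedy; an assumption)."""
    # Imported lazily so fill_masks works without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True)
    model = AutoModelForMaskedLM.from_pretrained(model_id, use_auth_token=True)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Indices of the [MASK] tokens in the encoded sequence, in reading order.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]

    # Highest-scoring vocabulary entry at each masked position.
    best_ids = logits[0, mask_positions].argmax(dim=-1)
    tokens = tokenizer.convert_ids_to_tokens(best_ids.tolist())
    return fill_masks(text, tokens)
```

A call such as `predict_masks("今天禮拜[MASK]?我[MASK]是很想[MASK]班。")` needs a prior login and network access. The pure helper can be checked offline: `fill_masks` given the tokens `六`, `不`, `上` for that input reproduces the output listed for Test ID 1 in the Results table.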