---
datasets:
- botp/yentinglin-zh_TW_c4
language:
- zh
pipeline_tag: fill-mask
---

### Model Sources
- **Paper:** [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)

## Uses

#### Direct Use

This model can be used for masked language modeling on Traditional Chinese text.

## Training

#### Model Configuration
* **type_vocab_size:** 2
* **vocab_size:** 21128
* **num_hidden_layers:** 12
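
The hyperparameters above can be expressed as a `transformers` configuration. This is a minimal sketch: only the three listed values are taken from this card, and every other field falls back to the `BertConfig` defaults, which may differ from the actual checkpoint.

```python
from transformers import BertConfig

# Only these three values come from the model card;
# all remaining fields are BertConfig defaults (an assumption).
config = BertConfig(
    vocab_size=21128,
    type_vocab_size=2,
    num_hidden_layers=12,
)
print(config.vocab_size, config.type_vocab_size, config.num_hidden_layers)
```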

#### Training Data
botp/yentinglin-zh_TW_c4

## Evaluation

| Dataset \ Model | bert-based-chinese | ckiplab | GufoLab |
| ------------- |:-------------:|:-------------:|:-------------:|
| 5000 Traditional Chinese dataset | 0.7183 | 0.6989 | **0.8081** |
| 10000 Sol-Idea dataset | 0.7874 | 0.7913 | **0.8025** |
| All datasets | 0.7694 | 0.7678 | **0.8038** |

#### Results

| Test ID\Results  | [MASK] Input | Result Output |
| -------------|-------------|-------------|
| 1|今天禮拜[MASK]?我[MASK]是很想[MASK]班。|今天禮拜六?我不是很想上班。 |
| 2|[MASK]灣並[MASK]是[MASK]國不可分割的一部分。|臺灣並不是中國不可分割的一部分。 |
| 3|如果可以是韋[MASK]安的最新歌[MASK]。|如果可以是韋禮安的最新歌曲。 |
| 4|[MASK]水老[MASK]有賣很多鐵蛋的攤販。|淡水老街有賣很多鐵蛋的攤販。 |

## How to Get Started With the Model

#### git-lfs Installation

```
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs
$ git lfs install
$ pip install huggingface_hub
```

#### Login HuggingFace on Terminal

```
$ huggingface-cli login
Token: <your Hugging Face access token>
```

#### Login HuggingFace on Jupyter Notebook

```python
from huggingface_hub import notebook_login

notebook_login()  # enter your Hugging Face access token in the widget prompt
```

#### Python Code

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# use_auth_token=True reads the token stored by `huggingface-cli login`
tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
```
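
Once the checkpoint is loaded, predictions like those in the Results table can be reproduced with the `fill-mask` pipeline. This is a sketch: downloading the weights requires network access, and if the repository is gated you will need to be logged in as described above.

```python
from transformers import pipeline

# Sketch: load the checkpoint into a fill-mask pipeline
# (requires network access; log in first if the repo is gated).
fill_mask = pipeline("fill-mask", model="Azion/bert-based-chinese")

# One of the card's example inputs (Test ID 4)
predictions = fill_mask("[MASK]水老街有賣很多鐵蛋的攤販。")
for pred in predictions[:3]:
    print(pred["token_str"], round(pred["score"], 4))
```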