Azion/bert-based-chinese

	---
	datasets:
	- botp/yentinglin-zh_TW_c4
	language:
	- zh
	pipeline_tag: fill-mask
	---

	### Model Sources
	- Paper: [BERT](https://arxiv.org/abs/1810.04805)

	## Uses

	#### Direct Use

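	This model can be used for masked language modeling: given text that contains `[MASK]` tokens, it predicts the most likely token for each masked position.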


	## Training

	#### Training Procedure
	* type_vocab_size: 2
	* vocab_size: 21128
	* num_hidden_layers: 12
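
The values above can be compared against the configuration that ships with the checkpoint. The following is a minimal sketch, not part of the original model card: the helper names `config_mismatches` and `load_config` are hypothetical, and `load_config` assumes the repository is reachable with your credentials (a private or gated repository may additionally need a `token` argument).

```python
# Values copied from the list above.
EXPECTED = {"type_vocab_size": 2, "vocab_size": 21128, "num_hidden_layers": 12}


def config_mismatches(config, expected=EXPECTED):
    """Return {key: (expected, actual)} for every listed value that differs."""
    return {
        key: (value, config.get(key))
        for key, value in expected.items()
        if config.get(key) != value
    }


def load_config(model_id="Azion/bert-based-chinese"):
    """Download the model configuration and return it as a plain dict."""
    # Imported lazily so the comparison helper works without transformers installed.
    from transformers import AutoConfig

    return AutoConfig.from_pretrained(model_id).to_dict()


# Usage (needs network access):
#   print(config_mismatches(load_config()) or "config matches the model card")
```

An empty result means the downloaded configuration agrees with the three values listed here.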

	#### Training Data
	botp/yentinglin-zh_TW_c4

	## Evaluation

	| Dataset | bert-based-chinese | ckiplab | GufoLab |
	| ------------- |:-------------:|:-------------:|:-------------:|
	| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | 0.8081 |
	| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | 0.8025 |
	| All datasets | 0.7694 | 0.7678 | 0.8038 |

	#### Results

	| Test ID | Input with [MASK] | Model Output |
	| ------------- | ------------- | ------------- |
	| 1 | 今天禮拜[MASK]？我[MASK]是很想[MASK]班。 | 今天禮拜六？我不是很想上班。 |
	| 2 | [MASK]灣並[MASK]是[MASK]國不可分割的一部分。 | 臺灣並不是中國不可分割的一部分。 |
	| 3 | 如果可以是韋[MASK]安的最新歌[MASK]。 | 如果可以是韋禮安的最新歌曲。 |
	| 4 | [MASK]水老[MASK]有賣很多鐵蛋的攤販。 | 淡水老街有賣很多鐵蛋的攤販。 |
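
Inputs like the ones in this table, with several `[MASK]` tokens, can be filled with the `fill-mask` pipeline from `transformers`. The sketch below is an illustration rather than the procedure used to produce the table: the helper names `fill_all_masks` and `predict` are hypothetical, and it assumes a recent `transformers` release in which the pipeline returns one candidate list per mask. Each mask is filled independently with its highest-scoring candidate, so the result is not a joint prediction over all masks.

```python
MASK = "[MASK]"


def fill_all_masks(text, candidates_per_mask):
    """Fill every [MASK] in `text`, left to right, with its best candidate.

    `candidates_per_mask` holds one list per mask; each candidate is a dict
    with at least 'token_str' and 'score', as returned by the pipeline.
    """
    parts = text.split(MASK)
    if len(parts) - 1 != len(candidates_per_mask):
        raise ValueError("need exactly one candidate list per [MASK]")
    filled = parts[0]
    for candidates, tail in zip(candidates_per_mask, parts[1:]):
        best = max(candidates, key=lambda c: c["score"])
        filled += best["token_str"] + tail
    return filled


def predict(text, model_id="Azion/bert-based-chinese"):
    """Run the fill-mask pipeline on `text` (downloads the model on first use)."""
    # Imported lazily so fill_all_masks can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    result = fill(text)
    # A single-mask input yields a flat list of candidates; wrap it so both
    # cases have the same shape.
    if result and isinstance(result[0], dict):
        result = [result]
    return fill_all_masks(text, result)


# Usage (needs network access):
#   print(predict("[MASK]水老[MASK]有賣很多鐵蛋的攤販。"))
```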

	## How to Get Started With the Model

	#### Install git-lfs

	```
	$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
	$ sudo apt-get install git-lfs
	$ git lfs install
	$ pip install huggingface_hub
	```

	#### Log in to Hugging Face from the Terminal

	```
	$ huggingface-cli login
	Token: <paste your own Hugging Face access token>
	```

	#### Log in to Hugging Face from a Jupyter Notebook

	```python
	from huggingface_hub import notebook_login

	# Opens a prompt in the notebook; paste your own Hugging Face access token there.
	notebook_login()
	```

	#### Python Code

	```python
	from transformers import AutoTokenizer, AutoModelForMaskedLM

	# token=True reuses the token saved by the login step above
	# (older transformers releases spell this use_auth_token=True).
	tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", token=True)
	model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", token=True)

	```
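
The snippet above only loads the tokenizer and the model. One possible way to run a prediction with them directly, without the pipeline wrapper, is sketched here. The function names `top_token_ids` and `fill_with_model` are hypothetical, the code assumes PyTorch is installed, and it takes the arg-max token at each masked position independently.

```python
def top_token_ids(logits, mask_positions):
    """Return the arg-max vocabulary id at each masked position.

    `logits` is a [seq_len][vocab_size] nested list of scores.
    """
    return [
        max(range(len(logits[pos])), key=logits[pos].__getitem__)
        for pos in mask_positions
    ]


def fill_with_model(text, model_id="Azion/bert-based-chinese"):
    """Fill every [MASK] in `text` using the raw masked-LM logits."""
    # Imported lazily so top_token_ids can be used without torch or transformers.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # shape: (seq_len, vocab_size)

    input_ids = inputs["input_ids"][0].tolist()
    mask_positions = [
        i for i, token_id in enumerate(input_ids)
        if token_id == tokenizer.mask_token_id
    ]
    for pos, token_id in zip(mask_positions, top_token_ids(logits.tolist(), mask_positions)):
        input_ids[pos] = token_id

    # BERT tokenizers typically decode Chinese text with spaces between
    # characters; removing them here is an assumption suited to these examples.
    return tokenizer.decode(input_ids, skip_special_tokens=True).replace(" ", "")


# Usage (needs network access):
#   print(fill_with_model("今天禮拜[MASK]？我[MASK]是很想[MASK]班。"))
```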