---
language: zh
tags:
- wobert
inference: True
---

## Word-based BERT model

For the original model and documentation, see https://github.com/ZhuiyiTechnology/WoBERT

For the PyTorch port, see https://github.com/JunnYu/WoBERT_pytorch

## Installing WoBertTokenizer

```bash
pip install git+https://github.com/JunnYu/WoBERT_pytorch.git
```
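
WoBERT's tokenizer segments Chinese text into words (the upstream implementation pre-tokenizes with jieba) before looking tokens up in the vocabulary, which is what makes the model word-based rather than character-based. A quick sanity check of the installed tokenizer; the hub id below is an assumption for illustration:

```python
from wobert import WoBertTokenizer

# Checkpoint id assumed for illustration; substitute your own WoBERT checkpoint.
tokenizer = WoBertTokenizer.from_pretrained('junnyu/wobert_chinese_plus_base')

# Common words should surface as single tokens, e.g. something like
# ['今天', '天气', '很', '好'] rather than one token per character.
print(tokenizer.tokenize('今天天气很好'))
```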

## Usage

```python
from transformers import TFBertForMaskedLM as WoBertForMaskedLM
from wobert import WoBertTokenizer

import tensorflow as tf

# Hub id used for illustration; substitute a local path to a WoBERT
# checkpoint with TensorFlow weights (pass from_pt=True to from_pretrained
# if only PyTorch weights are available).
pretrained_model_or_path = "junnyu/wobert_chinese_plus_base"

tokenizer = WoBertTokenizer.from_pretrained(pretrained_model_or_path)
model = WoBertForMaskedLM.from_pretrained(pretrained_model_or_path)

text = '今天[MASK]很好,我[MASK]去公园玩。'
inputs = tokenizer(text, return_tensors='tf')
outputs = model(**inputs).logits[0]

# Rebuild the sentence, replacing each [MASK] with its top-5 predictions.
outputs_sentence = ''
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        top5_ids = tf.math.top_k(outputs[i], k=5).indices.numpy()
        tokens = tokenizer.convert_ids_to_tokens(top5_ids)
        outputs_sentence += '[' + '|'.join(tokens) + ']'
    else:
        outputs_sentence += ''.join(
            tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True))

print(outputs_sentence)
# 今天[天气|阳光|天|心情|空气]很好,我[想|要|打算|准备|就]去公园玩。
```
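
Since the checkpoints are published through the WoBERT_pytorch repo, the same masked-word completion can also run on PyTorch. A minimal sketch under the same checkpoint assumption, swapping TFBertForMaskedLM for BertForMaskedLM and tf.math.top_k for torch.topk:

```python
import torch
from transformers import BertForMaskedLM as WoBertForMaskedLM
from wobert import WoBertTokenizer

# Checkpoint id assumed for illustration; substitute your own WoBERT checkpoint.
pretrained_model_or_path = 'junnyu/wobert_chinese_plus_base'
tokenizer = WoBertTokenizer.from_pretrained(pretrained_model_or_path)
model = WoBertForMaskedLM.from_pretrained(pretrained_model_or_path)

text = '今天[MASK]很好,我[MASK]去公园玩。'
inputs = tokenizer(text, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits[0]

# Same reconstruction loop as above, using torch.topk for the candidates.
outputs_sentence = ''
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        top5_ids = logits[i].topk(k=5).indices.tolist()
        outputs_sentence += '[' + '|'.join(tokenizer.convert_ids_to_tokens(top5_ids)) + ']'
    else:
        outputs_sentence += ''.join(
            tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True))

print(outputs_sentence)
```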

## Citation

BibTeX:

```tex
@techreport{zhuiyiwobert,
  title={WoBERT: Word-based Chinese BERT model - ZhuiyiAI},
  author={Jianlin Su},
  year={2020},
  url={https://github.com/ZhuiyiTechnology/WoBERT},
}
```