junnyu
/

wobert_chinese_base

Inference Endpoints

Model card Files Files and versions Community

wobert_chinese_base / README.md

junnyu's picture

Update README.md

496cca9 almost 3 years ago

|

raw history blame contribute delete

No virus

1.98 kB

	---
	language: zh
	tags:
	- wobert

	---
	## 介绍
	### tf版本
	https://github.com/ZhuiyiTechnology/WoBERT
	### pytorch版本
	https://github.com/JunnYu/WoBERT_pytorch

	## 安装(主要为了安装WoBertTokenizer)
	注意：transformers版本需要>=4.7.0
	WoBertTokenizer的实现与RoFormerTokenizer是一样的，因此使用RoFormerTokenizer就可以了

	## 使用
	```python
	import torch
	from transformers import BertForMaskedLM as WoBertForMaskedLM
	from transformers import RoFormerTokenizer as WoBertTokenizer

	pretrained_model_or_path_list = [
	"junnyu/wobert_chinese_plus_base", "junnyu/wobert_chinese_base"
	]
	for path in pretrained_model_or_path_list:
	text = "今天[MASK]很好，我[MASK]去公园玩。"
	tokenizer = WoBertTokenizer.from_pretrained(path)
	model = WoBertForMaskedLM.from_pretrained(path)
	inputs = tokenizer(text, return_tensors="pt")
	with torch.no_grad():
	outputs = model(**inputs).logits[0]
	outputs_sentence = ""
	for i, id in enumerate(tokenizer.encode(text)):
	if id == tokenizer.mask_token_id:
	tokens = tokenizer.convert_ids_to_tokens(outputs[i].topk(k=5)[1])
	outputs_sentence += "[" + "\|\|".join(tokens) + "]"
	else:
	outputs_sentence += "".join(
	tokenizer.convert_ids_to_tokens([id],
	skip_special_tokens=True))
	print(outputs_sentence)
	# RoFormer 今天[天气\|\|天\|\|心情\|\|阳光\|\|空气]很好，我[想\|\|要\|\|打算\|\|准备\|\|喜欢]去公园玩。
	# PLUS WoBERT 今天[天气\|\|阳光\|\|天\|\|心情\|\|空气]很好，我[想\|\|要\|\|打算\|\|准备\|\|就]去公园玩。
	# WoBERT 今天[天气\|\|阳光\|\|天\|\|心情\|\|空气]很好，我[想\|\|要\|\|就\|\|准备\|\|也]去公园玩。
	```
	## 引用

	Bibtex：

	```tex
	@techreport{zhuiyiwobert,
	title={WoBERT: Word-based Chinese BERT model - ZhuiyiAI},
	author={Jianlin Su},
	year={2020},
	url="https://github.com/ZhuiyiTechnology/WoBERT",
	}
	```