--- language: zh tags: - wobert --- ## 介绍 ### tf版本 https://github.com/ZhuiyiTechnology/WoBERT ### pytorch版本 https://github.com/JunnYu/WoBERT_pytorch ## 安装(主要为了安装WoBertTokenizer) 注意:transformers版本需要>=4.7.0 WoBertTokenizer的实现与RoFormerTokenizer是一样的,因此使用RoFormerTokenizer就可以了 ## 使用 ```python import torch from transformers import BertForMaskedLM as WoBertForMaskedLM from transformers import RoFormerTokenizer as WoBertTokenizer pretrained_model_or_path_list = [ "junnyu/wobert_chinese_plus_base", "junnyu/wobert_chinese_base" ] for path in pretrained_model_or_path_list: text = "今天[MASK]很好,我[MASK]去公园玩。" tokenizer = WoBertTokenizer.from_pretrained(path) model = WoBertForMaskedLM.from_pretrained(path) inputs = tokenizer(text, return_tensors="pt") with torch.no_grad(): outputs = model(**inputs).logits[0] outputs_sentence = "" for i, id in enumerate(tokenizer.encode(text)): if id == tokenizer.mask_token_id: tokens = tokenizer.convert_ids_to_tokens(outputs[i].topk(k=5)[1]) outputs_sentence += "[" + "||".join(tokens) + "]" else: outputs_sentence += "".join( tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)) print(outputs_sentence) # RoFormer 今天[天气||天||心情||阳光||空气]很好,我[想||要||打算||准备||喜欢]去公园玩。 # PLUS WoBERT 今天[天气||阳光||天||心情||空气]很好,我[想||要||打算||准备||就]去公园玩。 # WoBERT 今天[天气||阳光||天||心情||空气]很好,我[想||要||就||准备||也]去公园玩。 ``` ## 引用 Bibtex: ```tex @techreport{zhuiyiwobert, title={WoBERT: Word-based Chinese BERT model - ZhuiyiAI}, author={Jianlin Su}, year={2020}, url="https://github.com/ZhuiyiTechnology/WoBERT", } ```