---
license: afl-3.0
language:
- zh
tags:
- bert
- chinesebert
- MLM
pipeline_tag: fill-mask
---
# ChineseBERT-base
This project repackages ChineseBERT so that it can be loaded directly through the HuggingFace API, with no extra code or configuration required.
Original paper:
**[ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information](https://arxiv.org/abs/2106.16038)**
*Zijun Sun, Xiaoya Li, Xiaofei Sun, Yuxian Meng, Xiang Ao, Qing He, Fei Wu and Jiwei Li*
Original project:
[ChineseBERT github link](https://github.com/ShannonAI/ChineseBert)
Original model:
[ShannonAI/ChineseBERT-base](https://huggingface.co/ShannonAI/ChineseBERT-base) (that model cannot be loaded directly through the HuggingFace API)
# How to use this project
[Open In Colab](https://colab.research.google.com/github/iioSnail/ChineseBert/blob/main/demo/ChineseBERT-Demo.ipynb)
1. Install pypinyin
```
pip install pypinyin
```
2. Use the Auto classes to load the tokenizer and model
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("iioSnail/ChineseBERT-base", trust_remote_code=True)
model = AutoModel.from_pretrained("iioSnail/ChineseBERT-base", trust_remote_code=True)
```
3. From here on, usage is the same as for a regular BERT model, e.g. filling a `[MASK]` token:
```python
inputs = tokenizer(["我 喜 [MASK] 猫"], return_tensors='pt')
logits = model(**inputs).logits
print(tokenizer.decode(logits.argmax(-1)[0, 1:-1]))
```
Output:
```
我 喜 欢 猫
```
> To get the hidden states: `model.bert(**inputs).last_hidden_state`
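
For instance, a minimal sketch that reuses the tokenizer and model loaded above (the example sentence is arbitrary and only for illustration):

```python
# Minimal sketch: extract token-level hidden states with the model loaded above.
# The `model.bert(...)` call follows the note above; the sentence is an example.
inputs = tokenizer(["我 喜 欢 猫"], return_tensors='pt')
hidden_states = model.bert(**inputs).last_hidden_state
print(hidden_states.shape)  # (batch_size, sequence_length, hidden_size)
```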
# FAQ
1. Network problems, e.g. `Connection Error`
Solution: download the model and use it locally (see the loading sketch after this list). For batch-downloading, refer to this [blog post](https://blog.csdn.net/zhaohongfei_358/article/details/126222999) (in Chinese).
2. When using a locally downloaded copy, you get the error: `ModuleNotFoundError: No module named 'transformers_modules.iioSnail/ChineseBERT-base'`
Solution: change `iioSnail/ChineseBERT-base` to `iioSnail\ChineseBERT-base`
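
For example, a minimal sketch of loading a local copy (the directory path below is an assumption; point it at wherever you downloaded the model):

```python
from transformers import AutoTokenizer, AutoModel

# Assumed local directory -- replace with the path you downloaded the model to.
# On Windows, using a backslash-style name such as "iioSnail\ChineseBERT-base"
# works around the ModuleNotFoundError described above.
local_path = "path/to/ChineseBERT-base"
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True)
```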