---
license: afl-3.0
language:
- zh
tags:
- bert
- chinesebert
- MLM
pipeline_tag: fill-mask
---

# ChineseBERT-large

This project repackages ChineseBERT so that it can be called directly through the Hugging Face API, with no extra code or configuration required.

Original paper:
**[ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information](https://arxiv.org/abs/2106.16038)**
*Zijun Sun, Xiaoya Li, Xiaofei Sun, Yuxian Meng, Xiang Ao, Qing He, Fei Wu and Jiwei Li*

Original repository:
[ChineseBERT github link](https://github.com/ShannonAI/ChineseBert)

Original model:
[ShannonAI/ChineseBERT-base](https://huggingface.co/ShannonAI/ChineseBERT-base) (that model cannot be called directly through the Hugging Face API)

# Usage

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/iioSnail/ChineseBert/blob/main/demo/ChineseBERT-Demo.ipynb)

1. Install pypinyin

```
pip install pypinyin
```

2. Load the tokenizer and model with the `Auto` classes

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("iioSnail/ChineseBERT-large", trust_remote_code=True)
model = AutoModel.from_pretrained("iioSnail/ChineseBERT-large", trust_remote_code=True)
```

3. After that, usage is the same as with an ordinary BERT model

```python
inputs = tokenizer(["我 喜 [MASK] 猫"], return_tensors='pt')
logits = model(**inputs).logits

print(tokenizer.decode(logits.argmax(-1)[0, 1:-1]))
```

Output:

```
我 喜 欢 猫
```
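
If you only need the prediction for the masked position rather than decoding the whole sequence, the minimal sketch below can be used. It assumes the custom tokenizer exposes the standard `mask_token_id` attribute and that `inputs` and `logits` come from the snippet above:

```python
# Locate the [MASK] position in the first sequence
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]

# Top-5 candidate characters for the masked slot
top5_ids = logits[0, mask_pos].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top5_ids.tolist()))
```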

# FAQ

1. Network problems, e.g. `Connection Error`

Solution: download the model and use it locally. For downloading all the files in one batch, see this [blog post](https://blog.csdn.net/zhaohongfei_358/article/details/126222999); a sketch follows below.
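
One possible way to do this (an illustration, not the method from the linked blog post) is `huggingface_hub.snapshot_download`; the local directory name below is only an example:

```python
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModel

# Download every file of the repo into a local folder (example path)
local_dir = snapshot_download(repo_id="iioSnail/ChineseBERT-large", local_dir="./ChineseBERT-large")

# Load from the local folder instead of fetching from the Hub
tokenizer = AutoTokenizer.from_pretrained(local_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(local_dir, trust_remote_code=True)
```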

2. An error occurs when using the locally downloaded model: `ModuleNotFoundError: No module named 'transformers_modules.iioSnail/ChineseBERT-large'`

Solution: change `iioSnail/ChineseBERT-large` to `iioSnail\ChineseBERT-large`, as in the sketch below.
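
A minimal sketch of this workaround (assuming the files were downloaded into a local `iioSnail\ChineseBERT-large` folder):

```python
from transformers import AutoTokenizer, AutoModel

# Use a backslash instead of a forward slash in the local path
# (a raw string keeps the backslash literal)
tokenizer = AutoTokenizer.from_pretrained(r"iioSnail\ChineseBERT-large", trust_remote_code=True)
model = AutoModel.from_pretrained(r"iioSnail\ChineseBERT-large", trust_remote_code=True)
```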