---
language:
- zh
tags:
- bert
- pytorch
- zh
- ner
license: apache-2.0
library_name: transformers
pipeline_tag: token-classification
widget:
  - text:  常建良，男，1963年出生，工科学士，高级工程师
---

# BertSpan for Chinese Named Entity Recognition (bertspan4ner) Model
A Chinese named entity recognition (NER) model.

Overall performance of `bertspan4ner-base-chinese` on the PEOPLE (People's Daily, 人民日报) **test** set:

| Model    | Accuracy | Recall | F1     |
| -------- | -------- | ------ | ------ |
| BertSpan | 0.9610   | 0.9600 | 0.9605 |
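
As a quick sanity check on the table, the sketch below recomputes F1 as the harmonic mean of the first two score columns. The reported F1 matches that value, which would be consistent with the column labeled "Accuracy" holding precision. That reading is an assumption based only on the arithmetic and is not confirmed by the model card. The helper name `f1_score` is introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table; treating the first column as precision
# is an assumption made for this check.
first_column, recall = 0.9610, 0.9600
print(round(f1_score(first_column, recall), 4))  # 0.9605
```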

The model reaches a state-of-the-art (SOTA) level on the PEOPLE test set.

## Usage

This model is released as part of the open-source NER project [nerpy](https://github.com/shibing624/nerpy), which supports the bertspan model. It can be called as follows:

```python
>>> from nerpy import NERModel
>>> model = NERModel("bertspan", "shibing624/bertspan4ner-base-chinese")
>>> predictions, raw_outputs, entities = model.predict(["常建良，男，1963年出生，工科学士，高级工程师"], split_on_space=False)
entities: [('常建良', 'PER'), ('1963年', 'TIME')]
```

Model files:
```
bertspan4ner-base-chinese
    ├── config.json
    ├── model_args.json
    ├── pytorch_model.bin
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    └── vocab.txt
```
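
If the model is downloaded manually, a small check like the following can confirm that a local directory contains every file listed above. This is a minimal sketch: the helper `missing_model_files` and the local path passed to it are hypothetical and are not part of nerpy or transformers.

```python
from pathlib import Path
from typing import List

# File names taken from the listing above.
EXPECTED_FILES = [
    "config.json",
    "model_args.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
]


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Hypothetical local path; an empty list means all files are present.
print(missing_model_files("bertspan4ner-base-chinese"))
```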


### Training datasets
#### Chinese NER datasets


| Dataset | Corpus | Download link | File size |
| :------- | :--------- | :---------: | :---------: |
| **`CNER Chinese NER dataset`** | CNER (120,000 characters) | [CNER github](https://github.com/shibing624/nerpy/tree/main/examples/data/cner)| 1.1MB |
| **`PEOPLE Chinese NER dataset`** | People's Daily dataset (2 million characters) | [PEOPLE github](https://github.com/shibing624/nerpy/tree/main/examples/data/people)| 12.8MB |


Data format of the CNER Chinese NER dataset (one character and its BIO tag per line, separated by a tab):

```text
美	B-LOC
国	I-LOC
的	O
华	B-PER
莱	I-PER
士	I-PER
我	O
跟	O
他	O
```
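
To illustrate how this format maps to entities, here is a minimal sketch that reads tab-separated `character<TAB>tag` lines and merges each `B-`/`I-` run into one entity. The function name `bio_to_entities` is hypothetical and this is not nerpy's own loader. Treating a blank line as a sentence boundary is also an assumption, since the sample above does not show one.

```python
from typing import List, Optional, Tuple

# The sample shown above, with a tab between character and tag.
SAMPLE = "美\tB-LOC\n国\tI-LOC\n的\tO\n华\tB-PER\n莱\tI-PER\n士\tI-PER\n我\tO\n跟\tO\n他\tO\n"


def bio_to_entities(text: str) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from BIO-tagged lines."""
    entities: List[Tuple[str, str]] = []
    chars: List[str] = []
    label: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal chars, label
        if chars and label is not None:
            entities.append(("".join(chars), label))
        chars, label = [], None

    for line in text.splitlines():
        if not line.strip():
            flush()  # assumed sentence boundary
            continue
        char, tag = line.split("\t")
        if tag.startswith("B-"):
            flush()
            chars, label = [char], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            chars.append(char)
        else:
            # "O", or an "I-" tag that does not continue the open entity.
            flush()
    flush()
    return entities


print(bio_to_entities(SAMPLE))  # [('美国', 'LOC'), ('华莱士', 'PER')]
```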


To train bertspan4ner yourself, see [https://github.com/shibing624/nerpy/tree/main/examples](https://github.com/shibing624/nerpy/tree/main/examples).


## Citation

```latex
@software{nerpy,
  author = {Xu Ming},
  title = {nerpy: Named Entity Recognition toolkit},
  year = {2022},
  url = {https://github.com/shibing624/nerpy},
}
```