BertSpan for Chinese Named Entity Recognition (bertspan4ner) Model

A Chinese named entity recognition (NER) model.

Evaluation of bertspan4ner-base-chinese on the PEOPLE (People's Daily, 人民日报) test data:

Overall performance of BertSpan on the PEOPLE test set:

            Accuracy    Recall    F1
BertSpan    0.9610      0.9600    0.9605

The model reaches state-of-the-art (SOTA) performance on the PEOPLE test set.
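As a quick sanity check on the reported scores, the sketch below recomputes F1 as the harmonic mean of precision and recall. It assumes the "Accuracy" column holds entity-level precision, which the model card does not state explicitly; the helper name `f1_score` is hypothetical and not part of nerpy.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper, not a nerpy API)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the "Accuracy" column (0.9610) is precision; recall is 0.9600.
f1 = f1_score(0.9610, 0.9600)
print(round(f1, 4))  # 0.9605
```

Under that assumption, the recomputed value agrees with the reported F1 of 0.9605.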

Usage

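This model is open-sourced as part of the named entity recognition project nerpy, which supports the bertspan model. It can be called as follows: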

>>> from nerpy import NERModel
>>> model = NERModel("bertspan", "shibing624/bertspan4ner-base-chinese")
>>> predictions, raw_outputs, entities = model.predict(["常建良,男,1963年出生,工科学士,高级工程师"], split_on_space=False)
entities: [('常建良', 'PER'), ('1963年', 'TIME')]

Model files:

bertspan4ner-base-chinese
    ├── config.json
    ├── model_args.json
    ├── pytorch_model.bin
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    └── vocab.txt

Training datasets

Chinese named entity recognition datasets

Dataset                       Corpus                                           Download link    File size
CNER Chinese NER dataset      CNER (120,000 characters)                        CNER github      1.1MB
PEOPLE Chinese NER dataset    People's Daily dataset (2 million characters)    PEOPLE github    12.8MB

Data format of the CNER Chinese NER dataset:

美	B-LOC
国	I-LOC
的	O
华	B-PER
莱	I-PER
士	I-PER
我	O
跟	O
他	O
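To make the tagging scheme concrete, here is a minimal sketch that turns character-level BIO lines like the ones above into (text, label) entities. It assumes each line holds one character and one tag separated by whitespace; `bio_to_entities` is a hypothetical helper for illustration, not a nerpy function.

```python
from typing import List, Optional, Tuple


def bio_to_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge (char, tag) pairs in BIO format into (entity_text, label) tuples."""
    entities: List[Tuple[str, str]] = []
    chars: List[str] = []
    label: Optional[str] = None
    for ch, tag in pairs:
        if tag.startswith("B-"):
            # A new entity begins; flush any entity still open.
            if chars:
                entities.append(("".join(chars), label))
            chars, label = [ch], tag[2:]
        elif tag.startswith("I-") and chars and tag[2:] == label:
            # Continuation of the currently open entity.
            chars.append(ch)
        else:
            # "O" or an inconsistent I- tag: close the open entity, if any.
            if chars:
                entities.append(("".join(chars), label))
            chars, label = [], None
    if chars:
        entities.append(("".join(chars), label))
    return entities


# The sample from the data format section, one "char<TAB>tag" per line.
sample_lines = [
    "美\tB-LOC", "国\tI-LOC", "的\tO",
    "华\tB-PER", "莱\tI-PER", "士\tI-PER",
    "我\tO", "跟\tO", "他\tO",
]
pairs = [tuple(line.split()) for line in sample_lines if line.strip()]
entities = bio_to_entities(pairs)
print(entities)  # [('美国', 'LOC'), ('华莱士', 'PER')]
```

In this sketch a stray I- tag with no matching open entity is dropped rather than treated as the start of a new entity; other conventions are possible.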

To train bertspan4ner, see https://github.com/shibing624/nerpy/tree/main/examples

Citation

@software{nerpy,
  author = {Xu Ming},
  title = {nerpy: Named Entity Recognition toolkit},
  year = {2022},
  url = {https://github.com/shibing624/nerpy},
}