	---
	language:
	- zh
	tags:
	- bert
	- pytorch
	- zh
	- ner
	license: "apache-2.0"
	---

	# BERT for Chinese Named Entity Recognition (bert4ner) Model
	A Chinese named entity recognition model.

	Evaluation of `bert4ner-base-chinese` on the CNER test set:

	- precision: 0.9395, recall: 0.9604, f1: 0.9498
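
As a quick sanity check, the reported F1 can be recomputed from the reported precision and recall, assuming the usual harmonic-mean definition of F1. The variable names below are illustrative only.

```python
# Reported precision and recall on the CNER test set
precision = 0.9395
recall = 0.9604

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"f1: {f1:.4f}")  # prints "f1: 0.9498"
```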

	Because the model was trained on the CNER training set, it reaches near-SOTA performance on the CNER test set.

	Model architecture: the standard BertSoftmax network structure:

	![arch](bert.png)
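
In a BertSoftmax-style tagger, as the name is commonly understood, the encoder output for each token is projected to one score per label, and a softmax over those scores picks a label for each token on its own. The sketch below shows only that last decoding step in plain Python. The label set and logit values are made up for illustration and are not taken from this model.

```python
import math

# Hypothetical label set and per-token logits (illustrative values only)
labels = ["O", "B-NAME", "I-NAME"]
token_logits = [
    [0.1, 3.2, 0.3],  # token 1
    [0.2, 0.4, 2.9],  # token 2
    [4.0, 0.1, 0.2],  # token 3
]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Each token is classified independently: softmax, then argmax
predicted = []
for logits in token_logits:
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    predicted.append(labels[best])

print(predicted)  # ['B-NAME', 'I-NAME', 'O']
```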

	## Usage

	This model is open-sourced as part of the NER project [nerpy](https://github.com/shibing624/nerpy), which supports the bert4ner model. It can be called as follows:

	```python
	>>> from nerpy import NERModel
	>>> model = NERModel("bert", "shibing624/bert4ner-base-chinese")
	>>> predictions, raw_outputs, entities = model.predict(["常建良，男，1963年出生，工科学士，高级工程师"], split_on_space=False)
	entities: [('常建良', 'NAME'), ('工科', 'PRO'), ('学士', 'EDU'), ('高级工程师', 'TITLE')]
	```
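
Downstream code often needs the recognized entities grouped by type. The sketch below assumes `entities` is a flat list of `(text, label)` pairs, as printed in the example output above. The actual return value of `model.predict` may be nested per input sentence, so adapt it as needed. `group_by_label` is a hypothetical helper, not part of nerpy.

```python
from collections import defaultdict

# Entities as shown in the example output above
entities = [("常建良", "NAME"), ("工科", "PRO"), ("学士", "EDU"), ("高级工程师", "TITLE")]


def group_by_label(pairs):
    """Map each entity label to the list of entity strings carrying it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(entities)
print(by_label["TITLE"])  # ['高级工程师']
```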

	Model files:
	```
	bert4ner-base-chinese
	├── config.json
	├── model_args.json
	├── eval_result.txt
	├── pytorch_model.bin
	├── special_tokens_map.json
	├── tokenizer_config.json
	└── vocab.txt
	```

	### Training datasets
	#### Chinese NER datasets


	| Dataset | Corpus | Download link | File size |
	| :------- | :--------- | :---------: | :---------: |
	| `CNER Chinese NER dataset` | CNER (120,000 characters) | [CNER github](https://github.com/shibing624/nerpy/tree/main/examples/data/cner) | 1.1MB |
	| `PEOPLE Chinese NER dataset` | People's Daily entity set (2 million characters) | [PEOPLE github](https://github.com/shibing624/nerpy/tree/main/examples/data/people) | 12.8MB |


	Data format of the CNER Chinese NER dataset:

	```text
	美 B-LOC
	国 I-LOC
	的 O
	华 B-PER
	莱 I-PER
	士 I-PER

	我 O
	跟 O
	他 O
	```
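
The format above can be read with a few lines of Python. This is a minimal sketch, assuming one whitespace-separated `character label` pair per line, blank lines between sentences, and the B-/I-/O tags shown in the sample. `read_sentences` and `bio_to_entities` are hypothetical helpers, not nerpy functions.

```python
def read_sentences(lines):
    """Parse 'char label' lines into sentences; blank lines separate sentences."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        char, label = line.split()
        current.append((char, label))
    if current:
        sentences.append(current)
    return sentences


def bio_to_entities(sentence):
    """Collect (text, type) spans from B-/I- tags in one sentence."""
    entities, chars, etype = [], [], None
    for char, label in sentence:
        if label.startswith("B-"):
            if chars:  # close the previous entity before starting a new one
                entities.append(("".join(chars), etype))
            chars, etype = [char], label[2:]
        elif label.startswith("I-") and chars and label[2:] == etype:
            chars.append(char)
        else:  # "O" or an inconsistent I- tag ends the current entity
            if chars:
                entities.append(("".join(chars), etype))
            chars, etype = [], None
    if chars:
        entities.append(("".join(chars), etype))
    return entities


sample = """美 B-LOC
国 I-LOC
的 O
华 B-PER
莱 I-PER
士 I-PER

我 O
跟 O
他 O"""

sentences = read_sentences(sample.splitlines())
print(len(sentences))  # 2
print(bio_to_entities(sentences[0]))  # [('美国', 'LOC'), ('华莱士', 'PER')]
```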


	To train bert4ner, see the examples at [https://github.com/shibing624/nerpy/tree/main/examples](https://github.com/shibing624/nerpy/tree/main/examples)


	## Citation

	```latex
	@software{nerpy,
	author = {Xu Ming},
	title = {nerpy: Named Entity Recognition toolkit},
	year = {2022},
	url = {https://github.com/shibing624/nerpy},
	}
	```