---
license: afl-3.0
tags:
- chinese
- ner
- medical
language:
- zh
---

# Chinese Named Entity Recognition for the Medical Domain

Project repository: https://github.com/iioSnail/chinese_medical_ner

Usage:

```
from transformers import AutoModelForTokenClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('iioSnail/bert-base-chinese-medical-ner')
model = AutoModelForTokenClassification.from_pretrained("iioSnail/bert-base-chinese-medical-ner")

sentences = ["瘦脸针、水光针和玻尿酸详解!", "半月板钙化的病因有哪些?"]

# add_special_tokens=False: no [CLS]/[SEP], so positions line up with the characters here
inputs = tokenizer(sentences, return_tensors="pt", padding=True, add_special_tokens=False)
outputs = model(**inputs)
# Pick the highest-scoring label per token and zero out padding positions
outputs = outputs.logits.argmax(-1) * inputs['attention_mask']
print(outputs)
```

Output:

```
tensor([[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
        [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0]])
```

Here `1=B, 2=I, 3=E, 4=O`, and `0` marks padding positions that were zeroed out by the attention mask. `1, 3` denotes a two-character medical entity, `1, 2, 3` a three-character entity, `1, 2, 2, 3` a four-character entity, and so on.

You can use `MedicalNerModel.format_outputs(sentences, outputs)` from the project to convert these label ids into entity spans. The result looks like this:

```
[
  [
    {'start': 0, 'end': 3, 'word': '瘦脸针'},
    {'start': 4, 'end': 7, 'word': '水光针'},
    {'start': 8, 'end': 11, 'word': '玻尿酸'},
  ],
  [
    {'start': 0, 'end': 5, 'word': '半月板钙化'}
  ]
]
```

For more information, see the project: https://github.com/iioSnail/chinese_medical_ner
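If you only need the span conversion and not the project code, the BIEO decoding described above can be approximated with a small pure-Python sketch. The function `decode_bieo` is hypothetical and is not the project's `MedicalNerModel.format_outputs`. It assumes that label position *i* corresponds to character *i* of the sentence, as in the example output above where each sentence yields one label per character; text containing Latin letters or digits may be split into word pieces, which would break that alignment. It also assumes that a `B` that is never closed by an `E` should be discarded.

```python
from typing import Dict, List, Optional, Union

# Label ids as described above: 0 = padding, 1 = B, 2 = I, 3 = E, 4 = O
PAD, B, I, E, O = 0, 1, 2, 3, 4

Entity = Dict[str, Union[int, str]]


def decode_bieo(sentences: List[str], label_ids: List[List[int]]) -> List[List[Entity]]:
    """Hypothetical helper: convert per-character BIEO label ids into entity spans."""
    results: List[List[Entity]] = []
    for sentence, labels in zip(sentences, label_ids):
        entities: List[Entity] = []
        start: Optional[int] = None  # index of the open entity's B tag, if any
        # Only visit positions that correspond to real characters (skips trailing padding)
        for i, label in enumerate(labels[:len(sentence)]):
            if label == B:
                start = i  # a new entity begins; any unfinished one is dropped
            elif label == E and start is not None:
                # E closes the open entity; end is exclusive, like a Python slice
                entities.append({"start": start, "end": i + 1, "word": sentence[start:i + 1]})
                start = None
            elif label != I:
                # O, padding, or an E with no open entity: close without emitting
                start = None
            # I simply continues the open entity (no-op if none is open)
        results.append(entities)
    return results


if __name__ == "__main__":
    sentences = ["瘦脸针、水光针和玻尿酸详解!", "半月板钙化的病因有哪些?"]
    # Label ids copied from the example output; with a model, use outputs.tolist()
    label_ids = [[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
                 [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0]]
    print(decode_bieo(sentences, label_ids))
```

With the label ids from the example, this yields the same spans as the `format_outputs` result shown above. Using an exclusive `end` keeps `sentence[start:end]` equal to `word`.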