shibing624/bert4ner-base-chinese

Token Classification

shibing624 committed on May 7, 2022

Commit e3792e1 · 1 Parent(s): 1dd5da2

Update README.md

Files changed (1)

README.md +60 -0

@@ -48,6 +48,66 @@ bert4ner-base-chinese
     └── vocab.txt
 ```
+## Usage (HuggingFace Transformers)
+Without [nerpy](https://github.com/shibing624/nerpy), you can use the model directly with Transformers.
+First, pass the input through the transformer model to get a predicted label for each token, then decode the BIO tags to extract the entity words.
+Install the required packages:
+```shell
+pip install transformers seqeval
+```
+```python
+import os
+import torch
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+from seqeval.metrics.sequence_labeling import get_entities
+# Allow duplicate OpenMP runtimes (works around a crash seen on some setups)
+os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained("shibing624/bert4ner-base-chinese")
+model = AutoModelForTokenClassification.from_pretrained("shibing624/bert4ner-base-chinese")
+# The position of each label in this list is the class index predicted by the model
+label_list = ['I-ORG', 'B-LOC', 'O', 'B-ORG', 'I-LOC', 'I-PER', 'B-TIME', 'I-TIME', 'B-PER']
+sentence = "王宏伟来自北京，是个警察，喜欢去王府井游玩儿。"
+def get_entity(sentence):
+    # Encode once; the tokenizer adds [CLS] and [SEP] around the sentence
+    inputs = tokenizer.encode(sentence, return_tensors="pt")
+    tokens = tokenizer.convert_ids_to_tokens(inputs[0].tolist())
+    with torch.no_grad():
+        logits = model(inputs).logits
+    # Pick the highest-scoring label index for every token
+    predictions = torch.argmax(logits, dim=2)
+    # Pair each token with its label, then drop [CLS] and [SEP]
+    char_tags = [(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].tolist())][1:-1]
+    print(sentence)
+    print(char_tags)
+    pred_labels = [tag for _, tag in char_tags]
+    entities = []
+    # get_entities returns (type, start, end) tuples with an inclusive end index.
+    # Slicing the sentence by token index assumes one token per character, which
+    # holds for Chinese characters but not for multi-character tokens.
+    for entity_type, start, end in get_entities(pred_labels):
+        entities.append((sentence[start: end + 1], entity_type))
+    print("Sentence entity:")
+    print(entities)
+get_entity(sentence)
+```
+Output:
+```shell
+王宏伟来自北京，是个警察，喜欢去王府井游玩儿。
+[('王', 'B-PER'), ('宏', 'I-PER'), ('伟', 'I-PER'), ('来', 'O'), ('自', 'O'), ('北', 'B-LOC'), ('京', 'I-LOC'), ('，', 'O'), ('是', 'O'), ('个', 'O'), ('警', 'O'), ('察', 'O'), ('，', 'O'), ('喜', 'O'), ('欢', 'O'), ('去', 'O'), ('王', 'B-LOC'), ('府', 'I-LOC'), ('井', 'I-LOC'), ('游', 'O'), ('玩', 'O'), ('儿', 'O'), ('。', 'O')]
+Sentence entity:
+[('王宏伟', 'PER'), ('北京', 'LOC'), ('王府井', 'LOC')]
+```
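The BIO decoding step in the usage example can also be illustrated without any dependencies. The following is a minimal sketch, not seqeval's actual implementation: `decode_bio` is a hypothetical helper introduced here for illustration only. It assumes one tag per character and treats an `I-` tag that does not continue the current entity as the start of a new one.

```python
def decode_bio(chars, tags):
    """Group character-level BIO tags into (text, type) entity spans."""
    entities = []
    start, current_type = None, None
    # A trailing "O" sentinel flushes an entity that runs to the end of the input
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        # Close the open entity when the current tag does not continue it
        if current_type is not None and not continues:
            entities.append(("".join(chars[start:i]), current_type))
            start, current_type = None, None
        # Open a new entity on "B-", or on an "I-" with no entity open
        if prefix == "B" or (prefix == "I" and current_type is None):
            start, current_type = i, entity_type
    return entities


chars = list("王宏伟来自北京")
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(decode_bio(chars, tags))
# [('王宏伟', 'PER'), ('北京', 'LOC')]
```

Joining the characters inside each span, rather than slicing the raw sentence, keeps the sketch independent of how the text was tokenized.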
 ### Training datasets
 #### Chinese named entity recognition datasets