---
language: "en"
tags:
- dstc9
widget:
- text: "i want to book the hilton hotel near china town."
- text: "can you reserve A & B restaurant for me?"
---

This model tags only restaurant, hotel, and attraction names, based on the data and knowledge base linked below.

Data link: https://github.com/alexa/alexa-with-dstc9-track1-dataset

Label map:

"O": 0

"B-hotel": 1

"I-hotel": 2

"B-restaurant": 3

"I-restaurant": 4

"B-attraction": 5

"I-attraction": 6

```python
from transformers import AutoConfig, AutoModelForTokenClassification, BertTokenizer
from transformers import TokenClassificationPipeline

model_path = "wilsontam/dstc9_ner"

# Maps the generic pipeline labels to the BIO tags listed in the label map.
label_map = {
    "LABEL_0": "O",
    "LABEL_1": "B-hotel",
    "LABEL_2": "I-hotel",
    "LABEL_3": "B-restaurant",
    "LABEL_4": "I-restaurant",
    "LABEL_5": "B-attraction",
    "LABEL_6": "I-attraction",
}

config = AutoConfig.from_pretrained(
    model_path,
    num_labels=len(label_map),
)
model = AutoModelForTokenClassification.from_pretrained(
    model_path,
    from_tf=False,
    config=config,
)
tokenizer = BertTokenizer.from_pretrained(
    model_path,
)

# device=-1: cpu, device=0: gpu
pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, device=-1)

# One list of per-token predictions is returned for each input sentence.
tokens = pipeline(["i want to book the hilton hotel near china town.", "can you reserve A & B restaurant for me?"])

# Translate each predicted label into its BIO tag via label_map.
for sentence in tokens:
    for token in sentence:
        print(token["word"], label_map.get(token["entity"], token["entity"]))
```
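
The pipeline returns one prediction per token, so downstream code usually needs to merge the B-/I- tags into whole entity names. The following is a minimal sketch of that step, not part of the original model code: `group_entities` is a hypothetical helper, and `sample` is hand-written data that only mimics the `entity` and `word` fields of the pipeline output (it is not actual model output).

```python
from typing import Dict, List, Optional, Tuple

# Same mapping as in the label map above.
LABEL_MAP: Dict[str, str] = {
    "LABEL_0": "O",
    "LABEL_1": "B-hotel",
    "LABEL_2": "I-hotel",
    "LABEL_3": "B-restaurant",
    "LABEL_4": "I-restaurant",
    "LABEL_5": "B-attraction",
    "LABEL_6": "I-attraction",
}


def group_entities(
    predictions: List[dict], label_map: Dict[str, str] = LABEL_MAP
) -> List[Tuple[str, str]]:
    """Merge per-token BIO predictions into (entity_type, text) spans.

    Hypothetical helper: expects dicts with "entity" and "word" keys.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Close the span that is currently open, if any.
        nonlocal current_type, current_words
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for pred in predictions:
        tag = label_map.get(pred["entity"], pred["entity"])
        word = pred["word"]
        if word.startswith("##") and current_words:
            # WordPiece continuation: glue it onto the previous word.
            current_words[-1] += word[2:]
            continue
        if tag == "O":
            flush()
            continue
        prefix, entity_type = tag.split("-", 1)
        if prefix == "B" or entity_type != current_type:
            # A B- tag, or a change of type, starts a new span.
            flush()
            current_type = entity_type
        current_words.append(word)
    flush()
    return spans


# Hand-written sample in the pipeline's output shape (assumed labels).
sample = [
    {"entity": "LABEL_0", "word": "book"},
    {"entity": "LABEL_0", "word": "the"},
    {"entity": "LABEL_1", "word": "hilton"},
    {"entity": "LABEL_2", "word": "hotel"},
    {"entity": "LABEL_0", "word": "near"},
    {"entity": "LABEL_5", "word": "china"},
    {"entity": "LABEL_6", "word": "town"},
]
print(group_entities(sample))
```

With real output, the helper would be called once per sentence, for example `[group_entities(sentence) for sentence in tokens]`.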

Credit: Jia-Chen Jason Gu, Wilson Tam