---
language: "en"
tags:
- dstc9
widget:
- text: "i want to book the hilton hotel near china town."
- text: "can you reserve A & B restaurant for me?"
---

This NER model tags only restaurant, hotel, and attraction names, based on the following data and knowledge base.

Data link: https://github.com/alexa/alexa-with-dstc9-track1-dataset

Label map:

"O": 0
"B-hotel": 1
"I-hotel": 2
"B-restaurant": 3 
"I-restaurant": 4
"B-attraction": 5
"I-attraction": 6

```python
from transformers import AutoConfig, AutoModelForTokenClassification, BertTokenizer
from transformers import TokenClassificationPipeline

model_path = "wilsontam/dstc9_ner"

# Maps the pipeline's generic LABEL_n output names back to the BIO tags above
label_map = {
    "LABEL_0": "O",
    "LABEL_1": "B-hotel",
    "LABEL_2": "I-hotel",
    "LABEL_3": "B-restaurant",
    "LABEL_4": "I-restaurant",
    "LABEL_5": "B-attraction",
    "LABEL_6": "I-attraction",
}

# Load the fine-tuned token classification model and its tokenizer
config = AutoConfig.from_pretrained(
    model_path,
    num_labels=len(label_map),
)
model = AutoModelForTokenClassification.from_pretrained(
    model_path,
    from_tf=False,
    config=config,
)
tokenizer = BertTokenizer.from_pretrained(model_path)

# device=-1 runs on CPU; device=0 runs on the first GPU
pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, device=-1)

# One list of per-token predictions (word, entity, score, ...) per input sentence
tokens = pipeline(["i want to book the hilton hotel near china town.", "can you reserve A & B restaurant for me?"])
```
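
With the default setup above, the pipeline returns per-token predictions labeled with the generic `LABEL_n` names; `label_map` then recovers the BIO tags. A minimal post-processing sketch, continuing from the snippet above (the filtering logic here is an illustration, not part of the released code):

```python
# Continues from the snippet above: `tokens` holds one list of per-token
# predictions per input sentence, and `label_map` translates LABEL_n to BIO tags.
sentences = [
    "i want to book the hilton hotel near china town.",
    "can you reserve A & B restaurant for me?",
]

for sentence, predictions in zip(sentences, tokens):
    # Keep only word pieces tagged as part of a hotel/restaurant/attraction name.
    tagged = [
        (pred["word"], label_map.get(pred["entity"], pred["entity"]))
        for pred in predictions
        if label_map.get(pred["entity"], pred["entity"]) != "O"
    ]
    print(sentence)
    print(tagged)
```

Word pieces starting with `##` are WordPiece continuations from the BERT tokenizer; merge them back into the preceding piece if full entity strings are needed.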

Credit: Jia-Chen Jason Gu, Wilson Tam