import streamlit as st
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

st.title('Question-Answering NLU')

st.sidebar.title('Navigation')
menu = st.sidebar.radio("", options=["Demo", "Parsing NLU data into SQuAD 2.0", "Training",
                                     "Evaluation"], index=0)


if menu == "Demo":

    st.markdown('''

        Question Answering NLU (QANLU) is an approach that casts NLU tasks as question answering,
        leveraging pre-trained question-answering models to perform well in few-shot settings. Instead of
        training an intent classifier or a slot tagger, for example, we can ask the model intent- and
        slot-related questions in natural language:
        
        ```
        Context : I'm looking for a cheap flight to Boston.
        
        Question: Is the user looking to book a flight?
        Answer  : Yes
        
        Question: Is the user asking about departure time?
        Answer  : No
        
        Question: What price is the user looking for?
        Answer  : cheap
        
        Question: Where is the user flying from?
        Answer  : (empty)
        ```
        
        Thus, by asking questions for each intent and slot in natural language, we can effectively construct an NLU hypothesis. For more details,
        please read the paper: 
        [Language model is all you need: Natural language understanding as question answering](https://assets.amazon.science/33/ea/800419b24a09876601d8ab99bfb9/language-model-is-all-you-need-natural-language-understanding-as-question-answering.pdf).
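        
        As a rough illustration of how such answers could be combined, the sketch below assembles a hypothesis from the question-answer pairs shown above. The helper `build_hypothesis` and the intent and slot labels are hypothetical and not part of QANLU; the sketch assumes that a "Yes" answer marks an intent as present and that an empty answer means the slot was not mentioned.
        
        ```python
        def build_hypothesis(intent_answers, slot_answers):
            """Combine per-question answers into one NLU hypothesis."""
            # Assumption: an intent is present when its answer starts with "yes".
            intents = [
                name for name, answer in intent_answers.items()
                if answer.strip().lower().startswith("yes")
            ]
            # Assumption: an empty answer means the slot was not mentioned.
            slots = {
                name: answer.strip()
                for name, answer in slot_answers.items()
                if answer.strip()
            }
            return {"intents": intents, "slots": slots}


        # Answers taken from the example above; the keys are made-up labels.
        hypothesis = build_hypothesis(
            intent_answers={"book_flight": "Yes", "departure_time": "No"},
            slot_answers={"price": "cheap", "from_city": ""},
        )
        print(hypothesis)
        ```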
        
        In this Space, we will see how to transform an example
        NLU dataset (e.g. utterances and intent / slot annotations) into [SQuAD 2.0 format](https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/)
        question-answering data that can be used by QANLU.
        
        ### Demo

        Feel free to query the pre-trained QA-NLU model using the text fields and button below.
        
        *Please note that this model has been trained on ATIS and may need to be further fine-tuned to support intents and slots that are not covered in ATIS.*
    ''')
    
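    # Load the pre-trained QANLU tokenizer and model from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained("AmazonScience/qanlu")
    model = AutoModelForQuestionAnswering.from_pretrained("AmazonScience/qanlu")

    # Wrap both in a question-answering pipeline for inference.
    qa_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)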
    
    context = st.text_input(
        'Please enter the context (remember to include "Yes. No. " at the beginning):',
        value="Yes. No. I want a cheap flight to Boston."
    )
    question = st.text_input(
        'Please enter the intent question:',
        value="Are they looking for a flight?"
    )


    # Inputs passed to the question-answering pipeline.
    qa_input = {
        'context': context,
        'question': question
    }

    if st.button('Ask QANLU'):
        answer = qa_pipeline(qa_input)
        st.write(answer)

elif menu == "Parsing NLU data into SQuAD 2.0":
    st.header('QA-NLU Data Parsing')
    
    st.markdown('''
        Here, we show a small example of how NLU data can be transformed into QANLU data.
        The same method can be used to transform [MATIS++](https://github.com/amazon-research/multiatis) 
        NLU data (e.g. utterances and intent / slot annotations) into [SQuAD 2.0 format](https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/)
        question-answering data that can be used by QANLU.
        
        Here is an example dataset with three intents and two examples per intent: 
        
        ````
        restaurant, I am looking for some Vietnamese food
        restaurant, What is there to eat around here?
        music, Play my workout playlist
        music, Can you find Bob Dylan songs?
        flight, Show me flights from Oakland to Dallas
        flight, I want two economy tickets from Miami to Chicago
        ````
        
        Next, we define a few questions for each intent. These can be free-form questions or questions generated from templates.
        
        ````
        {
            'restaurant': [
                'Did they ask for a restaurant?',
                'Did they mention a restaurant?'
            ],
            'music': [
                'Did they ask for music?',
                'Do they want to play music?'
            ],
            'flight': [
                'Did they ask for a flight?',
                'Do they want to book a flight?'
            ]
        }
        ````
        
        The next step is to run the `atis.py` script from the [QA-NLU Amazon Research repository](https://github.com/amazon-research/question-answering-nlu).
        That script produces a JSON file that looks like this:
        
        ````
        {
        "version": 1.0,
        "data": [
            {
                "title": "MultiATIS++",
                "paragraphs": [
                    {
                        "context": "yes. no. i am looking for some vietnamese food",
                        "qas": [
                            {
                                "question": "did they ask for a restaurant?",
                                "id": "49f1180cb9ce4178a8a90f76c21f69b4",
                                "is_impossible": false,
                                "answers": [
                                    {
                                        "text": "yes",
                                        "answer_start": 0
                                    }
                                ],
                                "slot": "",
                                "intent": "restaurant"
                            },
                            {
                                "question": "did they ask for music?",
                                "id": "a7ffe039fb3e4843ae16d5a68194f45e",
                                "is_impossible": false,
                                "answers": [
                                    {
                                        "text": "no",
                                        "answer_start": 5
                                    }
                                ],
                                "slot": "",
                                "intent": "restaurant"
                            },
                            ... <More questions>
                            
                ... <More paragraphs>
        ````
        
        There are many tunable parameters when generating the above file, such as how many negative examples to include per question. Follow the same process for training a slot-tagging model.
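        
        To make the structure of that file more concrete, here is a minimal sketch of how one paragraph could be built for an intent example. It is not the implementation used by `atis.py`: the helper `build_intent_paragraph` and the constant `ANSWER_PREFIX` are hypothetical, and the sketch assumes that every question about a non-matching intent is kept as a negative example.
        
        ```python
        import json
        import uuid

        # Hypothetical constant: the answer candidates prepended to every utterance.
        ANSWER_PREFIX = "yes. no. "


        def build_intent_paragraph(utterance, true_intent, questions):
            """Build one SQuAD 2.0-style paragraph for a single utterance."""
            context = ANSWER_PREFIX + utterance.lower()
            qas = []
            for intent, intent_questions in questions.items():
                # Questions about the true intent are answered "yes", all others "no".
                answer = "yes" if intent == true_intent else "no"
                for question in intent_questions:
                    qas.append({
                        "question": question.lower(),
                        "id": uuid.uuid4().hex,
                        "is_impossible": False,
                        "answers": [{
                            "text": answer,
                            # The answer span always lies inside the fixed prefix.
                            "answer_start": ANSWER_PREFIX.index(answer),
                        }],
                        "slot": "",
                        "intent": true_intent,
                    })
            return {"context": context, "qas": qas}


        questions = {
            "restaurant": ["Did they ask for a restaurant?"],
            "music": ["Did they ask for music?"],
        }
        paragraph = build_intent_paragraph(
            "I am looking for some Vietnamese food", "restaurant", questions
        )
        print(json.dumps(paragraph, indent=4))
        ```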
        
    ''')
    
elif menu == "Training":
    st.header('QA-NLU Training')
    
    st.markdown('''
        To train a QA-NLU model on the data we created, we use the `run_squad.py` script from [Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/legacy/question-answering/run_squad.py) with a SQuAD-trained QA model as our base. As an example, we can use the `deepset/roberta-base-squad2` model from [here](https://huggingface.co/deepset/roberta-base-squad2). The command below assumes that 8 GPUs are available:
    ''')
    
    st.code('''
        mkdir models
        
        python -m torch.distributed.launch --nproc_per_node=8 run_squad.py \\
            --model_type roberta \\
            --model_name_or_path deepset/roberta-base-squad2 \\
            --do_train \\
            --do_eval \\
            --do_lower_case \\
            --train_file data/matis_en_train_squad.json \\
            --predict_file data/matis_en_test_squad.json \\
            --learning_rate 3e-5 \\
            --num_train_epochs 2 \\
            --max_seq_length 384 \\
            --doc_stride 64 \\
            --output_dir models/qanlu/ \\
            --per_gpu_train_batch_size 8 \\
            --overwrite_output_dir \\
            --version_2_with_negative \\
            --save_steps 100000 \\
            --gradient_accumulation_steps 8 \\
            --seed $RANDOM
    ''', language='bash')

elif menu == "Evaluation":
    st.header('QA-NLU Evaluation')
    
    st.markdown('''
        To assess the performance of the trained model, we can use the `calculate_pr.py` script from the [QA-NLU Amazon Research repository](https://github.com/amazon-research/question-answering-nlu).
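        
        For orientation, precision and recall over intent predictions can be sketched as follows. This is an assumption about the kind of metric involved, not the logic of `calculate_pr.py`; the function `precision_recall` and the sample data are hypothetical.
        
        ```python
        def precision_recall(gold, predicted):
            """Return (precision, recall) for two sets of (utterance_id, intent) pairs."""
            true_positives = len(gold & predicted)
            precision = true_positives / len(predicted) if predicted else 0.0
            recall = true_positives / len(gold) if gold else 0.0
            return precision, recall


        # Made-up data: utterance 1 is predicted wrongly, utterance 3 gets no prediction.
        gold = {(0, "restaurant"), (1, "music"), (2, "flight"), (3, "flight")}
        predicted = {(0, "restaurant"), (1, "flight"), (2, "flight")}
        precision, recall = precision_recall(gold, predicted)
        print(round(precision, 2), round(recall, 2))
        ```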
        
        Feel free to query the pre-trained QA-NLU model in the Demo section.
    ''')