---
tags:
- Transformers
- text-classification
- intent-classification
- multi-class-classification
- natural-language-understanding
languages:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
multilinguality:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
datasets:
- qanastek/MASSIVE
widget:
- text: "wake me up at five am this week"
- text: "je veux écouter la chanson de jacques brel encore une fois"
- text: "quiero escuchar la canción de arijit singh una vez más"
- text: "olly onde é que á um parque por perto onde eu possa correr"
- text: "פרק הבא בפודקאסט בבקשה"
- text: "亚马逊股价"
- text: "найди билет на поезд в санкт-петербург"
license: cc-by-4.0
---

**People Involved**

* [LABRAK Yanis](https://www.linkedin.com/in/yanis-labrak-8a7412145/) (1)

**Affiliations**

1. [LIA, NLP team](https://lia.univ-avignon.fr/), Avignon University, Avignon, France.

## Demo: How to use in HuggingFace Transformers Pipeline

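Requires [transformers](https://pypi.org/project/transformers/) (`pip install transformers`) and a PyTorch installation, since the example loads the model through the PyTorch `AutoModelForSequenceClassification` class.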

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# French utterance: "wake me up at nine in the morning on Friday"
res = classifier("réveille-moi à neuf heures du matin le vendredi")
print(res)
```

Outputs:

```python
[{'label': 'alarm_set', 'score': 0.9998375177383423}]
```
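
In an application, the pipeline output is usually post-processed before acting on it. The sketch below is one possible approach, not part of the model's API: it takes a result in the format shown above and returns the label only when its score reaches a threshold. The helper name `pick_intent`, the `fallback` label, and the `0.5` threshold are illustrative assumptions.

```python
from typing import Dict, List, Union


def pick_intent(results: List[Dict[str, Union[str, float]]],
                threshold: float = 0.5,
                fallback: str = "unknown") -> str:
    """Return the top label if its score reaches `threshold`, else `fallback`.

    `results` is expected to look like the pipeline output shown above:
    a list of {'label': ..., 'score': ...} dictionaries.
    """
    if not results:
        return fallback
    # Keep the highest-scoring entry in case several labels are returned
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else fallback


# Hard-coded copy of the output printed above, so no model download is needed
res = [{'label': 'alarm_set', 'score': 0.9998375177383423}]
print(pick_intent(res))                 # alarm_set
print(pick_intent(res, threshold=1.0))  # unknown
```

A suitable threshold depends on the application and would need to be tuned on held-out data; the value here is only a placeholder.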

## Training data

[MASSIVE](https://huggingface.co/datasets/qanastek/MASSIVE) is a parallel dataset of more than 1M utterances across 51 languages, annotated for two Natural Language Understanding tasks: intent prediction and slot annotation. The utterances span 60 intents and 55 slot types. MASSIVE was created by localizing the SLURP dataset, which consists of single-shot interactions with a general-purpose intelligent voice assistant.

## Intents

* audio_volume_other
* play_music
* iot_hue_lighton
* general_greet
* calendar_set
* audio_volume_down
* social_query
* audio_volume_mute
* iot_wemo_on
* iot_hue_lightup
* audio_volume_up
* iot_coffee
* takeaway_query
* qa_maths
* play_game
* cooking_query
* iot_hue_lightdim
* iot_wemo_off
* music_settings
* weather_query
* news_query
* alarm_remove
* social_post
* recommendation_events
* transport_taxi
* takeaway_order
* music_query
* calendar_query
* lists_query
* qa_currency
* recommendation_movies
* general_joke
* recommendation_locations
* email_querycontact
* lists_remove
* play_audiobook
* email_addcontact
* lists_createoradd
* play_radio
* qa_stock
* alarm_query
* email_sendemail
* general_quirky
* music_likeness
* cooking_recipe
* email_query
* datetime_query
* transport_traffic
* play_podcasts
* iot_hue_lightchange
* calendar_remove
* transport_query
* transport_ticket
* qa_factoid
* iot_cleaning
* alarm_set
* datetime_convert
* iot_hue_lightoff
* qa_definition
* music_dislikeness
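
The intent names above appear to follow a `scenario_action` pattern, where the part before the first underscore is the scenario (for example `alarm`, `iot`, `qa`). Assuming that convention holds for every label, a minimal sketch for grouping predicted labels by scenario could look like this. `split_intent` and `group_by_scenario` are hypothetical helpers, not something shipped with the model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_intent(label: str) -> Tuple[str, str]:
    """Split 'scenario_action' at the first underscore (assumed convention)."""
    scenario, _, action = label.partition("_")
    return scenario, action


def group_by_scenario(labels: List[str]) -> Dict[str, List[str]]:
    """Map each scenario to the list of actions seen under it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        scenario, action = split_intent(label)
        groups[scenario].append(action)
    return dict(groups)


labels = ["alarm_set", "alarm_query", "iot_hue_lighton", "qa_maths"]
print(split_intent("iot_hue_lighton"))  # ('iot', 'hue_lighton')
print(group_by_scenario(labels))
# {'alarm': ['set', 'query'], 'iot': ['hue_lighton'], 'qa': ['maths']}
```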

## Evaluation results

```plain
                          precision    recall  f1-score   support

             alarm_query     0.9661    0.9037    0.9338      1734
            alarm_remove     0.9484    0.9608    0.9545      1071
               alarm_set     0.8611    0.9254    0.8921      2091
       audio_volume_down     0.8657    0.9537    0.9075       561
       audio_volume_mute     0.8608    0.9130    0.8861      1632
      audio_volume_other     0.8684    0.5392    0.6653       306
         audio_volume_up     0.7198    0.8446    0.7772       663
          calendar_query     0.7555    0.8229    0.7878      6426
         calendar_remove     0.8688    0.9441    0.9049      3417
            calendar_set     0.9092    0.9014    0.9053     10659
           cooking_query     0.0000    0.0000    0.0000         0
          cooking_recipe     0.9282    0.8592    0.8924      3672
        datetime_convert     0.8144    0.7686    0.7909       765
          datetime_query     0.9152    0.9305    0.9228      4488
        email_addcontact     0.6482    0.8431    0.7330       612
             email_query     0.9629    0.9319    0.9472      6069
      email_querycontact     0.6853    0.8032    0.7396      1326
         email_sendemail     0.9530    0.9381    0.9455      5814
           general_greet     0.1026    0.3922    0.1626        51
            general_joke     0.9305    0.9123    0.9213       969
          general_quirky     0.6984    0.5417    0.6102      8619
            iot_cleaning     0.9590    0.9359    0.9473      1326
              iot_coffee     0.9304    0.9749    0.9521      1836
     iot_hue_lightchange     0.8794    0.9374    0.9075      1836
        iot_hue_lightdim     0.8695    0.8711    0.8703      1071
        iot_hue_lightoff     0.9440    0.9229    0.9334      2193
         iot_hue_lighton     0.4545    0.5882    0.5128       153
         iot_hue_lightup     0.9271    0.8315    0.8767      1377
            iot_wemo_off     0.9615    0.8715    0.9143       918
             iot_wemo_on     0.8455    0.7941    0.8190       510
       lists_createoradd     0.8437    0.8356    0.8396      1989
             lists_query     0.8918    0.8335    0.8617      2601
            lists_remove     0.9536    0.8601    0.9044      2652
       music_dislikeness     0.7725    0.7157    0.7430       204
          music_likeness     0.8570    0.8159    0.8359      1836
             music_query     0.8667    0.8050    0.8347      1785
          music_settings     0.4024    0.3301    0.3627       306
              news_query     0.8343    0.8657    0.8498      6324
          play_audiobook     0.8172    0.8125    0.8149      2091
               play_game     0.8666    0.8403    0.8532      1785
              play_music     0.8683    0.8845    0.8763      8976
           play_podcasts     0.8925    0.9125    0.9024      3213
              play_radio     0.8260    0.8935    0.8585      3672
             qa_currency     0.9459    0.9578    0.9518      1989
           qa_definition     0.8638    0.8552    0.8595      2907
              qa_factoid     0.7959    0.8178    0.8067      7191
                qa_maths     0.8937    0.9302    0.9116      1275
                qa_stock     0.7995    0.9412    0.8646      1326
   recommendation_events     0.7646    0.7702    0.7674      2193
recommendation_locations     0.7489    0.8830    0.8104      1581
   recommendation_movies     0.6907    0.7706    0.7285      1020
             social_post     0.9623    0.9080    0.9344      4131
            social_query     0.8104    0.7914    0.8008      1275
          takeaway_order     0.7697    0.8458    0.8059      1122
          takeaway_query     0.9059    0.8571    0.8808      1785
         transport_query     0.8141    0.7559    0.7839      2601
          transport_taxi     0.9222    0.9403    0.9312      1173
        transport_ticket     0.9259    0.9384    0.9321      1785
       transport_traffic     0.6919    0.9660    0.8063       765
           weather_query     0.9387    0.9492    0.9439      7956

                accuracy                         0.8617    151674
               macro avg     0.8162    0.8273    0.8178    151674
            weighted avg     0.8639    0.8617    0.8613    151674
```
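
As a reading aid for the table, the sketch below recomputes F1 from precision and recall and contrasts a plain (macro) average with a support-weighted average, using three rows copied from the report. It is an illustration of the standard formulas, not the script that produced the results, and the function names are assumptions. Because the table values are rounded to four decimals, recomputed numbers can differ in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from three rows of the report above
rows = {
    "alarm_query": (0.9661, 0.9037, 1734),
    "general_greet": (0.1026, 0.3922, 51),
    "weather_query": (0.9387, 0.9492, 7956),
}

f1 = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}
total_support = sum(s for _, _, s in rows.values())

# Macro average: every class counts equally, regardless of its support
macro_f1 = sum(f1.values()) / len(f1)
# Weighted average: each class is weighted by its number of test examples
weighted_f1 = sum(f1[name] * s for name, (_, _, s) in rows.items()) / total_support

for name, value in f1.items():
    print(f"{name:>15}  f1={value:.4f}")
print(f"macro f1 (3 rows)    = {macro_f1:.4f}")
print(f"weighted f1 (3 rows) = {weighted_f1:.4f}")
```

On these three rows the weighted average is higher than the macro average, because the weakest class (`general_greet`) has very few test examples. The same effect explains why the report's weighted F1 (0.8613) exceeds its macro F1 (0.8178).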