metadata
tags:
  - Transformers
  - Token Classification
  - Slot Annotation
languages:
  - af-ZA
  - am-ET
  - ar-SA
  - az-AZ
  - bn-BD
  - cy-GB
  - da-DK
  - de-DE
  - el-GR
  - en-US
  - es-ES
  - fa-IR
  - fi-FI
  - fr-FR
  - he-IL
  - hi-IN
  - hu-HU
  - hy-AM
  - id-ID
  - is-IS
  - it-IT
  - ja-JP
  - jv-ID
  - ka-GE
  - km-KH
  - kn-IN
  - ko-KR
  - lv-LV
  - ml-IN
  - mn-MN
  - ms-MY
  - my-MM
  - nb-NO
  - nl-NL
  - pl-PL
  - pt-PT
  - ro-RO
  - ru-RU
  - sl-SL
  - sq-AL
  - sv-SE
  - sw-KE
  - ta-IN
  - te-IN
  - th-TH
  - tl-PH
  - tr-TR
  - ur-PK
  - vi-VN
  - zh-CN
  - zh-TW
multilinguality:
  - af-ZA
  - am-ET
  - ar-SA
  - az-AZ
  - bn-BD
  - cy-GB
  - da-DK
  - de-DE
  - el-GR
  - en-US
  - es-ES
  - fa-IR
  - fi-FI
  - fr-FR
  - he-IL
  - hi-IN
  - hu-HU
  - hy-AM
  - id-ID
  - is-IS
  - it-IT
  - ja-JP
  - jv-ID
  - ka-GE
  - km-KH
  - kn-IN
  - ko-KR
  - lv-LV
  - ml-IN
  - mn-MN
  - ms-MY
  - my-MM
  - nb-NO
  - nl-NL
  - pl-PL
  - pt-PT
  - ro-RO
  - ru-RU
  - sl-SL
  - sq-AL
  - sv-SE
  - sw-KE
  - ta-IN
  - te-IN
  - th-TH
  - tl-PH
  - tr-TR
  - ur-PK
  - vi-VN
  - zh-CN
  - zh-TW
datasets:
  - qanastek/MASSIVE
widget:
  - text: wake me up at five am this week
  - text: je veux écouter la chanson de jacques brel encore une fois
  - text: quiero escuchar la canción de arijit singh una vez más
  - text: olly onde é que á um parque por perto onde eu possa correr
  - text: פרק הבא בפודקאסט בבקשה
  - text: 亚马逊股价
  - text: найди билет на поезд в санкт-петербург
license: cc-by-4.0

People Involved

Affiliations

  1. LIA, NLP team, Avignon University, Avignon, France.

Demo: How to use in HuggingFace Transformers Pipeline

Requires transformers: pip install transformers

from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

# Load the fine-tuned tokenizer and token classification model from the Hub
model_name = 'qanastek/XLMRoberta-Alexa-Intents-NER-NLU'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build the pipeline and tag a French utterance
# ("wake me up at nine in the morning on Friday")
predict = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
res = predict("réveille-moi à neuf heures du matin le vendredi")
print(res)

Outputs:

[{'word': '▁neuf', 'score': 0.9911066293716431, 'entity': 'B-time', 'index': 6, 'start': 15, 'end': 19},
{'word': '▁heures', 'score': 0.9200698733329773, 'entity': 'I-time', 'index': 7, 'start': 20, 'end': 26},
{'word': '▁du', 'score': 0.8476170897483826, 'entity': 'I-time', 'index': 8, 'start': 27, 'end': 29},
{'word': '▁matin', 'score': 0.8271021246910095, 'entity': 'I-time', 'index': 9, 'start': 30, 'end': 35},
{'word': '▁vendredi', 'score': 0.9813069701194763, 'entity': 'B-date', 'index': 11, 'start': 39, 'end': 47}]
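
The pipeline returns one prediction per sub-word token. To get whole slot values, consecutive B-/I- tokens can be merged using their character offsets. The sketch below is one possible way to do this; `group_entities` is a hypothetical helper, not part of the model or of transformers, and keeping the minimum token score per span is an arbitrary choice. The pipeline's own `aggregation_strategy` argument may achieve a similar result without custom code.

```python
def group_entities(text, tokens):
    """Merge consecutive B-/I- token predictions into entity spans (hypothetical helper)."""
    spans = []
    for tok in tokens:
        if tok["entity"] == "O":
            continue
        prefix, _, label = tok["entity"].partition("-")
        # An I- tag extends the previous span only if it carries the same label
        if prefix == "I" and spans and spans[-1]["label"] == label:
            spans[-1]["end"] = tok["end"]
            spans[-1]["scores"].append(tok["score"])
        else:
            spans.append({"label": label, "start": tok["start"],
                          "end": tok["end"], "scores": [tok["score"]]})
    # Recover the surface text from the character offsets
    return [{"label": s["label"],
             "text": text[s["start"]:s["end"]],
             "score": min(s["scores"])} for s in spans]


text = "réveille-moi à neuf heures du matin le vendredi"
# Token-level predictions copied from the output shown above
res = [
    {'word': '▁neuf', 'score': 0.9911066293716431, 'entity': 'B-time', 'index': 6, 'start': 15, 'end': 19},
    {'word': '▁heures', 'score': 0.9200698733329773, 'entity': 'I-time', 'index': 7, 'start': 20, 'end': 26},
    {'word': '▁du', 'score': 0.8476170897483826, 'entity': 'I-time', 'index': 8, 'start': 27, 'end': 29},
    {'word': '▁matin', 'score': 0.8271021246910095, 'entity': 'I-time', 'index': 9, 'start': 30, 'end': 35},
    {'word': '▁vendredi', 'score': 0.9813069701194763, 'entity': 'B-date', 'index': 11, 'start': 39, 'end': 47},
]

entities = group_entities(text, res)
for e in entities:
    print(e["label"], "->", e["text"])
# time -> neuf heures du matin
# date -> vendredi
```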

Training data

MASSIVE is a parallel dataset of more than 1M utterances across 51 languages, annotated for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, which is composed of general Intelligent Voice Assistant single-shot interactions.
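
To illustrate how slot annotations relate to the token labels predicted by the model, here is a minimal sketch that converts a bracket-annotated utterance into BIO tags. It assumes annotations of the form `[slot : value]`, which is not stated in this card; `annot_to_bio` and the sample string are hypothetical. Whitespace splitting is also an assumption and would not suit languages written without spaces.

```python
import re

# Assumed annotation format: "[slot : value]" embedded in the utterance
SLOT_PATTERN = re.compile(r"\[\s*(\w+)\s*:\s*([^\]]+?)\s*\]")


def annot_to_bio(annot_utt):
    """Convert a bracket-annotated utterance into (words, BIO tags)."""
    words, tags = [], []
    pos = 0
    for m in SLOT_PATTERN.finditer(annot_utt):
        # Words before the bracket are outside any slot
        for w in annot_utt[pos:m.start()].split():
            words.append(w)
            tags.append("O")
        slot, value = m.group(1), m.group(2)
        # First word of the slot value gets B-, the rest get I-
        for i, w in enumerate(value.split()):
            words.append(w)
            tags.append(("B-" if i == 0 else "I-") + slot)
        pos = m.end()
    # Trailing words after the last bracket
    for w in annot_utt[pos:].split():
        words.append(w)
        tags.append("O")
    return words, tags


words, tags = annot_to_bio("wake me up at [time : five am] [date : this week]")
for w, t in zip(words, tags):
    print(f"{w}\t{t}")
```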

Named Entities

  • O
  • currency_name
  • personal_info
  • app_name
  • list_name
  • alarm_type
  • cooking_type
  • time_zone
  • media_type
  • change_amount
  • transport_type
  • drink_type
  • news_topic
  • artist_name
  • weather_descriptor
  • transport_name
  • player_setting
  • email_folder
  • music_album
  • coffee_type
  • meal_type
  • song_name
  • date
  • movie_type
  • movie_name
  • game_name
  • business_type
  • music_descriptor
  • joke_type
  • music_genre
  • device_type
  • house_place
  • place_name
  • sport_type
  • podcast_name
  • game_type
  • timeofday
  • business_name
  • time
  • definition_word
  • audiobook_author
  • event_name
  • general_frequency
  • relation
  • color_type
  • audiobook_name
  • food_type
  • person
  • transport_agency
  • email_address
  • podcast_descriptor
  • order_type
  • ingredient
  • transport_descriptor
  • playlist_name
  • radio_name
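
In the model output, each slot name above appears with a `B-` (beginning) or `I-` (inside) prefix, as in `B-time` and `I-time`, while `O` marks tokens outside any slot. The sketch below shows how such a tag set can be built from slot names. It uses a small subset of the list, and the ordering and ids are hypothetical; the mapping actually used by the model should be read from `model.config.id2label`.

```python
# Subset of the slot names listed above, for illustration only
slot_names = ["time", "date", "place_name", "artist_name"]

# "O" first, then a B-/I- pair per slot (this ordering is an assumption)
labels = ["O"]
for slot in slot_names:
    labels += [f"B-{slot}", f"I-{slot}"]

label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(len(labels))  # 1 + 2 * 4 = 9 tags for this subset
print(labels)
```

Under the same scheme, the 55 slot types of MASSIVE would give 1 + 2 × 55 = 111 tags.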

Evaluation results

                        precision    recall  f1-score   support

          B-alarm_type     0.8077    0.1074    0.1896       391
            B-app_name     0.3954    0.5581    0.4629       559
         B-artist_name     0.7424    0.8188    0.7787      7594
    B-audiobook_author     0.7626    0.2751    0.4044       607
      B-audiobook_name     0.7448    0.6036    0.6668      2863
       B-business_name     0.7705    0.8122    0.7908     11120
       B-business_type     0.6715    0.6630    0.6672      3596
       B-change_amount     0.7830    0.7719    0.7774       846
         B-coffee_type     0.4705    0.7298    0.5722       459
          B-color_type     0.6674    0.9244    0.7751      2618
        B-cooking_type     0.8630    0.4087    0.5547       986
       B-currency_name     0.8851    0.9250    0.9046      4496
                B-date     0.8638    0.9276    0.8946     33909
     B-definition_word     0.9051    0.8398    0.8712      6474
         B-device_type     0.8446    0.8407    0.8426      7459
          B-drink_type     0.0000    0.0000    0.0000       127
       B-email_address     0.9084    0.9684    0.9374      1075
        B-email_folder     0.7464    0.9457    0.8343       663
          B-event_name     0.7695    0.7648    0.7671     26880
           B-food_type     0.6861    0.8487    0.7588      9047
           B-game_name     0.8716    0.7366    0.7984      2866
   B-general_frequency     0.7587    0.7920    0.7750      1548
         B-house_place     0.9137    0.8814    0.8972      5765
          B-ingredient     0.6176    0.0479    0.0890       876
           B-joke_type     0.8029    0.7780    0.7903      1126
           B-list_name     0.8563    0.7582    0.8043      6195
           B-meal_type     0.5883    0.9148    0.7161      1948
          B-media_type     0.9111    0.8302    0.8688     13301
          B-movie_name     0.5312    0.3696    0.4359       230
          B-movie_type     0.2829    0.3931    0.3290       290
         B-music_album     0.0000    0.0000    0.0000       104
    B-music_descriptor     0.2987    0.2500    0.2722       760
         B-music_genre     0.7731    0.7953    0.7840      4821
          B-news_topic     0.6437    0.6668    0.6551      5441
          B-order_type     0.6739    0.8073    0.7346      2091
              B-person     0.8290    0.9138    0.8693     25490
       B-personal_info     0.6177    0.6765    0.6458      1249
          B-place_name     0.8696    0.8252    0.8468     30683
      B-player_setting     0.6816    0.6156    0.6469      4048
       B-playlist_name     0.5858    0.4923    0.5350      1942
  B-podcast_descriptor     0.7211    0.5209    0.6049      2367
        B-podcast_name     0.6930    0.5462    0.6109      2091
          B-radio_name     0.7304    0.7598    0.7448      4126
            B-relation     0.7852    0.8708    0.8258      5689
           B-song_name     0.6053    0.6909    0.6453      4131
          B-sport_type     0.0000    0.0000    0.0000         0
                B-time     0.8729    0.7341    0.7975     17338
           B-time_zone     0.7031    0.6352    0.6674      1428
           B-timeofday     0.7367    0.8168    0.7747      4853
    B-transport_agency     0.8161    0.7100    0.7594      1000
B-transport_descriptor     0.8333    0.1014    0.1807       148
      B-transport_name     0.7979    0.3002    0.4363       513
      B-transport_type     0.9332    0.8942    0.9133      5858
  B-weather_descriptor     0.8501    0.7775    0.8122      8815
          I-alarm_type     1.0000    0.1500    0.2609       120
            I-app_name     0.0000    0.0000    0.0000       101
         I-artist_name     0.7325    0.8246    0.7758      3819
    I-audiobook_author     0.8373    0.2800    0.4197       625
      I-audiobook_name     0.6729    0.5007    0.5742      2227
       I-business_name     0.7715    0.6326    0.6952      4265
       I-business_type     0.5932    0.4661    0.5220      1004
       I-change_amount     0.7876    0.8715    0.8274       817
         I-coffee_type     0.9160    0.5755    0.7069       417
          I-color_type     0.2781    0.1912    0.2266       272
        I-cooking_type     0.0000    0.0000    0.0000        17
       I-currency_name     0.8220    0.9327    0.8738      2005
                I-date     0.7864    0.8482    0.8161     15957
     I-definition_word     0.8707    0.6955    0.7733      1859
         I-device_type     0.8837    0.8907    0.8872      4172
          I-drink_type     0.0000    0.0000    0.0000         4
       I-email_address     0.9701    0.9794    0.9747      2911
        I-email_folder     0.6881    0.9186    0.7868       221
          I-event_name     0.6499    0.5553    0.5989     11745
           I-food_type     0.6568    0.7825    0.7141      3306
           I-game_name     0.8183    0.6053    0.6959      1652
   I-general_frequency     0.8168    0.8338    0.8252      1625
         I-house_place     0.9368    0.7289    0.8199      1302
          I-ingredient     0.2857    0.0070    0.0137       285
           I-joke_type     0.7941    0.6117    0.6910       309
           I-list_name     0.7382    0.3989    0.5179      1993
           I-meal_type     0.5699    0.8174    0.6716       334
          I-media_type     0.8355    0.7420    0.7860      4450
          I-movie_name     0.6250    0.1244    0.2075       201
          I-movie_type     0.0190    0.0541    0.0281        74
         I-music_album     0.0000    0.0000    0.0000        42
    I-music_descriptor     0.4231    0.5631    0.4832       293
         I-music_genre     0.6716    0.5455    0.6020      1087
          I-news_topic     0.6384    0.4187    0.5057      3824
          I-order_type     0.4850    0.8050    0.6053       523
              I-person     0.7844    0.8400    0.8113      8218
       I-personal_info     0.7343    0.8363    0.7820       727
          I-place_name     0.6855    0.6703    0.6778      8198
      I-player_setting     0.5293    0.4982    0.5132      1361
       I-playlist_name     0.4696    0.5182    0.4927      1729
  I-podcast_descriptor     0.7448    0.4563    0.5659      2584
        I-podcast_name     0.5923    0.5192    0.5534      1248
          I-radio_name     0.7907    0.8404    0.8148      5766
            I-relation     0.4455    0.4873    0.4655       788
           I-song_name     0.5495    0.7433    0.6319      3120
          I-sport_type     0.0000    0.0000    0.0000         0
                I-time     0.8115    0.8288    0.8201     18118
           I-time_zone     0.9708    0.1670    0.2850      1395
           I-timeofday     0.7531    0.5734    0.6511      1287
    I-transport_agency     0.6667    0.1961    0.3030        51
I-transport_descriptor     0.6429    0.0849    0.1500       106
      I-transport_name     0.8469    0.5453    0.6634       497
      I-transport_type     0.9556    0.7248    0.8243       505
  I-weather_descriptor     0.7756    0.4046    0.5318      2887
                     O     0.9502    0.9560    0.9531   1031927

              accuracy                         0.9031   1455270
             macro avg     0.6674    0.5860    0.5972   1455270
          weighted avg     0.9032    0.9031    0.9013   1455270
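
The gap between the macro average (0.5972) and the weighted average (0.9013) comes from class imbalance: the `O` label accounts for 1031927 of the 1455270 tokens (about 71%), while several rare slots score 0. The sketch below recomputes F1 from precision and recall for a few rows copied from the table, then contrasts the two averaging schemes on that subset only, so its averages are illustrative and will not match the full report.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1, support) copied from the table above
rows = {
    "B-alarm_type": (0.8077, 0.1074, 0.1896, 391),
    "B-date": (0.8638, 0.9276, 0.8946, 33909),
    "B-drink_type": (0.0000, 0.0000, 0.0000, 127),
    "O": (0.9502, 0.9560, 0.9531, 1031927),
}

# The reported F1 values agree with the formula up to rounding
for name, (p, r, reported, support) in rows.items():
    assert abs(f1(p, r) - reported) < 1e-3, name

scores = [reported for (_, _, reported, _) in rows.values()]
supports = [support for (_, _, _, support) in rows.values()]

# Macro: every class counts equally, so rare low-scoring classes pull it down
macro = sum(scores) / len(scores)
# Weighted: each class counts in proportion to its support, so "O" dominates
weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)

print(f"macro={macro:.4f} weighted={weighted:.4f}")
```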