
UniverSLU-17 Natural Phrase

UniverSLU-17 Natural Phrase is a multi-task spoken language understanding (SLU) model from CMU WAVLab. It adapts Whisper to additional tasks through instruction tuning, i.e., fine-tuning with natural language instructions that describe the task, followed by the list of label options. Our demo is available here. More details about the SLU tasks the model is trained on, and its performance on those tasks, can be found in our paper: https://aclanthology.org/2024.naacl-long.151/
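
The sketch below shows one way inference might look. It assumes the checkpoint can be loaded through ESPnet2's `Speech2Text` interface (via `espnet_model_zoo`); the exact loading path, task-instruction handling, and file name `utterance.wav` are illustrative, so consult the demo and the ESPnet documentation for the authoritative usage.

```python
# Minimal inference sketch -- assumes the checkpoint is loadable through
# ESPnet2's Speech2Text interface via espnet_model_zoo; how the natural
# language task instruction is supplied may differ (see the demo).
import soundfile as sf
from espnet2.bin.asr_inference import Speech2Text

# Downloads and caches the model from the Hugging Face Hub.
speech2text = Speech2Text.from_pretrained("espnet/UniverSLU-17-Natural-Phrase")

# Whisper-based models expect 16 kHz mono audio.
speech, rate = sf.read("utterance.wav")  # hypothetical input file

# Each n-best hypothesis is a (text, tokens, token_ids, hyp) tuple;
# the decoded text carries the task output (e.g., an intent label).
text, *_ = speech2text(speech)[0]
print(text)
```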

Citing UniverSLU and ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

@inproceedings{arora-etal-2024-universlu,
    title = "{U}niver{SLU}: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions",
    author = "Arora, Siddhant  and
      Futami, Hayato  and
      Jung, Jee-weon  and
      Peng, Yifan  and
      Sharma, Roshan  and
      Kashiwagi, Yosuke  and
      Tsunoo, Emiru  and
      Livescu, Karen  and
      Watanabe, Shinji",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.151",
    doi = "10.18653/v1/2024.naacl-long.151",
    pages = "2754--2774",
    abstract = "Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model{'}s behavior and surpassing performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly performs various spoken language understanding (SLU) tasks? We start by adapting a pre-trained automatic speech recognition model to additional tasks using single-token task specifiers. We enhance this approach through instruction tuning, i.e., finetuning by describing the task using natural language instructions followed by the list of label options. Our approach can generalize to new task descriptions for the seen tasks during inference, thereby enhancing its user-friendliness. We demonstrate the efficacy of our single multi-task learning model {``}UniverSLU{''} for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages. On most tasks, UniverSLU achieves competitive performance and often even surpasses task-specific models. Additionally, we assess the zero-shot capabilities, finding that the model generalizes to new datasets and languages for seen task types.",
}

