---
language: en
datasets:
- superb
tags:
- speech
- audio
- hubert
- s3prl
license: apache-2.0
---

# Hubert-Base for Intent Classification

## Model description

This is a ported version of [S3PRL's Hubert for the SUPERB Intent Classification task](https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/fluent_commands).

The base model is [hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960), which is pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.

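If your recordings use a different sampling rate, resample them before feature extraction. Below is a minimal sketch using `torchaudio` (the file path is a placeholder):

```python
import torchaudio
import torchaudio.functional as F

# Load a waveform at its native sampling rate (the path is a placeholder).
waveform, sample_rate = torchaudio.load("utterance.wav")

# Resample to the 16kHz rate the model expects, if needed.
if sample_rate != 16_000:
    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=16_000)
```
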
For more information refer to [SUPERB: Speech processing Universal PERformance Benchmark](https://arxiv.org/abs/2105.01051).

## Task description

Intent Classification (IC) classifies utterances into predefined classes to determine the intent of speakers. SUPERB uses the [Fluent Speech Commands](https://fluent.ai/fluent-speech-commands-a-dataset-for-spoken-language-understanding-research/) dataset, where each utterance is tagged with three intent labels: **action**, **object**, and **location**.

For the original model's training and evaluation instructions refer to the [S3PRL downstream task README](https://github.com/s3prl/s3prl/tree/master/s3prl/downstream#ic-intent-classification---fluent-speech-commands).

## Usage examples

You can use the model directly like so:
```python
import torch
import numpy as np
from datasets import load_dataset
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# TODO: replace with the official superb dataset
superb_ks = load_dataset("anton-l/superb_dummy", "ic", split="test")
model = HubertForSequenceClassification.from_pretrained("superb/hubert-base-superb-ic")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/hubert-base-superb-ic")

audio = np.array(superb_ks[0]["speech"])
# compute attention masks and normalize the waveform if needed
inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt")

# logits has shape (batch_size, 24): 6 action, 14 object, and 4 location classes
logits = model(**inputs).logits

action_ids = torch.argmax(logits[:, :6], dim=-1).tolist()
action_labels = [model.config.id2label[_id] for _id in action_ids]

object_ids = torch.argmax(logits[:, 6:20], dim=-1).tolist()
object_labels = [model.config.id2label[_id + 6] for _id in object_ids]

location_ids = torch.argmax(logits[:, 20:24], dim=-1).tolist()
location_labels = [model.config.id2label[_id + 20] for _id in location_ids]
```
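Each of the three lists holds one label per utterance in the batch, so the full predicted intent of the first utterance is the triple `(action_labels[0], object_labels[0], location_labels[0])`.
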
## Eval results

The evaluation metric is accuracy.

|        | `s3prl` | `transformers` |
|--------|---------|----------------|
| `test` | TBA     | TBA            |

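For this task, accuracy is typically reported as full-intent accuracy, where an utterance counts as correct only when all three slots match the reference (this follows the usual Fluent Speech Commands convention; treat it as an assumption here). A minimal sketch of that computation, with made-up labels for illustration:

```python
from typing import List, Tuple

def intent_accuracy(
    predictions: List[Tuple[str, str, str]],
    references: List[Tuple[str, str, str]],
) -> float:
    """Full-intent accuracy: a prediction is correct only if all three slots match."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Toy example with made-up (action, object, location) triples:
preds = [("activate", "lights", "kitchen"), ("deactivate", "music", "none")]
refs = [("activate", "lights", "kitchen"), ("deactivate", "music", "bedroom")]
print(intent_accuracy(preds, refs))  # 0.5
```
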
### BibTeX entry and citation info

```bibtex
@article{yang2021superb,
  title={SUPERB: Speech processing Universal PERformance Benchmark},
  author={Yang, Shu-wen and Chi, Po-Han and Chuang, Yung-Sung and Lai, Cheng-I Jeff and Lakhotia, Kushal and Lin, Yist Y and Liu, Andy T and Shi, Jiatong and Chang, Xuankai and Lin, Guan-Ting and others},
  journal={arXiv preprint arXiv:2105.01051},
  year={2021}
}
```