speech-test committed
Commit 7c506fb
1 Parent(s): e370a4d

Update info

Files changed (1)
  1. README.md +18 -14
README.md CHANGED
@@ -6,7 +6,6 @@ tags:
 - speech
 - audio
 - hubert
-- s3prl
 license: apache-2.0
 ---
 
@@ -16,13 +15,12 @@ license: apache-2.0
 
 This is a ported version of [S3PRL's Hubert for the SUPERB Intent Classification task](https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/fluent_commands).
 
-The base model is [hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960).
-It is pretrained on 16kHz sampled speech audio.
-When using the model make sure that your speech input is also sampled at 16Khz.
+The base model is [hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960), which is pretrained on 16kHz
+sampled speech audio. When using the model, make sure that your speech input is also sampled at 16kHz.
 
 For more information refer to [SUPERB: Speech processing Universal PERformance Benchmark](https://arxiv.org/abs/2105.01051)
 
-## Task description
+## Task and dataset description
 
 Intent Classification (IC) classifies utterances into predefined classes to determine the intent of
 speakers. SUPERB uses the
@@ -38,20 +36,26 @@ For the original model's training and evaluation instructions refer to the
 You can use the model directly like so:
 ```python
 import torch
-import numpy as np
+import librosa
 from datasets import load_dataset
 from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor
 
-# TODO: replace with the official superb dataset
-superb_ks = load_dataset("anton-l/superb_dummy", "ic", split="test")
+def map_to_array(example):
+    speech, _ = librosa.load(example["file"], sr=16000, mono=True)
+    example["speech"] = speech
+    return example
+
+# load a demo dataset and read audio files
+dataset = load_dataset("anton-l/superb_demo", "ic", split="test")
+dataset = dataset.map(map_to_array)
+
 model = HubertForSequenceClassification.from_pretrained("superb/hubert-base-superb-ic")
 feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/hubert-base-superb-ic")
 
-audio = np.array(superb_ks[0]["speech"])
 # compute attention masks and normalize the waveform if needed
-inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt")
+inputs = feature_extractor(dataset[:4]["speech"], sampling_rate=16000, padding=True, return_tensors="pt")
 
-logits = model(**inputs).logits[0]
+logits = model(**inputs).logits
 
 action_ids = torch.argmax(logits[:, :6], dim=-1).tolist()
 action_labels = [model.config.id2label[_id] for _id in action_ids]
@@ -67,9 +71,9 @@ location_labels = [model.config.id2label[_id + 20] for _id in location_ids]
 
 The evaluation metric is accuracy.
 
-|      | `s3prl` | `transformers` |
-|------|---------|----------------|
-|`test`| TBA     | TBA            |
+|        | **s3prl** | **transformers** |
+|--------|-----------|------------------|
+|**test**| `0.9834`  | `N/A`            |
 
 ### BibTeX entry and citation info
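
The updated snippet decodes three intent slots (action, object, location) from a single logits tensor by slicing the label dimension. As a quick sanity check of that slicing — assuming, from the slices in the diff (`logits[:, :6]` for actions, `_id + 20` for locations), that actions occupy label ids 0–5, objects ids 6–19, and locations ids 20–23; the object range is an inference, not stated in the README — here is a minimal sketch with stand-in logits:

```python
import torch

# Assumed Fluent Speech Commands slot boundaries, inferred from the diff:
# actions = ids 0-5, objects = ids 6-19 (inferred), locations = ids 20-23.
ACTION, OBJECT, LOCATION = slice(0, 6), slice(6, 20), slice(20, 24)

# Stand-in for model(**inputs).logits: 4 utterances, 24 label logits each.
logits = torch.randn(4, 24)

# One argmax per slot, taken over that slot's slice of the label dimension.
action_ids = torch.argmax(logits[:, ACTION], dim=-1).tolist()
object_ids = torch.argmax(logits[:, OBJECT], dim=-1).tolist()
location_ids = torch.argmax(logits[:, LOCATION], dim=-1).tolist()

# Each utterance gets exactly one prediction per slot; the slot-local ids
# are offset back into the global label space for id2label lookup
# (e.g. location_ids[i] + 20, as in the README's code).
assert len(action_ids) == len(object_ids) == len(location_ids) == 4
```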
79