nguyenvulebinh committed on
Commit
5b9fb17
1 Parent(s): 36d7278

add sample audio file

.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,83 @@
+ ---
+ language: vi
+ datasets:
+ - common_voice
+ - librispeech_asr
+ - how2
+ - must-c-v1
+ - must-c-v2
+ - europarl
+ - tedlium
+ tags:
+ - audio
+ - automatic-speech-recognition
+ license: cc-by-nc-4.0
+ ---
+
+ # Fine-Tune Wav2Vec2 large model for English ASR
+
+
+ ### Data for fine-tune
+
+ | Dataset | Duration in hours |
+ |--------------|-------------------|
+ | Common Voice | 1667 |
+ | Europarl | 85 |
+ | How2 | 356 |
+ | Librispeech | 936 |
+ | MuST-C v1 | 407 |
+ | MuST-C v2 | 482 |
+ | Tedlium | 482 |
+
+
+ ### Evaluation result
+
+ | Dataset | Duration in hours | WER w/o LM | WER with LM |
+ |-------------|-------------------|------------|-------------|
+ | Librispeech | 5.4 | 2.9 | 1.1 |
+ | Tedlium | 2.6 | 7.9 | 5.4 |
+
+
+ ### Usage
+
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1FAhtGvjRdHT4W0KeMdMMlL7sm6Hbe7dv?usp=sharing)
+
+ ```python
+ from transformers.file_utils import cached_path, hf_bucket_url
+ from importlib.machinery import SourceFileLoader
+ from transformers import Wav2Vec2ProcessorWithLM
+ from IPython.lib.display import Audio
+ import torchaudio
+ import torch
+
+ # Load model & processor
+ model_name = "nguyenvulebinh/iwslt-asr-wav2vec-large"
+ model = SourceFileLoader("model", cached_path(hf_bucket_url(model_name,filename="model_handling.py"))).load_module().Wav2Vec2ForCTC.from_pretrained(model_name)
+ processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_name)
+
+ # Load an example audio (16k)
+ audio, sample_rate = torchaudio.load(cached_path(hf_bucket_url(model_name, filename="tst_2010_sample.wav")))
+ input_data = processor.feature_extractor(audio[0], sampling_rate=16000, return_tensors='pt')
+
+ # Infer
+ output = model(**input_data)
+
+ # Output transcript without LM
+ print(processor.tokenizer.decode(output.logits.argmax(dim=-1)[0].detach().cpu().numpy()))
+ # and of course there's teams that have a lot more tada structures and among the best are recent graduates of kindergarten
+
+ # Output transcript with LM
+ print(processor.decode(output.logits.cpu().detach().numpy()[0], beam_width=100).text)
+ # and of course there are teams that have a lot more ta da structures and among the best are recent graduates of kindergarten
+ ```
+
+ ### Model Parameters License
+
+ The ASR model parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode
+
+
+ ### Contact
+
+ nguyenvulebinh@gmail.com
+
+ [![Follow](https://img.shields.io/twitter/follow/nguyenvulebinh?style=social)](https://twitter.com/intent/follow?screen_name=nguyenvulebinh)
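
The "Output transcript without LM" step in the usage snippet takes the per-frame argmax of the logits and passes the resulting IDs to the tokenizer. For a CTC model this amounts to greedy decoding: collapse consecutive repeated IDs, drop the blank token, and join what remains. The sketch below illustrates that idea in plain Python. The vocabulary, blank ID, and word delimiter are hypothetical toy values chosen for illustration; the model's real vocabulary is not shown in this commit.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration only; not the model's real one.
VOCAB = ["<pad>", "|", "a", "d", "t"]
BLANK_ID = 0          # CTC blank, assumed here to be the pad token
WORD_DELIMITER = "|"  # assumed word-boundary symbol


def greedy_ctc_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID):
    """Collapse consecutive repeats, drop blanks, and map IDs to text."""
    # Step 1: merge runs of the same ID (one symbol can span many frames).
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which only separate symbols.
    chars = [vocab[i] for i in collapsed if i != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# Made-up per-frame argmax IDs: t t <pad> a | d <pad> a a
frame_ids = [4, 4, 0, 2, 1, 3, 0, 2, 2]
decoded = greedy_ctc_decode(frame_ids)
print(decoded)
```

Note that a blank between two identical IDs keeps them as separate symbols, which is how CTC can emit doubled letters. Greedy decoding has no language model, which is one plausible reason the sample output reads "tada" without an LM and "ta da" with one.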
README.md CHANGED
@@ -1,3 +1,83 @@
  ---
- license: cc-by-4.0
+ language: vi
+ datasets:
+ - common_voice
+ - librispeech_asr
+ - how2
+ - must-c-v1
+ - must-c-v2
+ - europarl
+ - tedlium
+ tags:
+ - audio
+ - automatic-speech-recognition
+ license: cc-by-nc-4.0
  ---
+
+ # Fine-Tune Wav2Vec2 large model for English ASR
+
+
+ ### Data for fine-tune
+
+ | Dataset | Duration in hours |
+ |--------------|-------------------|
+ | Common Voice | 1667 |
+ | Europarl | 85 |
+ | How2 | 356 |
+ | Librispeech | 936 |
+ | MuST-C v1 | 407 |
+ | MuST-C v2 | 482 |
+ | Tedlium | 482 |
+
+
+ ### Evaluation result
+
+ | Dataset | Duration in hours | WER w/o LM | WER with LM |
+ |-------------|-------------------|------------|-------------|
+ | Librispeech | 5.4 | 2.9 | 1.1 |
+ | Tedlium | 2.6 | 7.9 | 5.4 |
+
+
+ ### Usage
+
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1FAhtGvjRdHT4W0KeMdMMlL7sm6Hbe7dv?usp=sharing)
+
+ ```python
+ from transformers.file_utils import cached_path, hf_bucket_url
+ from importlib.machinery import SourceFileLoader
+ from transformers import Wav2Vec2ProcessorWithLM
+ from IPython.lib.display import Audio
+ import torchaudio
+ import torch
+
+ # Load model & processor
+ model_name = "nguyenvulebinh/iwslt-asr-wav2vec-large"
+ model = SourceFileLoader("model", cached_path(hf_bucket_url(model_name,filename="model_handling.py"))).load_module().Wav2Vec2ForCTC.from_pretrained(model_name)
+ processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_name)
+
+ # Load an example audio (16k)
+ audio, sample_rate = torchaudio.load(cached_path(hf_bucket_url(model_name, filename="tst_2010_sample.wav")))
+ input_data = processor.feature_extractor(audio[0], sampling_rate=16000, return_tensors='pt')
+
+ # Infer
+ output = model(**input_data)
+
+ # Output transcript without LM
+ print(processor.tokenizer.decode(output.logits.argmax(dim=-1)[0].detach().cpu().numpy()))
+ # and of course there's teams that have a lot more tada structures and among the best are recent graduates of kindergarten
+
+ # Output transcript with LM
+ print(processor.decode(output.logits.cpu().detach().numpy()[0], beam_width=100).text)
+ # and of course there are teams that have a lot more ta da structures and among the best are recent graduates of kindergarten
+ ```
+
+ ### Model Parameters License
+
+ The ASR model parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode
+
+
+ ### Contact
+
+ nguyenvulebinh@gmail.com
+
+ [![Follow](https://img.shields.io/twitter/follow/nguyenvulebinh?style=social)](https://twitter.com/intent/follow?screen_name=nguyenvulebinh)
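
The evaluation table in the README reports word error rate (WER), which is conventionally the word-level edit distance between hypothesis and reference divided by the number of reference words. The following is a minimal sketch of that computation, not the evaluation script used for the reported numbers. The commit does not include a ground-truth transcript, so the LM-decoded sample output stands in as the reference purely for illustration. The last lines compute the relative WER reduction from adding the LM, using the values in the table.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance over word lists (substitutions, deletions, insertions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer(reference_text, hypothesis_text):
    """Word error rate as a fraction of the reference length."""
    reference = reference_text.split()
    hypothesis = hypothesis_text.split()
    return word_edit_distance(reference, hypothesis) / len(reference)


# Sample outputs from the usage snippet; the LM output is used as a
# stand-in reference (an assumption, since no ground truth is given).
with_lm = ("and of course there are teams that have a lot more ta da structures "
           "and among the best are recent graduates of kindergarten")
without_lm = ("and of course there's teams that have a lot more tada structures "
              "and among the best are recent graduates of kindergarten")

edits = word_edit_distance(with_lm.split(), without_lm.split())
print(f"edits={edits}, WER={wer(with_lm, without_lm):.3f}")

# Relative WER reduction from the LM, using the README's table values.
table = {"Librispeech": (2.9, 1.1), "Tedlium": (7.9, 5.4)}
reduction = {name: round(100 * (no_lm - lm) / no_lm, 1)
             for name, (no_lm, lm) in table.items()}
print(reduction)
```

The two sample transcripts differ only at "there are" versus "there's" and "ta da" versus "tada", so the edit distance counts two substitutions and two deletions against a 23-word reference.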