Titouan committed on
Commit
223aee2
2 Parent(s): 8279335 6980701

Merge branch 'main' of https://huggingface.co/speechbrain/asr-crdnn-commonvoice-it into main

Files changed (2)
  1. README.md +14 -13
  2. example-it.wav +0 -0
README.md CHANGED
@@ -2,13 +2,14 @@
  language: "it"
  thumbnail:
  tags:
- - ASR
+ - automatic-speech-recognition
  - CTC
  - Attention
  - pytorch
+ - speechbrain
  license: "apache-2.0"
  datasets:
- - commonvoice
+ - common_voice
  metrics:
  - wer
  - cer
@@ -18,7 +19,7 @@ metrics:

  This repository provides all the necessary tools to perform automatic speech
  recognition from an end-to-end system pretrained on CommonVoice (IT) within
- SpeechBrain. For a better experience we encourage you to learn more about
+ SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io). The given ASR model performance are:

  | Release | Test CER | Test WER | GPUs |
@@ -27,19 +28,19 @@ SpeechBrain. For a better experience we encourage you to learn more about

  ## Pipeline description

- This ASR system is composed with 2 different but linked blocks:
+ This ASR system is composed of 2 different but linked blocks:
  1. Tokenizer (unigram) that transforms words into subword units and trained with
  the train transcriptions (train.tsv) of CommonVoice (IT).
- 3. Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
- N blocks of convolutional neural networks with normalisation and pooling on the
+ 2. Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
+ N blocks of convolutional neural networks with normalization and pooling on the
  frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
  the final acoustic representation that is given to the CTC and attention decoders.

  ## Intended uses & limitations

- This model has been primilarly developed to be run within SpeechBrain as a pretrained ASR model
+ This model has been primarily developed to be run within SpeechBrain as a pretrained ASR model
  for the Italian language. Thanks to the flexibility of SpeechBrain, any of the 2 blocks
- detailed above can be extracted and connected to you custom pipeline as long as SpeechBrain is
+ detailed above can be extracted and connected to your custom pipeline as long as SpeechBrain is
  installed.

  ## Install SpeechBrain
@@ -47,19 +48,19 @@ installed.
  First of all, please install SpeechBrain with the following command:

  ```
- pip install \\we hide ! SpeechBrain is still private :p
+ pip install speechbrain
  ```

  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

- ### Transcribing your own audio files
+ ### Transcribing your own audio files (in Italian)

  ```python
  from speechbrain.pretrained import EncoderDecoderASR

- asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-crdnn-commonvoice-it")
- asr_model.transcribe_file("path_to_your_file.wav")
+ asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-crdnn-commonvoice-it", savedir="pretrained_models/asr-crdnn-commonvoice-it")
+ asr_model.transcribe_file("speechbrain/asr-crdnn-commonvoice-it/example-it.wav")

  ```

@@ -72,6 +73,6 @@ asr_model.transcribe_file("path_to_your_file.wav")
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
- howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
  }
  ```
example-it.wav ADDED
Binary file (136 kB).
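
Note on the `wer` and `cer` metrics listed in the README metadata: the sketch below is not part of this commit and is not SpeechBrain's own scoring code. It only illustrates, under the usual definitions, how a word error rate and a character error rate could be computed from a reference transcript and a hypothesis such as the string returned by `transcribe_file`. The function names (`edit_distance`, `wer`, `cer`) and the Italian sample strings are hypothetical, chosen for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: drop a reference token
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts; a real evaluation would loop over a test set.
    reference = "buongiorno a tutti"
    hypothesis = "buongiorno tutti"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Both rates are normalized by the reference length, so they can exceed 1.0 when the hypothesis contains many insertions. Reported corpus-level scores are typically obtained by summing edit counts over all utterances before dividing, rather than averaging per-utterance rates.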