crossdelenna committed · Commit fb23782 · Parent: 681e369

Update README.md

README.md CHANGED
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# wav2vec2-large-en-in-lm

This model is a fine-tuned version of [crossdelenna/wav2vec2-large-en-in-lm](https://huggingface.co/crossdelenna/wav2vec2-large-en-in-lm).

It achieves the following results on the evaluation set:
- Loss: 0.0478
- Wer: 0.0951

## Model description

Wav2vec2 automatic speech recognition for an Indian English accent, using a language model.
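
A minimal usage sketch (assumptions: the repository ships a pyctcdecode language model, so `Wav2Vec2ProcessorWithLM` applies, as the `-lm` suffix suggests; the input clip is 16 kHz mono; `clip.wav` is a placeholder):

```python
# Minimal inference sketch; the LM-aware processor class is an assumption
# based on the "-lm" suffix of the repository name.
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

model_id = "crossdelenna/wav2vec2-large-en-in-lm"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load a short clip, resampled to the 16 kHz rate wav2vec2 expects.
speech, _ = librosa.load("clip.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# The LM-aware processor decodes logits directly (beam search + LM scoring).
transcript = processor.batch_decode(logits.numpy()).text[0]
print(transcript)
```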

## Intended uses & limitations

This model is intended for my personal use only. Intentionally, the dataset has absolutely no speech variance: it is fine-tuned only on my own data, and I use it for live speech dictation with PyAudio non-blocking streaming microphone input (https://gist.github.com/KenoLeon/13dfb803a21a08cf224b2e6df0feed80). Before inference, fine-tune it further on your own data. The training data contains a lot of quantitative-finance terminology and plenty of modern Reddit slang; note that it does not hash out F-words.
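
Below is a rough sketch of the non-blocking capture loop mentioned above; the ~5-second chunking and the `transcribe()` hook are illustrative assumptions, not the exact dictation setup (the linked gist takes the same general approach):

```python
# Non-blocking PyAudio microphone capture feeding fixed-length chunks to
# the model; chunk length and the transcribe() hook are assumptions.
import time
import numpy as np
import pyaudio

RATE = 16_000   # wav2vec2 expects 16 kHz mono input
CHUNK = 1024    # frames delivered per callback
frames = []

def callback(in_data, frame_count, time_info, status):
    # Runs on PyAudio's internal thread: just buffer the samples, never block.
    frames.append(np.frombuffer(in_data, dtype=np.int16))
    return (None, pyaudio.paContinue)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True,
                frames_per_buffer=CHUNK, stream_callback=callback)
stream.start_stream()

try:
    while stream.is_active():
        time.sleep(0.1)
        if len(frames) * CHUNK >= 5 * RATE:  # roughly five seconds buffered
            audio = np.concatenate(frames).astype(np.float32) / 32768.0
            frames.clear()
            # Hand `audio` to the processor/model as in the sketch above,
            # e.g. text = transcribe(audio)  # hypothetical helper
finally:
    stream.stop_stream()
    stream.close()
    p.terminate()
```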

## Training and evaluation data

The Facebook wav2vec2 large base model, further fine-tuned on thirty-two hours of personal recordings. The data is a single male voice with an Indian English accent, recorded on an omnidirectional microphone with a lot of background noise.

## Training procedure

I downloaded my Reddit and Twitter data and started recording it, with each clip not exceeding 13 seconds. Once I had a sample of about six hours, I fine-tuned the model, which had approximately 19% WER. Afterwards, I kept adding data and fine-tuning, and it is now trained on thirty hours of data.

(The idea now is to fine-tune every two to three months, only on unrecognized words.)

### Training hyperparameters