crossdelenna committed
Commit fb23782
1 Parent(s): 681e369

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -8,27 +8,27 @@ model-index:

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-
+
 # wav2vec2-large-en-in-lm

-This model is a fine-tuned version of [crossdelenna/wav2vec2-large-en-in-lm](https://huggingface.co/crossdelenna/wav2vec2-large-en-in-lm) on an unknown dataset.
+This model is a fine-tuned version of [crossdelenna/wav2vec2-large-en-in-lm](https://huggingface.co/crossdelenna/wav2vec2-large-en-in-lm).
+
 It achieves the following results on the evaluation set:
 - Loss: 0.0478
 - Wer: 0.0951

-## Model description
-
-More information needed
+## Model description
+Wav2vec2 automatic speech recognition for an Indian English accent, decoded with a language model.

 ## Intended uses & limitations
-
-More information needed
+This model is intended for my personal use only. Intentionally, the dataset has absolutely no speech variance: it is fine-tuned only on my own recordings, and I use it for live speech dictation with PyAudio non-blocking streaming microphone input (https://gist.github.com/KenoLeon/13dfb803a21a08cf224b2e6df0feed80). Before running inference, fine-tune it further on your own data. The training data contains a lot of quantitative-finance terminology and plenty of modern Reddit slang. Note that it does not mask out swear words.

 ## Training and evaluation data
-
-More information needed
+The base model is Facebook's large wav2vec2 checkpoint, further fine-tuned on thirty-two hours of personal recordings: a male voice with an Indian English accent, recorded on an omnidirectional microphone with a lot of background noise.

 ## Training procedure
+I downloaded my Reddit and Twitter data and started recording it, with each clip not exceeding 13 seconds. Once I had a sample of about six hours, I fine-tuned the model, which had approximately 19% WER. Afterwards, I kept adding data and fine-tuning; it is now trained on thirty hours of data.
+(The plan now is to fine-tune every two to three months, only on unrecognized words.)

 ### Training hyperparameters
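
For readers trying out the updated card, here is a minimal, hypothetical sketch of how a checkpoint like this is commonly loaded for offline transcription with 🤗 Transformers. It is not part of the commit: the file name `dictation_sample.wav` is a placeholder, and whether the repo ships an LM-backed processor is an assumption rather than something stated above.

```python
# Hypothetical usage sketch: transcribe a local WAV file with the checkpoint.
# AutoProcessor loads the Wav2Vec2ProcessorWithLM variant (pyctcdecode beam
# search) if the repo is configured with one, else the plain CTC processor.
import torch
import torchaudio
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "crossdelenna/wav2vec2-large-en-in-lm"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# Load audio, downmix to mono, and resample to the 16 kHz rate wav2vec2 expects.
waveform, sample_rate = torchaudio.load("dictation_sample.wav")  # placeholder file name
waveform = waveform.mean(dim=0)
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

if hasattr(processor, "decoder"):
    # LM-backed processor: beam-search decode the logits against the n-gram LM.
    text = processor.batch_decode(logits.numpy()).text[0]
else:
    # Plain CTC processor: greedy argmax decoding.
    text = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(text)
```

For the live-dictation setup the card mentions, the same model and processor would be fed 16 kHz chunks captured from a non-blocking PyAudio stream (as in the linked gist) instead of a file.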