antony66/whisper-large-v3-russian

Automatic Speech Recognition

antony66 committed on May 23, 2024

Commit c47e9c0 · verified · 1 parent: 564e7f5

Update README.md

Files changed (1)

README.md +9 -1

@@ -25,6 +25,14 @@ The finetuning process took over 60 hours on dual Tesla A100 80Gb.
 ## Usage
+In order to process phone calls it is highly recommended that you preprocess your records and adjust volume before performing ASR. For example, like this:
+```bash
+sox record.wav -r 16k record-normalized.wav norm -0.5 compand 0.3,1 -90,-90,-70,-70,-60,-20,0,0 -5 0 0.2
+```
+Then your ASR code should look somewhat like this:
 ```python
 import torch
 from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline
@@ -62,7 +70,7 @@ asr_pipeline = pipeline(
 # read your wav file into variable wav. For example:
 from io import BufferIO
 wav = BytesIO()
-with open('call.wav', 'rb') as f:
+with open('record-normalized.wav', 'rb') as f:
     wav.write(f.read())
 wav.seek(0)
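Two details in the updated snippet are worth spelling out. The unchanged context line `from io import BufferIO` fails with an ImportError, because the `io` module provides `BytesIO`, not `BufferIO`, and the very next line already uses `BytesIO`. Below is a minimal sketch of the README's two steps (sox preprocessing, then loading the normalized file into an in-memory buffer), assuming the `sox` binary is installed and on PATH. The sox arguments are copied from the diff; the helper names `build_sox_command`, `preprocess` and `load_wav` are hypothetical and not part of the model card.

```python
import subprocess
from io import BytesIO  # the io module provides BytesIO; there is no BufferIO

# Effect arguments copied verbatim from the README's sox command.
SOX_EFFECTS = [
    "norm", "-0.5",
    "compand", "0.3,1", "-90,-90,-70,-70,-60,-20,0,0", "-5", "0", "0.2",
]


def build_sox_command(src: str, dst: str) -> list[str]:
    """Hypothetical helper: argv for the README's preprocessing step (16 kHz output)."""
    return ["sox", src, "-r", "16k", dst, *SOX_EFFECTS]


def preprocess(src: str, dst: str) -> None:
    """Hypothetical helper: run sox (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(build_sox_command(src, dst), check=True)


def load_wav(path: str) -> BytesIO:
    """Hypothetical helper: read a file into a buffer rewound to the start, as the README does."""
    wav = BytesIO()
    with open(path, "rb") as f:
        wav.write(f.read())
    wav.seek(0)  # rewind so the consumer reads from the first byte
    return wav
```

With sox available, `preprocess('record.wav', 'record-normalized.wav')` followed by `wav = load_wav('record-normalized.wav')` gives the same `wav` buffer as the README's snippet. Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting, since `subprocess.run` receives a list rather than a single string.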
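The `norm -0.5` effect in the README's sox command applies a single gain so that the loudest sample peaks at -0.5 dBFS, i.e. an amplitude of 10^(-0.5/20) ≈ 0.944 of full scale. As an illustration of that arithmetic only, here is a pure-Python sketch assuming float samples in [-1.0, 1.0]. `peak_normalize` is a hypothetical helper: it does not model the `-r 16k` resampling or the `compand` stage, and it is not how sox itself is implemented.

```python
def peak_normalize(samples: list[float], target_dbfs: float = -0.5) -> list[float]:
    """Hypothetical helper: scale samples so the largest |sample| sits at target_dbfs.

    Models only the peak-normalization step (sox `norm`), not resampling or compand.
    Assumes full scale is 1.0.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: no gain can be computed, return unchanged
    target_amplitude = 10 ** (target_dbfs / 20)  # dBFS -> linear amplitude, ~0.944 for -0.5
    gain = target_amplitude / peak  # one gain for the whole signal preserves relative levels
    return [s * gain for s in samples]
```

Because one gain is applied to every sample, quiet and loud passages keep their relative levels; the README's subsequent `compand` stage is what actually compresses that dynamic range.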