antony66 committed
Commit 67687c3
1 Parent(s): ab61946

Update README.md

Files changed (1)
  1. README.md +43 -0
README.md CHANGED
@@ -20,6 +20,49 @@ Dataset used for finetuning is Common Voice 11.0, Russian part.
 
 After preprocessing the original dataset (the train, test, and validation splits were mixed and re-split into new train/test splits at a 0.95/0.05 ratio), the original Whisper large-v3 has a WER of 9.2, while the finetuned version reaches 6.31 (so far).
 
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline
+
+ torch_dtype = torch.bfloat16  # set your preferred dtype here
+
+ # pick the best available device
+ device = 'cpu'
+ if torch.cuda.is_available():
+     device = 'cuda'
+ elif torch.backends.mps.is_available():
+     device = 'mps'
+     setattr(torch.distributed, "is_initialized", lambda: False)  # monkey patch: report torch.distributed as uninitialized
+ device = torch.device(device)
+
+ whisper = WhisperForConditionalGeneration.from_pretrained(
+     "antony66/whisper-large-v3-russian", torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True,
+     # add attn_implementation="flash_attention_2" if your GPU supports it
+ )
+
+ processor = WhisperProcessor.from_pretrained("antony66/whisper-large-v3-russian")
+
+ asr_pipeline = pipeline(
+     "automatic-speech-recognition",
+     model=whisper,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     max_new_tokens=256,
+     chunk_length_s=30,
+     batch_size=16,
+     return_timestamps=True,
+     torch_dtype=torch_dtype,
+     device=device,
+ )
+
+ # read your wav file into the variable `wav` (see the loading example below)
+ asr = asr_pipeline(wav, generate_kwargs={"language": "russian", "max_new_tokens": 256}, return_timestamps=False)
+
+ print(asr['text'])
+ ```
+
 ## Work in progress
 
 This model is in WIP state for now. The goal is to finetune it for speech recognition of phone calls as much as possible. If you want to contribute and you know of or have a good dataset, please let me know. Your help will be much appreciated.
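
The split-and-evaluate procedure mentioned in the diff above (mixing the Common Voice train/test/validation splits, re-splitting 0.95/0.05, and comparing WER) could be reproduced roughly as follows. This is a minimal sketch, not the author's actual evaluation code: the dataset ID, the seed, and the use of the `evaluate` WER metric are assumptions.

```python
import evaluate
from datasets import concatenate_datasets, load_dataset

# Assumption: the Russian config of Common Voice 11.0 on the Hub
# (gated dataset - you may need to accept its terms and log in with an HF token).
cv = load_dataset("mozilla-foundation/common_voice_11_0", "ru")

# Mix the original train + test + validation splits, then re-split 0.95/0.05.
merged = concatenate_datasets([cv["train"], cv["validation"], cv["test"]])
splits = merged.train_test_split(test_size=0.05, seed=42)  # the seed is an assumption
train_ds, test_ds = splits["train"], splits["test"]

# WER comparison: `predictions` would be pipeline transcripts for test_ds,
# `references` the corresponding "sentence" column (ideally after normalization).
wer = evaluate.load("wer")
# score = 100 * wer.compute(predictions=predictions, references=references)
```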
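
The usage snippet leaves reading the wav file up to you. One possible way to fill in the `wav` variable, assuming the `soundfile` package and a hypothetical local file `call.wav`, and reusing the `asr_pipeline` defined above:

```python
import soundfile as sf

# "call.wav" is a placeholder path; use your own recording.
speech, sampling_rate = sf.read("call.wav", dtype="float32")
if speech.ndim > 1:
    speech = speech.mean(axis=1)  # downmix stereo to mono

# Pass the raw waveform together with its sampling rate so the pipeline
# can resample it for Whisper's 16 kHz feature extractor.
wav = {"raw": speech, "sampling_rate": sampling_rate}
asr = asr_pipeline(wav, generate_kwargs={"language": "russian", "max_new_tokens": 256}, return_timestamps=False)
print(asr["text"])
```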