Val123val committed on
Commit 95e89a5
1 Parent(s): 4c205d8

Update README.md

Files changed (1)
  1. README.md +7 -0
README.md CHANGED
@@ -26,6 +26,7 @@ ru_whisper_small is a fine-tuned version of [openai/whisper-small](https://huggi

 ## Intended uses & limitations

+```bash
 from transformers import WhisperProcessor, WhisperForConditionalGeneration
 from datasets import load_dataset

@@ -45,12 +46,14 @@ predicted_ids = model.generate(input_features)
 transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)

 transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+```


 ## Long-Form Transcription

 The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through the Transformers pipeline method. Chunking is enabled by setting chunk_length_s=30 when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing return_timestamps=True:

+```bash
 import torch
 from transformers import pipeline
 from datasets import load_dataset
@@ -71,12 +74,15 @@ prediction = pipe(sample.copy(), batch_size=8)["text"]

 # we can also return timestamps for the predictions
 prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
+```


 ## Faster Inference with Speculative Decoding

 Speculative Decoding was proposed in Fast Inference from Transformers via Speculative Decoding by Yaniv Leviathan et al. from Google. It works on the premise that a faster assistant model very often generates the same tokens as a larger main model.

+
+```bash
 import torch
 from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

@@ -129,6 +135,7 @@ pipe = pipeline(
 sample = dataset[0]["audio"]
 result = pipe(sample)
 print(result["text"])
+```


 ### Training hyperparameters
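The "Intended uses & limitations" hunk above only shows the edges of the short-form snippet, so the middle of the example is not visible in this diff. A minimal self-contained sketch of that usage, assuming the model ID Val123val/ru_whisper_small and a placeholder dataset (hf-internal-testing/librispeech_asr_dummy), since the actual audio source is not shown here, might look like:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

# assumed model ID for this repo
processor = WhisperProcessor.from_pretrained("Val123val/ru_whisper_small")
model = WhisperForConditionalGeneration.from_pretrained("Val123val/ru_whisper_small")

# placeholder dataset; any 16 kHz mono audio array works
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# log-Mel input features for a single sample of up to 30 s
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features

# generate token IDs, then decode them back to text
predicted_ids = model.generate(input_features)

# keep special tokens (language, task and timestamp markers)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
# plain transcription text only
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```

skip_special_tokens=True strips the language, task and timestamp markers that Whisper prepends to the decoded sequence.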
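For the "Long-Form Transcription" section, the hunks only show the imports and the final pipe(...) calls. A sketch of the chunked pipeline the paragraph describes, again assuming the model ID Val123val/ru_whisper_small and a placeholder dataset, could be:

```python
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# chunk_length_s=30 turns on the chunking algorithm, so audio longer than
# 30 s is split into overlapping windows, transcribed, and stitched back together
pipe = pipeline(
    "automatic-speech-recognition",
    model="Val123val/ru_whisper_small",  # assumed model ID for this repo
    chunk_length_s=30,
    device=device,
)

# placeholder audio sample; any dict with "array" and "sampling_rate" works
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# chunks are batched through the model
prediction = pipe(sample.copy(), batch_size=8)["text"]

# sequence-level timestamps, one entry per chunk
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
print(prediction)
```

With return_timestamps=True the result's "chunks" field is a list of dicts with "text" and "timestamp" entries rather than a single string.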
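For the speculative decoding section, a sketch under the same assumptions (model ID Val123val/ru_whisper_small; openai/whisper-tiny chosen here as a plausible assistant model, not necessarily the one the README uses; placeholder dataset) might be:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# main model (assumed to be this repo)
model_id = "Val123val/ru_whisper_small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# smaller draft model; openai/whisper-tiny is an assumption, any Whisper
# checkpoint sharing the same tokenizer can serve as the assistant
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-tiny", torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"assistant_model": assistant_model},  # enables speculative decoding
    torch_dtype=torch_dtype,
    device=device,
)

# placeholder audio sample
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```

The assistant cheaply proposes candidate tokens and the main model verifies them, so the transcription matches what the main model would have produced on its own, only faster.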