Update README.md
README.md (changed)
@@ -26,6 +26,7 @@ ru_whisper_small is a fine-tuned version of [openai/whisper-small](https://huggi

## Intended uses & limitations

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset
```
@@ -45,12 +46,14 @@ predicted_ids = model.generate(input_features)

```python
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```
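The hunks above show only the beginning and end of this example. A minimal self-contained sketch of the same short-form flow is given below; the checkpoint id and the dummy dataset are placeholders for illustration, not values taken from this README:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

# placeholder id: substitute the published ru_whisper_small checkpoint
model_id = "openai/whisper-small"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# any 16 kHz audio works; this dummy dataset is only for illustration
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# compute log-Mel input features for the model
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features

# generate token ids and decode them to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```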
## Long-Form Transcription

The Whisper model is intrinsically designed to work on audio samples of up to 30 s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`:

```python
import torch
from transformers import pipeline
from datasets import load_dataset
```
@@ -71,12 +74,15 @@ prediction = pipe(sample.copy(), batch_size=8)["text"]

```python
# we can also return timestamps for the predictions
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
```
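The diff again elides the middle of this example. A self-contained sketch of the chunked pipeline under the same assumptions (placeholder checkpoint id, dummy dataset) could look like this:

```python
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # placeholder: substitute the ru_whisper_small checkpoint
    chunk_length_s=30,             # enable chunking so audio longer than 30 s can be handled
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# batched inference over the 30 s chunks
prediction = pipe(sample.copy(), batch_size=8)["text"]

# sequence-level timestamps, one entry per chunk
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
print(prediction)
```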
## Faster Inference with Speculative Decoding

Speculative Decoding was proposed in *Fast Inference from Transformers via Speculative Decoding* by Yaniv Leviathan et al. from Google. It works on the premise that a faster assistant model very often generates the same tokens as a larger main model.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
```
@@ -129,6 +135,7 @@ pipe = pipeline(

```python
sample = dataset[0]["audio"]
result = pipe(sample)
print(result["text"])
```
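The hunks show only the imports and the final calls of this example. A self-contained sketch of speculative decoding with the pipeline, assuming a placeholder main checkpoint and `openai/whisper-tiny` as the draft/assistant model (neither taken from this README), might look like this:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# placeholder ids: substitute the ru_whisper_small checkpoint for the main model;
# the assistant is assumed to be a smaller Whisper checkpoint with the same tokenizer
model_id = "openai/whisper-small"
assistant_model_id = "openai/whisper-tiny"

# main model whose outputs are kept
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dtype)
model.to(device)

# draft model that proposes tokens for the main model to verify
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(assistant_model_id, torch_dtype=torch_dtype)
assistant_model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"assistant_model": assistant_model},
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```

The assistant must share the main model's tokenizer, so a smaller checkpoint from the same Whisper family is the usual choice.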
### Training hyperparameters