---
license: apache-2.0
---

# faster-whisper-large-v3

This is the Whisper large-v3 model converted for use with [faster-whisper](https://github.com/guillaumekln/faster-whisper).

## Using

You can either monkey-patch faster-whisper 0.9.0 (until it is updated upstream) or use my fork (which is easier).

### Using my fork

First, install it by executing:

```shell
pip install -U 'transformers[torch]>=4.35.0' https://github.com/PythonicCafe/faster-whisper/archive/refs/heads/feature/large-v3.zip#egg=faster-whisper
```

Then use it like the regular faster-whisper:

```python
import time

import faster_whisper

filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

# `segments` is a generator: the transcription runs while it is iterated
start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
```

### Monkey-patching faster-whisper 0.9.0

Make sure you have faster-whisper 0.9.0 or later installed, along with transformers (needed by monkey patch 4):

```shell
pip install -U 'faster-whisper>=0.9.0' 'transformers>=4.35.0'
```

Then use it with a few small changes:

```python
import time

import faster_whisper.transcribe
from faster_whisper.feature_extractor import FeatureExtractor
from transformers import AutoProcessor

# Monkey patch 1 (add model to list)
faster_whisper.utils._MODELS["large-v3"] = "turicas/faster-whisper-large-v3"

# Monkey patch 2 (fix Tokenizer)
faster_whisper.transcribe.Tokenizer.encode = lambda self, text: self.tokenizer.encode(text, add_special_tokens=False)

filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.transcribe.WhisperModel(model_size, device=device, compute_type=compute_type)

# Monkey patch 3 (change n_mels)
model.feature_extractor = FeatureExtractor(feature_size=128)

# Monkey patch 4 (change tokenizer)
model.hf_tokenizer = AutoProcessor.from_pretrained("openai/whisper-large-v3").tokenizer
model.hf_tokenizer.token_to_id = lambda token: model.hf_tokenizer.convert_tokens_to_ids(token)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

# `segments` is a generator: the transcription runs while it is iterated
start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
```

## Converting

If you'd like to convert the model yourself, execute:

```shell
pip install -U 'ctranslate2>=3.21.0' 'transformers>=4.35.0' 'OpenNMT-py==2.*' sentencepiece
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2
```

The converted files will then be in `whisper-large-v3-ct2/`.

## License

These files have the same license as the original [openai/whisper-large-v3 model](https://huggingface.co/openai/whisper-large-v3): Apache 2.0.
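
## Extra: reusing the row-building loop

Both usage examples build the same dictionary for every segment. If you want to collect the rows (for example to dump them as JSON) instead of printing them, that loop body could be factored into a small helper. The sketch below is only an illustration: `segment_to_row` is a hypothetical name, not part of faster-whisper, and it assumes nothing about the segment beyond the `start`, `end`, `text` and `words` attributes used in the examples. The stand-in objects exist only so the snippet runs without a model or an audio file.

```python
from types import SimpleNamespace


def segment_to_row(segment, word_timestamps=False):
    """Convert a segment-like object into a plain dictionary.

    Hypothetical helper (not part of faster-whisper): `segment` only needs
    `start`, `end` and `text` attributes, plus `words` when
    `word_timestamps` is true.
    """
    row = {"start": segment.start, "end": segment.end, "text": segment.text}
    if word_timestamps:
        # Treat a missing word list (None) as empty instead of failing
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in (segment.words or [])
        ]
    return row


# Stand-in objects mimicking the attributes used in the examples above
fake_word = SimpleNamespace(start=0.0, end=0.4, word=" Hello")
fake_segment = SimpleNamespace(start=0.0, end=1.2, text=" Hello, world", words=[fake_word])

print(segment_to_row(fake_segment))
print(segment_to_row(fake_segment, word_timestamps=True))
```

With a real model, the loop would become `rows = [segment_to_row(segment, word_timestamps) for segment in segments]`, and the timing code should wrap that line, since the transcription only runs while `segments` is being iterated.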