Belle-whisper-large-v3-zh-punct model for CTranslate2

This repository contains the conversion of BELLE-2/Belle-whisper-large-v3-zh-punct to the CTranslate2 model format.

This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper.
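The examples below use faster-whisper. The model can also be driven directly through the CTranslate2 Python API; the following is a minimal sketch adapted from the CTranslate2 Whisper guide, where the processor checkpoint, the local model directory name, and the single 30-second window handling are assumptions rather than part of this repository.

import ctranslate2
import librosa
import transformers

# Load the audio and resample to 16 kHz; one encoder window covers 30 seconds.
audio, _ = librosa.load("audio.mp3", sr=16000, mono=True)

# The processor is assumed to be available from the upstream BELLE-2 repository
# (openai/whisper-large-v3 should provide a compatible feature extractor as well).
processor = transformers.WhisperProcessor.from_pretrained("BELLE-2/Belle-whisper-large-v3-zh-punct")
inputs = processor(audio, return_tensors="np", sampling_rate=16000)
features = ctranslate2.StorageView.from_array(inputs.input_features)

# Path to a local download of this repository (or the output of the conversion command below).
model = ctranslate2.models.Whisper("Belle-faster-whisper-large-v3-zh-punct")

# Transcribe in Chinese without timestamps.
prompt = processor.tokenizer.convert_tokens_to_ids(
    ["<|startoftranscript|>", "<|zh|>", "<|transcribe|>", "<|notimestamps|>"]
)
results = model.generate(features, [prompt])
print(processor.decode(results[0].sequences_ids[0]))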

Example

Installation

pip install faster-whisper

Usage

from faster_whisper import WhisperModel
import datetime
import os

# Confirmed working with faster-whisper 1.0.2, numpy 1.23.5 and onnxruntime 1.14.1.
# This code will not work if the numpy version is 2.0.0 or higher and vad_filter=True.

def transcribe_audio(input_file, output_file):
    model_size = "XA9/Belle-faster-whisper-large-v3-zh-punct"
    model = WhisperModel(model_size, device="cpu", compute_type="default")
    segments, info = model.transcribe(input_file, word_timestamps=True, initial_prompt=None,
        beam_size=5, language="zh", max_new_tokens=128, condition_on_previous_text=False,
        vad_filter=False, vad_parameters=dict(min_silence_duration_ms=500))
        
    sub_list = [] 
    srt_content = ""
    srt_number = 1  # SRT sequence numbers start at 1
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
        start_time_str = format_to_srt(segment.start)
        end_time_str = format_to_srt(segment.end)
        sub_text = replace_special_chars(segment.text)
        sub_entry = f"{start_time_str} --> {end_time_str}\n{sub_text}\n\n"
        sub_list.append(sub_entry)  # Add formatted subtitles to list
    
    for sub in sub_list: # Prepend each subtitle's sequence number
        srt_content = srt_content + str(srt_number) + "\n" + sub
        srt_number = srt_number + 1
        
    with open(output_file, 'w', encoding="utf-8") as srt_file:
        srt_file.write(srt_content)
    
    print("")
    print("Saved: " + os.path.abspath(output_file))

def replace_special_chars(text): # Remove a leading "! " or " " from the segment text
    # Check whether the text starts with "! " or a space
    if (text.startswith("! ") or text.startswith(" ")):
        # Drop the "!" characters and the first space
        text = text.replace("!", "").replace(" ", "", 1)  # Only the first space is removed
    return text


def format_to_srt(seconds): # Convert seconds to an SRT timecode (HH:MM:SS,mmm)
    dt = datetime.datetime(1, 1, 1) + datetime.timedelta(seconds=seconds)
    formatted_time = "{:02d}:{:02d}:{:02d},{:03d}".format(dt.hour, dt.minute, dt.second, dt.microsecond//1000)
    return formatted_time


transcribe_audio("audio.mp3", "audio.srt")

Example (transcribe with stable-ts)

Installation

Requires FFmpeg in PATH

pip install faster-whisper
pip install stable-ts

Usage

import stable_whisper

model = stable_whisper.load_faster_whisper('XA9/Belle-faster-whisper-large-v3-zh-punct', device='cpu', compute_type='default')
result = model.transcribe_stable('audio.mp3', language='zh', initial_prompt=None, regroup=False, vad=False, condition_on_previous_text=False)
result.to_srt_vtt('audio.srt', word_level=False)

Conversion details

The original model was converted with the following command:

ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v3-zh-punct --output_dir Belle-faster-whisper-large-v3-zh-punct --copy_files tokenizer.json preprocessor_config.json --quantization float16
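The resulting directory can then be loaded from a local path in the same way as the Hub repository, for example (a minimal sketch using the output_dir from the command above):

from faster_whisper import WhisperModel

model = WhisperModel("Belle-faster-whisper-large-v3-zh-punct", device="cpu", compute_type="default")
segments, info = model.transcribe("audio.mp3", language="zh")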