Update README.md
README.md (CHANGED)
Before (excerpts of the previous README):

---WARNING--- this is the converted CrisperWhisper model into CTranslate2 to be compatible with

# CrisperWhisper

- [Transcription Performance](#transcription-performance)
- [Segmentation Performance](#segmentation-performance)
- [Usage](#2-usage)
  - [with transformers](#21-usage-with--transformers)
- [How?](#3-How?)

Here's how to use CrisperWhisper in your Python scripts:

### 2.1 Usage with 🤗 transformers
```python
import os
import sys
import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


def adjust_pauses_for_hf_pipeline_output(pipeline_output, split_threshold=0.12):
    """
    Adjust pause timings by distributing pauses up to the threshold evenly between adjacent words.
    """

    adjusted_chunks = pipeline_output["chunks"].copy()

    for i in range(len(adjusted_chunks) - 1):
        current_chunk = adjusted_chunks[i]
        next_chunk = adjusted_chunks[i + 1]

        current_start, current_end = current_chunk["timestamp"]
        next_start, next_end = next_chunk["timestamp"]
        pause_duration = next_start - current_end

        if pause_duration > 0:
            if pause_duration > split_threshold:
                distribute = split_threshold / 2
            else:
                distribute = pause_duration / 2

            # Adjust current chunk end time
            adjusted_chunks[i]["timestamp"] = (current_start, current_end + distribute)

            # Adjust next chunk start time
            adjusted_chunks[i + 1]["timestamp"] = (next_start - distribute, next_end)

    pipeline_output["chunks"] = adjusted_chunks

    return pipeline_output


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "nyrahealth/CrisperWhisper"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps='word',
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

hf_pipeline_output = pipe(sample)
crisper_whisper_result = adjust_pauses_for_hf_pipeline_output(hf_pipeline_output)
print(crisper_whisper_result)
```
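
To make the pause redistribution concrete, here is a small worked example; the chunk values below are invented for illustration:

```python
# Invented pipeline output: "hello" ends at 1.00 s and "world" starts at 1.30 s,
# so the 0.30 s pause exceeds split_threshold (0.12 s). Only split_threshold of
# it is distributed, 0.06 s to each neighbouring word; the rest stays a pause.
example_output = {
    "text": "hello world",
    "chunks": [
        {"text": "hello", "timestamp": (0.5, 1.0)},
        {"text": "world", "timestamp": (1.3, 1.8)},
    ],
}

adjusted = adjust_pauses_for_hf_pipeline_output(example_output)
print(adjusted["chunks"])
# [{'text': 'hello', 'timestamp': (0.5, 1.06)}, {'text': 'world', 'timestamp': (1.24, 1.8)}]
```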

Read more about the reasoning behind the pause distribution logic in our paper.

## 3. How?

After (excerpts of the updated README):

---WARNING--- This is the CrisperWhisper model converted to CTranslate2 for compatibility with the [faster whisper](https://github.com/SYSTRAN/faster-whisper) framework. However, because faster whisper (or, more precisely, [CTranslate2](https://github.com/OpenNMT/CTranslate2/)) implements the timestamp calculation differently, we do not guarantee the same timestamp accuracy as with the transformers implementation. Transcription accuracy and filler detection should work as expected.

# CrisperWhisper

- [Transcription Performance](#transcription-performance)
- [Segmentation Performance](#segmentation-performance)
- [Usage](#2-usage)
  - [with faster whisper](#21-usage-with-faster-whisper)
- [How?](#3-How?)

Here's how to use CrisperWhisper in your Python scripts:

### 2.1 Usage with faster whisper

We also provide a converted model compatible with [faster whisper](https://github.com/SYSTRAN/faster-whisper). However, because faster whisper (or, more precisely, [CTranslate2](https://github.com/OpenNMT/CTranslate2/)) implements the timestamp calculation differently, the timestamp accuracy cannot be guaranteed.
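
The CTranslate2 checkpoint is produced by converting the original transformers weights. As a rough sketch of what such a conversion looks like with CTranslate2's transformers converter (the output directory, copied files, and quantization choice here are illustrative assumptions, not this repo's exact recipe):

```python
# Illustrative sketch only: converts the original transformers checkpoint into
# the CTranslate2 format that faster-whisper loads. Output dir, copied files,
# and quantization are assumptions, not this repo's exact settings.
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter(
    "nyrahealth/CrisperWhisper",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("faster-crisper-whisper", quantization="float16")
```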

```python
import torch
from datasets import load_dataset
from faster_whisper import WhisperModel

# Replace with the path (or Hugging Face repo id) of the CTranslate2-converted
# CrisperWhisper model.
faster_whisper_model = '/home/azureuser/data2/models/faster_crisper_whisper_verbatim_timestamp_finetuned_de_en_swiss'

# Initialize the Whisper model. CTranslate2 expects "cuda" or "cpu" as the
# device string and takes the precision via compute_type.
device = "cuda" if torch.cuda.is_available() else "cpu"
compute_type = "float16" if torch.cuda.is_available() else "float32"
model = WhisperModel(faster_whisper_model, device=device, compute_type=compute_type)

dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

segments, info = model.transcribe(sample['array'], beam_size=1, language='en', word_timestamps=True, without_timestamps=True)

for segment in segments:
    print(segment)
```
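
With `word_timestamps=True`, each segment also exposes a `words` list of faster-whisper `Word` tuples (`start`, `end`, `word`). A small sketch for listing inter-word pauses, which is where the timestamp caveat above matters most (variable names are ours; `segments` is a generator, so it is requested again here):

```python
# `segments` is lazily evaluated, so request a fresh generator before a second
# pass. The pause between consecutive words is nxt.start - prev.end.
segments, info = model.transcribe(sample['array'], beam_size=1, language='en',
                                  word_timestamps=True, without_timestamps=True)
words = [word for segment in segments for word in segment.words]
for prev, nxt in zip(words, words[1:]):
    pause = nxt.start - prev.end
    if pause > 0:
        print(f"{prev.word!r} -> {nxt.word!r}: {pause:.2f}s pause")
```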

Read more about the reasoning behind the pause distribution logic in our paper.

## 3. How?