sivan22 committed
Commit b7bc7e4
Parent: 38bec15

Update README.md

Files changed (1): README.md (+5, -61)
README.md CHANGED
@@ -134,75 +134,19 @@ It is a 1550M-parameter multi-lingual ASR solution.
 To transcribe audio samples, the model has to be used alongside a [`WhisperProcessor`](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperProcessor).
 
 ```python
- import librosa
- import torch
- from transformers import WhisperProcessor, WhisperForConditionalGeneration
-
- SAMPLING_RATE = 16000
-
- has_cuda = torch.cuda.is_available()
- model_path = 'ivrit-ai/whisper-large-v2-tuned'
-
- model = WhisperForConditionalGeneration.from_pretrained(model_path)
- if has_cuda:
-     model.to('cuda:0')
-
- processor = WhisperProcessor.from_pretrained(model_path)
-
- # `entry` is assumed to be an item from an existing dataset.
- # Alternatively, the audio can be loaded from a file.
- audio_resample = librosa.resample(entry['audio']['array'], orig_sr=entry['audio']['sampling_rate'], target_sr=SAMPLING_RATE)
-
- input_features = processor(audio_resample, sampling_rate=SAMPLING_RATE, return_tensors="pt").input_features
- if has_cuda:
-     input_features = input_features.to('cuda:0')
-
- predicted_ids = model.generate(input_features, language='he', num_beams=5)
- transcript = processor.batch_decode(predicted_ids, skip_special_tokens=True)
-
- print(f'Transcript: {transcript[0]}')
+ from faster_whisper import WhisperModel
+
+ model = WhisperModel("sivan22/faster-whisper-ivrit-ai-whisper-large-v2-tuned")
+
+ segments, info = model.transcribe("audio.mp3")
+ for segment in segments:
+     print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
  ```
 
 ## Evaluation
 
 You can use the [evaluate_model.py](https://github.com/yairl/ivrit.ai/blob/master/evaluate_model.py) reference on GitHub to evaluate the model's quality.
 
- ## Long-Form Transcription
-
- The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking
- algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through the Transformers
- [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
- method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline
- can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`:
-
- ```python
- >>> import torch
- >>> from transformers import pipeline
- >>> from datasets import load_dataset
-
- >>> device = "cuda:0" if torch.cuda.is_available() else "cpu"
-
- >>> pipe = pipeline(
- ...     "automatic-speech-recognition",
- ...     model="ivrit-ai/whisper-large-v2-tuned",
- ...     chunk_length_s=30,
- ...     device=device,
- ... )
-
- >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
- >>> sample = ds[0]["audio"]
-
- >>> prediction = pipe(sample.copy(), batch_size=8)["text"]
- " Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."
-
- >>> # we can also return timestamps for the predictions
- >>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
- [{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
-   'timestamp': (0.0, 5.44)}]
- ```
-
- Refer to the blog post [ASR Chunking](https://huggingface.co/blog/asr-chunking) for more details on the chunking algorithm.
-
-
 
 ### BibTeX entry and citation info

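The snippet this commit adds runs on CPU with default settings. As a variant (a sketch, assuming a CUDA-capable machine; `device`, `compute_type`, `language`, and `beam_size` are standard faster-whisper arguments, not part of this commit), the same model can be loaded on GPU and given a Hebrew language hint:

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 model on GPU with 16-bit weights;
# switch to device="cpu" when no CUDA device is available.
model = WhisperModel(
    "sivan22/faster-whisper-ivrit-ai-whisper-large-v2-tuned",
    device="cuda",
    compute_type="float16",
)

# language="he" skips Whisper's language-detection pass; beam_size=5 mirrors
# the num_beams=5 used by the removed transformers-based snippet.
segments, info = model.transcribe("audio.mp3", language="he", beam_size=5)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```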
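For the Evaluation section above: ASR quality is conventionally reported as word error rate (WER). A minimal sketch of that metric using the third-party `jiwer` package (an assumption for illustration; not necessarily how the referenced evaluate_model.py computes its scores):

```python
from jiwer import wer  # pip install jiwer

# Hypothetical reference/hypothesis pair; in practice the reference is a
# ground-truth transcript and the hypothesis is the model's output for the same audio.
reference = "shalom and welcome to the weekly podcast"
hypothesis = "shalom and welcome to the weekly podcasts"

print(f"WER: {wer(reference, hypothesis):.3f}")  # 1 substitution / 7 words ≈ 0.143
```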