jlondonobo committed 📝 add how-to-use section to README
Commit: 525cab3 · Parent(s): 2111b26

README.md CHANGED
@@ -47,6 +47,63 @@ The following table displays a **comparison** between the results of our model a
 | [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
 
 
+### How to use
+You can use this model directly with a pipeline. This is especially useful for short audio. For **long-form** transcriptions, please use the code in the [Long-form transcription](#long-form-transcription) section.
+
+```bash
+pip install git+https://github.com/huggingface/transformers --force-reinstall
+pip install torch
+```
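
Note (not part of this commit): the install above pins `transformers` to Git `main`, likely because Whisper support was brand new when this section was written; current stable releases include Whisper, so a plain `pip install transformers` should work as well.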
+
+```python
+>>> from transformers import pipeline
+>>> import torch
+
+>>> device = 0 if torch.cuda.is_available() else "cpu"
+
+# Load the pipeline
+>>> transcribe = pipeline(
+...     task="automatic-speech-recognition",
+...     model="jlondonobo/whisper-medium-pt",
+...     chunk_length_s=30,
+...     device=device,
+... )
+
+# Force model to transcribe in Portuguese
+>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="pt", task="transcribe")
+
+# Transcribe your audio file
+>>> transcribe("audio.m4a")["text"]
+'Eu falo português.'
+```
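
Beyond what the commit adds, the same pipeline can also return segment-level timestamps, which is handy for subtitles. A minimal sketch, assuming only the standard `return_timestamps` flag of the `transformers` ASR pipeline (not specific to this model):

```python
>>> # Standard `transformers` ASR pipeline flag, not part of this commit:
>>> # each chunk carries a (start, end) timestamp in seconds.
>>> result = transcribe("audio.m4a", return_timestamps=True)
>>> result["text"]    # full transcription, as above
>>> result["chunks"]  # [{'timestamp': (start, end), 'text': ...}, ...]
```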
+
+#### Long-form transcription
+To improve the performance of long-form transcription, you can convert the HF model into a `whisper` model and use the original paper's matching algorithm. To do this, you must install `whisper` and a set of tools developed by @bayartsogt.
+```bash
+pip install git+https://github.com/openai/whisper.git
+pip install git+https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets
+```
+
+Then convert the Hugging Face model and transcribe:
+```python
+>>> import torch
+>>> import whisper
+>>> from multiple_datasets.hub_default_utils import convert_hf_whisper
+
+>>> device = "cuda" if torch.cuda.is_available() else "cpu"
+
+# Write the HF model to a local whisper model
+>>> convert_hf_whisper("jlondonobo/whisper-medium-pt", "local_whisper_model.pt")
+
+# Load the whisper model
+>>> model = whisper.load_model("local_whisper_model.pt", device=device)
+
+# Transcribe arbitrarily long audio
+>>> model.transcribe("long_audio.m4a", language="pt")["text"]
+'Olá eu sou o José. Tenho 23 anos e trabalho...'
+```
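
One practical note, also not part of this commit: `whisper`'s `transcribe()` decodes in FP16 by default and falls back with a warning on CPU, so on CPU-only machines you can disable it explicitly. A minimal sketch reusing the model loaded above:

```python
>>> # `fp16` is a standard openai/whisper decoding option; disabling it on
>>> # CPU avoids the "FP16 is not supported on CPU" warning.
>>> model.transcribe("long_audio.m4a", language="pt", fp16=False)["text"]
```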
+
+
 ### Training hyperparameters
 We used the following hyperparameters for training:
 - `learning_rate`: 1e-05