jlondonobo committed
Commit 525cab3
1 Parent(s): 2111b26

📝 add how-to-use section to README

Files changed (1)
  1. README.md +57 -0
README.md CHANGED
@@ -47,6 +47,63 @@ The following table displays a **comparison** between the results of our model a
  | [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
 
 
+ ### How to use
+ You can use this model directly with a `pipeline`. This is especially useful for short audio. For **long-form** transcription, please use the code in the [Long-form transcription](#long-form-transcription) section.
+
+ ```bash
+ pip install git+https://github.com/huggingface/transformers --force-reinstall
+ pip install torch
+ ```
+
+ ```python
+ >>> from transformers import pipeline
+ >>> import torch
+
+ >>> device = 0 if torch.cuda.is_available() else "cpu"
+
+ # Load the pipeline
+ >>> transcribe = pipeline(
+ ...     task="automatic-speech-recognition",
+ ...     model="jlondonobo/whisper-medium-pt",
+ ...     chunk_length_s=30,
+ ...     device=device,
+ ... )
+
+ # Force model to transcribe in Portuguese
+ >>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="pt", task="transcribe")
+
+ # Transcribe your audio file
+ >>> transcribe("audio.m4a")["text"]
+ 'Eu falo português.'
+ ```
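+
+ If you also need timestamps, the same pipeline can return them. The snippet below is a minimal sketch assuming the `return_timestamps` argument of the Transformers speech-recognition pipeline; the output shown is illustrative:
+ ```python
+ >>> # Chunk-level timestamps alongside the transcribed text (illustrative output)
+ >>> transcribe("audio.m4a", return_timestamps=True)["chunks"]
+ [{'timestamp': (0.0, 2.0), 'text': ' Eu falo português.'}]
+ ```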
+
+ #### Long-form transcription
+ To improve performance on long-form transcription, you can convert the Hugging Face model into a native `whisper` model and use the original paper's long-form decoding algorithm. To do this, you must install `whisper` and a set of conversion tools developed by @bayartsogt.
+ ```bash
+ pip install git+https://github.com/openai/whisper.git
+ pip install git+https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets
+ ```
+
+ Then convert the HuggingFace model and transcribe:
+ ```python
+ >>> import torch
+ >>> import whisper
+ >>> from multiple_datasets.hub_default_utils import convert_hf_whisper
+
+ >>> device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Write HF model to local whisper model
+ >>> convert_hf_whisper("jlondonobo/whisper-medium-pt", "local_whisper_model.pt")
+
+ # Load the whisper model
+ >>> model = whisper.load_model("local_whisper_model.pt", device=device)
+
+ # Transcribe arbitrarily long audio
+ >>> model.transcribe("long_audio.m4a", language="pt")["text"]
+ 'Olá eu sou o José. Tenho 23 anos e trabalho...'
+ ```
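+
+ Besides `"text"`, `whisper`'s `transcribe` also returns per-segment timing under `"segments"`. A minimal illustrative sketch (the loop below is not part of the original README):
+ ```python
+ >>> result = model.transcribe("long_audio.m4a", language="pt")
+ >>> for segment in result["segments"]:
+ ...     # Each segment carries start/end times in seconds and its decoded text
+ ...     print(round(segment["start"], 1), round(segment["end"], 1), segment["text"])
+ ```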
+
+
  ### Training hyperparameters
  We used the following hyperparameters for training:
  - `learning_rate`: 1e-05