jlondonobo committed 📝 add how-to-use section to README
Commit: 525cab3 · Parent(s): 2111b26

README.md CHANGED
@@ -47,6 +47,63 @@ The following table displays a **comparison** between the results of our model a
 | [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
 
 
+### How to use
+You can use this model directly with a pipeline. This is especially useful for short audio. For **long-form** transcriptions, please use the code in the [Long-form transcription](#long-form-transcription) section.
+
+```bash
+pip install git+https://github.com/huggingface/transformers --force-reinstall
+pip install torch
+```
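
Note (not part of this commit): the install above pins `transformers` to Git `main`, likely because Whisper support was brand new when this section was written; current stable releases include Whisper, so a plain `pip install transformers` should work as well.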
+
+```python
+>>> from transformers import pipeline
+>>> import torch
+
+>>> device = 0 if torch.cuda.is_available() else "cpu"
+
+# Load the pipeline
+>>> transcribe = pipeline(
+...     task="automatic-speech-recognition",
+...     model="jlondonobo/whisper-medium-pt",
+...     chunk_length_s=30,
+...     device=device,
+... )
+
+# Force model to transcribe in Portuguese
+>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="pt", task="transcribe")
+
+# Transcribe your audio file
+>>> transcribe("audio.m4a")["text"]
+'Eu falo português.'
+```
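
Beyond what the commit adds, the same pipeline can also return segment-level timestamps, which is handy for subtitles. A minimal sketch, assuming only the standard `return_timestamps` flag of the `transformers` ASR pipeline (not specific to this model):

```python
>>> # Standard `transformers` ASR pipeline flag, not part of this commit:
>>> # each chunk carries a (start, end) timestamp in seconds.
>>> result = transcribe("audio.m4a", return_timestamps=True)
>>> result["text"]    # full transcription, as above
>>> result["chunks"]  # [{'timestamp': (start, end), 'text': ...}, ...]
```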
+
+#### Long-form transcription
+To improve the performance of long-form transcription, you can convert the HF model into a `whisper` model and use the original paper's matching algorithm. To do this, you must install `whisper` and a set of tools developed by @bayartsogt.
+```bash
+pip install git+https://github.com/openai/whisper.git
+pip install git+https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets
+```
+
+Then convert the Hugging Face model and transcribe:
+```python
+>>> import torch
+>>> import whisper
+>>> from multiple_datasets.hub_default_utils import convert_hf_whisper
+
+>>> device = "cuda" if torch.cuda.is_available() else "cpu"
+
+# Write the HF model to a local whisper model
+>>> convert_hf_whisper("jlondonobo/whisper-medium-pt", "local_whisper_model.pt")
+
+# Load the whisper model
+>>> model = whisper.load_model("local_whisper_model.pt", device=device)
+
+# Transcribe arbitrarily long audio
+>>> model.transcribe("long_audio.m4a", language="pt")["text"]
+'Olá eu sou o José. Tenho 23 anos e trabalho...'
+```
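
One practical note, also not part of this commit: `whisper`'s `transcribe()` decodes in FP16 by default and falls back with a warning on CPU, so on CPU-only machines you can disable it explicitly. A minimal sketch reusing the model loaded above:

```python
>>> # `fp16` is a standard openai/whisper decoding option; disabling it on
>>> # CPU avoids the "FP16 is not supported on CPU" warning.
>>> model.transcribe("long_audio.m4a", language="pt", fp16=False)["text"]
```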
+
+
 ### Training hyperparameters
 We used the following hyperparameters for training:
 - `learning_rate`: 1e-05