The architecture is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m).
 
More information needed

## How to use

Make sure you have installed the dependencies needed for the language model-boosted version to work. You can install the `kenlm` and `pyctcdecode` libraries with:

```
pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode
```

With the `transformers` library, you can load the model with the following code:

```
from transformers import AutoProcessor, AutoModelForCTC

processor = AutoProcessor.from_pretrained("gigant/romanian-wav2vec2")
model = AutoModelForCTC.from_pretrained("gigant/romanian-wav2vec2")
```

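If you want to run inference without the pipeline, here is a minimal sketch (not from the original card) of using the processor and model directly. It assumes the checkpoint ships with the n-gram language model that `kenlm` and `pyctcdecode` enable, so that `AutoProcessor` returns a `Wav2Vec2ProcessorWithLM` whose `batch_decode` works on the raw logits; the `speech` array below is a dummy placeholder:

```
import numpy as np
import torch

# dummy 1-second input at 16 kHz; replace with real speech
speech = np.zeros(16_000, dtype=np.float32)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# the LM-boosted processor decodes directly from the logits
transcription = processor.batch_decode(logits.numpy()).text[0]
print(transcription)
```
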
Or, if you want to test the model, you can load the automatic speech recognition pipeline from `transformers` with:

```
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="gigant/romanian-wav2vec2")
```

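For a quick test, you can call the pipeline on an audio file; the file name below is only a placeholder:

```
# hypothetical file; the pipeline decodes and resamples it for you
prediction = asr("speech_ro.wav")
print(prediction["text"])
```
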
## Example use with the `datasets` library

First, you need to load your data. We will use the [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1) dataset in this example.

```
from datasets import load_dataset

dataset = load_dataset("gigant/romanian_speech_synthesis_0_8_1")
```

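You can first inspect the splits and columns; the rest of this example assumes each sample has an `audio` column and a `sentence` column holding the reference transcription:

```
print(dataset)
print(dataset["train"][0]["sentence"])
```
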
You can listen to the samples with the `IPython.display` library:

```
from IPython.display import Audio

i = 0
sample = dataset["train"][i]
Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])
```

The model is trained to work with audio sampled at 16 kHz, so if the sampling rate of the audio in the dataset is different, we will have to resample it.

In this example, the audio is sampled at 48 kHz, as you can see by checking `dataset["train"][0]["audio"]["sampling_rate"]`.

The following code resamples the audio using the `torchaudio` library:

```
import torch
import torchaudio

i = 0
sample = dataset["train"][i]
audio = sample["audio"]["array"]
rate = sample["audio"]["sampling_rate"]
resampler = torchaudio.transforms.Resample(rate, 16_000)
audio_16 = resampler(torch.Tensor(audio)).numpy()
```

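Alternatively (this is not in the original example), the `datasets` library can resample the whole dataset for you by casting the `audio` column:

```
import datasets

# every sample is then decoded at 16 kHz when accessed
dataset = dataset.cast_column("audio", datasets.Audio(sampling_rate=16_000))
```
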
To listen to the resampled audio:

```
Audio(audio_16, rate=16000)
```

Now you can get the model prediction by running:

```
predicted_text = asr(audio_16)["text"]
ground_truth = dataset["train"][i]["sentence"]

print(f"Predicted text: {predicted_text}")
print(f"Ground truth: {ground_truth}")
```

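If you want a quantitative check (also not part of the original example), you could compute the word error rate with the `evaluate` library, which relies on `jiwer` for the WER metric:

```
import evaluate

# compare the prediction against the reference transcription
wer = evaluate.load("wer")
print(wer.compute(predictions=[predicted_text], references=[ground_truth]))
```
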
## Training and evaluation data

Training data: