---
license: cc-by-4.0
language:
- mk
library_name: speechbrain
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
base_model:
- jonatasgrosman/wav2vec2-large-xlsr-53-russian
model-index:
- name: wav2vec2-aed-macedonian-asr
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Macedonian Common Voice V.18.0
      type: macedonian-common-voice-v.18.0
    metrics:
    - name: Test WER
      type: test-wer
      value: 5.66
    - name: Test CER
      type: test-cer
      value: 1.43
---
# Fine-tuned XLSR-53-russian large model for speech recognition in Macedonian
Authors:
1. Dejan Porjazovski
2. Ilina Jakimovska
3. Ordan Chukaliev
4. Nikola Stikov

This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.
## Data used for training
To train the model, we used the following data sources:
1. Digital Archive for Ethnological and Anthropological Resources (DAEAR) at the Institute of Ethnology and Anthropology, PMF, UKIM.
2. Audio version of the international journal "EthnoAnthropoZoom" at the Institute of Ethnology and Anthropology, PMF, UKIM.
3. The podcast "Обични луѓе" ("Ordinary People") by Ilina Jakimovska.
4. The scientific videos from the series "Наука за деца" ("Science for Children") by the KANTAROT foundation.
5. The Macedonian subset of Mozilla Common Voice (version 18.0).
## Model description
This model is an attention-based encoder-decoder (AED). The encoder is a Wav2vec2 model and the decoder is RNN-based.
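To make the architecture concrete, here is an illustrative PyTorch sketch of an attention-based encoder-decoder. It is a toy stand-in (a small GRU encoder instead of the actual pretrained Wav2vec2, made-up dimensions and vocabulary size), not this model's implementation:

```python
import torch
import torch.nn as nn

class ToyAED(nn.Module):
    """Toy attention-based encoder-decoder for illustration only.
    In the real model the encoder is a pretrained Wav2vec2; here a
    small GRU stands in so the example is self-contained."""
    def __init__(self, feat_dim=80, hidden=128, vocab=32):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.decoder = nn.GRU(hidden * 2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, feats, tokens):
        enc, _ = self.encoder(feats)       # (B, T, H) acoustic states
        emb = self.embed(tokens)           # (B, U, H) token embeddings
        ctx, _ = self.attn(emb, enc, enc)  # decoder attends over encoder states
        dec, _ = self.decoder(torch.cat([emb, ctx], dim=-1))
        return self.out(dec)               # (B, U, vocab) logits

model = ToyAED()
feats = torch.randn(2, 50, 80)            # batch of 2, 50 frames, 80-dim features
tokens = torch.randint(0, 32, (2, 10))    # previous output tokens
logits = model(feats, tokens)
print(logits.shape)  # torch.Size([2, 10, 32])
```

At each decoding step the attention module lets the RNN decoder focus on the relevant encoder frames, which is what distinguishes an AED from a plain CTC head on top of the encoder.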
## Usage
The model is developed using the [SpeechBrain](https://speechbrain.github.io) toolkit. To use it, you need to install SpeechBrain with:
```
pip install speechbrain
```
SpeechBrain depends on the Transformers library, so you also need to install it:
```
pip install transformers
```
This repository ships a custom predictor class in an external module, `custom_interface.py`. The `foreign_class` function from `speechbrain.inference.interfaces` lets you load such a custom model:
```python
import torch
from speechbrain.inference.interfaces import foreign_class

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the custom ASR interface shipped with this repository
asr_classifier = foreign_class(
    source="Macedonian-ASR/wav2vec2-aed-macedonian-asr",
    pymodule_file="custom_interface.py",
    classname="ASR",
)
asr_classifier = asr_classifier.to(device)
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)
```
## Training
To fine-tune this model, you need to run:
```
python train.py hyperparams.yaml
```
The `train.py` file contains the functions necessary for training the model, and `hyperparams.yaml` contains the hyperparameters. For more details about training the model, refer to the [SpeechBrain](https://speechbrain.github.io) documentation.
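The reported metrics are word error rate (WER) and character error rate (CER). As a reference, here is a minimal sketch of WER as word-level Levenshtein distance divided by the reference length; this is not the evaluation code used for this model, just an illustration of the metric:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("ова е тест", "ова е еден тест"))  # 1 insertion over 3 reference words
```

CER is computed the same way but over characters instead of words.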