---
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- Aivaliot
- Greek dialect
---

# xls-r-greek-aivaliot

Aivaliot is a variety of Greek that was spoken in Aivali (known as Ayvalık in Turkish),
located on the Edremit Gulf in Western Turkey, until the beginning of the 20th century.
After the end of the war between Greece and Turkey (1919–1922) and the defeat of the Greek army,
those Aivaliots who managed to survive fled to Greece, principally to the nearby island of Lesbos,
where they settled in various dialectal enclaves. Aivaliot resembles Lesbian in many respects.
According to Ralli (2019), Aivaliot and Lesbian belong to the group of Northern Greek Dialects,
sharing unstressed /i/ and /u/ deletion and unstressed /o/ and /e/ raising.
Aivaliot morphology and lexicon are influenced by Turkish, because of long Ottoman rule,
as well as by Italo-Romance, due to pre-Ottoman Genoese rule and trade with Venice (Ralli, 2019b).
However, there are no Turkish or Italo-Romance influences on phonology or syntax.
In 2002, a handful of first-generation Aivaliot speakers could still be found in Lesbos and
elsewhere in Greece and abroad, where they still remembered and used their mother tongue (Ralli, 2019).
Today, the dialect is on the way to extinction, since second-generation speakers either have
only a passive knowledge of it or, in the case of those living in Lesbos, mix their own dialectal variety with the parent Lesbian.

This is the first automatic speech recognition (ASR) model for Aivaliot.
To train the model, we fine-tuned a Greek XLS-R model ([jonatasgrosman/wav2vec2-large-xlsr-53-greek](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-greek)) on the Aivaliot resources.

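
A minimal inference sketch, assuming the checkpoint is loaded through the `transformers` ASR pipeline. The helper name `transcribe` is hypothetical, and because this card does not state the Hub ID of the fine-tuned checkpoint, the default below is the baseline model named above; pass the fine-tuned repository ID as `model_id` to transcribe Aivaliot speech. Decoding from a file path relies on `ffmpeg` being installed.

```python
# Baseline checkpoint named in this card. The fine-tuned checkpoint's Hub ID is
# not given here, so it has to be supplied by the caller (assumption).
BASE_MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-greek"


def transcribe(audio_path, model_id=BASE_MODEL_ID):
    """Transcribe one audio file with a wav2vec2-style CTC checkpoint."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict whose "text" field holds the transcription.
    return asr(audio_path)["text"]
```

For example, `transcribe("recording.wav", model_id=...)` would return the transcription as a string, where `recording.wav` is a placeholder path and the elided `model_id` is the fine-tuned checkpoint.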
## Resources

We used recordings from the Asia Minor Archive (AMiGre) to train the model. AMiGre was compiled within the
framework of two research projects that ran in the periods 2002–2005 and 2012–2016.
We obtained permission to use it from the studies’ authors. It consists of narratives elicited from
18 elderly speakers (5 male, 13 female), all refugees from Aivali, who had settled in different villages
on the island of Lesbos. The data collection was carried out in 2002–2003, after obtaining the written
consent of the informants, as well as the approval of the Ethics Committee of the University of Patras.
The corpus has a total duration of almost 14 hours. It was transcribed and annotated by
two native speakers of the dialect, using a transcription system based on the Greek alphabet
and orthography, adapted according to SAMPA. The annotations include metadata,
such as the source of the data, the identity and background of the informants, and the conditions of
the data collection. The corpus is stored on the server of the Laboratory of Modern Greek Dialects of
the University of Patras and is [freely accessible online](http://amigredb.philology.upatras.gr).

To prepare the dataset, the texts were normalized (see [greek_dialects_asr/](https://gitlab.com/ilsp-spmd-all/speech/greek_dialects_asr/) for scripts),
and all audio files were converted into a 16 kHz mono format.
We split the Praat annotations into audio-transcription segments, which resulted in a dataset with a total duration of 10h 14m 44s.
Note that removing music, long pauses, and non-transcribed segments reduces the total audio duration (compared to the initial 14 hours of recordings).

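
The effect of dropping non-transcribed stretches on the total duration can be illustrated with a short sketch. This is not the project's preprocessing code (that lives in the linked repository). The `Interval` type and the helper names are hypothetical, and the intervals are assumed to have already been read from a Praat tier as (start, end, text) triples.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """One annotated stretch of audio (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str     # transcription; empty for music, pauses, untranscribed audio


def keep_transcribed(intervals):
    """Keep only intervals that carry a non-empty transcription."""
    return [iv for iv in intervals if iv.text.strip()]


def total_seconds(intervals):
    """Sum the durations of the given intervals."""
    return sum(iv.end - iv.start for iv in intervals)


def format_duration(seconds):
    """Render a duration in seconds as 'Hh MMm SSs'."""
    hours, rem = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Toy tier: the middle interval is untranscribed and gets dropped.
tier = [
    Interval(0.0, 2.5, "segment one"),
    Interval(2.5, 6.0, ""),
    Interval(6.0, 9.5, "segment two"),
]
segments = keep_transcribed(tier)
print(len(segments), total_seconds(segments))  # 2 6.0
print(format_duration(36884))                  # 10h 14m 44s
```

Each retained interval corresponds to one audio-transcription pair. The summed duration is smaller than the raw recording length whenever some intervals are empty.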
## Metrics

We evaluated the model on the test split, which consists of 10% of the dataset recordings.

|Model|CER|WER|
|---|---|---|
|pre-trained|104.80%|113.67%|
|fine-tuned|39.55%|73.83%|

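
Both metrics are edit distances normalized by the length of the reference, which is why the pre-trained row can exceed 100%: a hypothesis with many insertions can need more edits than the reference has units. The sketch below is a plain-Python illustration of this definition, not the evaluation code behind the table. It assumes whitespace tokenization for WER and counts spaces as characters for CER, and it uses toy English strings.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("the cat sat", "the hat sat"))  # one substitution out of three words
print(wer("x y", "a b c d"))              # 2.0, i.e. 200%: 2 substitutions + 2 insertions
```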
## Training hyperparameters

We fine-tuned the baseline model (`wav2vec2-large-xlsr-53-greek`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

| arg                           | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8     |
| `gradient_accumulation_steps` | 2     |
| `num_train_epochs`            | 35    |
| `learning_rate`               | 3e-4  |
| `warmup_steps`                | 500   |

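
The argument names in the table match keyword arguments of `transformers.TrainingArguments`. As a small worked example, the effective batch size per optimizer step follows from the first two rows. This assumes a single GPU (the card mentions one RTX 3090); the variable `num_devices` is introduced here for illustration only.

```python
# Values copied from the table above.
training_args = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}

num_devices = 1  # assumption: training ran on a single GPU

# Examples contributing to each optimizer update.
effective_batch_size = (
    training_args["per_device_train_batch_size"]
    * training_args["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)  # 16
```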
## Citation

To cite this work or read more about the training pipeline, see:

S. Vakirtzian, C. Tsoukala, S. Bompolas, K. Mouzou, V. Stamou, G. Paraskevopoulos, A. Dimakis, S. Markantonatou, A. Ralli, A. Anastasopoulos, Speech Recognition for Greek Dialects: A Challenging Benchmark, Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024.