	---
	metrics:
	- wer
	- cer
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	tags:
	- Aivaliot
	- Greek dialect
	---

	# xls-r-greek-aivaliot

	Aivaliot is a variety of Greek that was spoken in Aivali (known as Ayvalık in Turkish),
	located on the Edremit Gulf in Western Turkey, until the beginning of the 20th century.
	After the end of the war between Greece and Turkey (1919–1922) and the defeat of the Greek army,
	those Aivaliots who managed to survive fled to Greece, principally to the nearby island of Lesbos,
	where they settled in various dialectal enclaves. Aivaliot resembles Lesbian in many respects.
	According to Ralli (2019), Aivaliot and Lesbian belong to the group of Northern Greek Dialects,
	sharing unstressed /i/ and /u/ deletion and unstressed /o/ and /e/ raising.
	Aivaliot morphology and lexicon are influenced by Turkish, because of long domination
	by the Ottomans, as well as by Italo-Romance, due to the pre-Ottoman Genoese rule and trade with Venice (Ralli, 2019b).
	However, there are no Turkish or Italo-Romance influences on phonology or syntax.
	In 2002, a handful of first-generation Aivaliot speakers could still be found in Lesbos and
	elsewhere in Greece and abroad, where they still remembered and practiced their mother tongue (Ralli, 2019).
	Nowadays, the dialect is on the way to extinction, since second-generation speakers either have
	only a passive knowledge of it or, if they live in Lesbos, mix their own dialectal variety with the parent Lesbian.

	This is the first automatic speech recognition (ASR) model for Aivaliot.
	To train the model, we fine-tuned a Greek XLS-R model ([jonatasgrosman/wav2vec2-large-xlsr-53-greek](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-greek)) on the Aivaliot resources.
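As a minimal usage sketch, a fine-tuned wav2vec2-style checkpoint like this one can typically be run through the `transformers` ASR pipeline. This card does not state the fine-tuned model's repository ID, so the example defaults to the base checkpoint named above; `load_asr` and `transcribe` are hypothetical helper names, and the audio path is a placeholder.

```python
# Base checkpoint named in this card. The fine-tuned Aivaliot checkpoint's
# repository ID is not given here, so pass it explicitly once you know it.
BASE_MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-greek"


def load_asr(model_id=BASE_MODEL_ID):
    """Build a speech recognition pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path, asr):
    """Run the pipeline on one audio file and return the transcription string."""
    result = asr(audio_path)  # the ASR pipeline returns a dict with a "text" key
    return result["text"]
```

For example, `transcribe("clip.wav", load_asr())` would transcribe a local file with the base Greek model; swapping in the fine-tuned repository ID is the only change needed for Aivaliot.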

	## Resources

	We used recordings from the Asia Minor Archive (AMiGre) to train the model. AMiGre was compiled within the
	framework of two research projects that ran in the periods 2002–2005 and 2012–2016.
	We obtained permission to use it from the studies’ authors. It consists of narratives elicited from
	18 elderly speakers (5 male, 13 female), all refugees from Aivali, who had settled in different villages
	of the island of Lesbos. The data collection was carried out in 2002–2003, after obtaining the written
	consent of the informants, as well as the approval of the Ethics Committee of the University of Patras.
	The corpus has a total duration of almost 14 hours. It has been transcribed and annotated by
	two native speakers of the dialect, using a transcription system based on the Greek alphabet
	and orthography, adapted according to SAMPA. The annotations include metadata
	such as the source of the data, the identity and background of the informants, and the conditions of
	the data collection. The corpus is stored on the server of the Laboratory of Modern Greek Dialects of
	the University of Patras and is [freely accessible online](http://amigredb.philology.upatras.gr).

	To prepare the dataset, the texts were normalized (see [greek_dialects_asr/](https://gitlab.com/ilsp-spmd-all/speech/greek_dialects_asr/) for scripts),
	and all audio files were converted into a 16 kHz mono format.
	We split the Praat annotations into audio-transcription segments, which resulted in a dataset with a total duration of 10h 14m 44s.
	This is shorter than the almost 14 hours of original recordings because music, long pauses, and non-transcribed segments were removed.
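The preparation steps can be illustrated with a short sketch. It is not the authors' script (those live in the linked repository): it assumes `ffmpeg` is used for conversion and that the Praat intervals have already been read into `(start, end, text)` tuples in seconds. All function names here are hypothetical.

```python
def ffmpeg_16k_mono_command(src, dst):
    """Return an ffmpeg argument list that converts src to 16 kHz mono audio."""
    # -ac 1 sets one audio channel (mono); -ar 16000 sets the sample rate.
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", "16000", dst]


def transcribed_duration(segments):
    """Sum the length of segments that carry a non-empty transcription."""
    return sum(end - start for start, end, text in segments if text.strip())


def format_duration(seconds):
    """Format a duration in seconds as 'XhYmZs'-style text, e.g. '10h 14m 44s'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


# Toy intervals: the empty middle interval stands in for a pause or music.
segments = [(0.0, 2.5, "first phrase"), (2.5, 4.0, ""), (4.0, 7.0, "second phrase")]
print(transcribed_duration(segments))  # only the two transcribed intervals count
```

The command list can be passed to `subprocess.run(..., check=True)`. Dropping untranscribed intervals in this way is what makes the segmented dataset shorter than the raw recordings.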

	## Metrics

	We evaluated the model on the test split, which consists of 10% of the dataset recordings.
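The card does not say how the 10% was selected (for example, at random or by speaker). The following is therefore only a sketch of one common approach, a seeded random hold-out over recording IDs; the function name, the seed, and the IDs are all hypothetical.

```python
import random


def hold_out_test_split(recording_ids, test_fraction=0.1, seed=0):
    """Return (remaining_ids, test_ids) using a reproducible random shuffle."""
    ids = sorted(recording_ids)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)     # seeded shuffle, independent of global state
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


# Hypothetical recording identifiers.
rest, test = hold_out_test_split([f"rec_{i:03d}" for i in range(100)])
print(len(rest), len(test))
```

With so few speakers, a split by speaker rather than by recording would give a stricter estimate of generalization; which one applies here is not stated in this card.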

	|Model|CER|WER|
	|---|---|---|
	|pre-trained|104.80%|113.67%|
	|fine-tuned|39.55%|73.83%|
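Both metrics are edit distances normalized by the reference length, which is why the pre-trained model can score above 100%: insertions count as errors, so a hypothesis much longer than the reference can accumulate more edits than the reference has units. Below is a minimal per-utterance sketch. It is not the evaluation code used for the table, and conventions such as whether CER counts spaces are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits (spaces included here)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("a b c", "a x c"))  # one substitution over three reference words
print(wer("a", "b c d"))      # one substitution plus two insertions: above 1.0
```

Corpus-level scores are usually computed by summing edits and reference lengths over all utterances rather than averaging per-utterance rates.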

	## Training hyperparameters

	We fine-tuned the baseline model (`wav2vec2-large-xlsr-53-greek`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

	| arg | value |
	|-------------------------------|-------|
	| `per_device_train_batch_size` | 8 |
	| `gradient_accumulation_steps` | 2 |
	| `num_train_epochs` | 35 |
	| `learning_rate` | 3e-4 |
	| `warmup_steps` | 500 |
	## Citation

	To cite this work or read more about the training pipeline, see:

	S. Vakirtzian, C. Tsoukala, S. Bompolas, K. Mouzou, V. Stamou, G. Paraskevopoulos, A. Dimakis, S. Markantonatou, A. Ralli, A. Anastasopoulos, Speech Recognition for Greek Dialects: A Challenging Benchmark, Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024.