Linguistic Atlas and Survey of Irish Dialects, volume 1
Foinse was an Irish-language magazine site. This script uses a list of articles retrieved from the Wayback Machine to build a corpus.
The corpus consists of 317 speakers recorded in 554 sessions; each session comprises 20 read sentences and 10 phonetically rich words. The audio portion of the corpus amounts to around 56 hours, and the transcriptions contain 356674 words from a vocabulary of size 46361. Note that in order to limit the required storage for p...
A collection of 97 hours of parliamentary speeches published on the ClarinPL website. Note that in order to limit the required storage for preparing this dataset, the audio is stored in the .wav format and is not converted to a float32 array. To convert the audio file to a float32 array, please make use of the `.map()` function as follows: ```p...
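As an illustration of the .wav-to-float32 conversion described above (and not the dataset authors' own snippet), here is a minimal sketch using only the Python standard library. It assumes 16-bit PCM audio; the helper names `wav_to_float32` and `add_speech` and the column names `file`, `speech`, and `sampling_rate` are hypothetical and would need to match the actual dataset.

```python
import array
import sys
import wave


def wav_to_float32(path):
    """Read a 16-bit PCM .wav file and return (samples, sample_rate).

    `samples` is an array of C floats (float32) scaled to [-1.0, 1.0).
    Multi-channel audio is left interleaved.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return array.array("f", (s / 32768.0 for s in pcm)), sample_rate


def add_speech(example):
    # "file", "speech" and "sampling_rate" are assumed column names.
    example["speech"], example["sampling_rate"] = wav_to_float32(example["file"])
    return example


# With the Hugging Face `datasets` library, a per-example function like this
# would typically be applied as:
#     dataset = dataset.map(add_speech)
```

Keeping the conversion in a per-example function means the decoded arrays are only produced when `.map()` is run, so the stored dataset itself stays small.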