new model added
Browse files
- README.md +9 -8
- data_processing/README.md +0 -40
- data_processing/ca_multi2vckt.py +0 -152
- data_processing/extract_festcat.py +0 -139
- data_processing/extract_google_tts.py +0 -168
- data_processing/festcat_processing_test.sh +0 -152
- data_processing/google_tts_processing_test.sh +0 -124
- data_processing/process_data.sh +0 -56
- model/best_model.pth +2 -2
- model/config.json +0 -0
README.md
CHANGED
@@ -14,18 +14,21 @@ tags:
 - pytorch

 datasets:
-- mozilla-foundation/
--
+- mozilla-foundation/common_voice_12_0
+- projecte-aina/festcat_trimmed_denoised
+- projecte-aina/openslr-slr69-ca-trimmed-denoised

 ---

 # Aina Project's Catalan multi-speaker text-to-speech model
 ## Model description

-This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of 3 datasets: [Festcat](http://festcat.talp.cat/devel.php),
+This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of 3 datasets: [Festcat](http://festcat.talp.cat/devel.php), [OpenSLR69](http://openslr.org/69/) and [Common Voice v12](https://commonvoice.mozilla.org/ca). For training, we used 487 hours of recordings from 255 speakers. All of the data except Common Voice was trimmed and denoised; the processed versions are published as the separate datasets [festcat_trimmed_denoised](https://huggingface.co/datasets/projecte-aina/festcat_trimmed_denoised) and [openslr69_trimmed_denoised](https://huggingface.co/datasets/projecte-aina/openslr-slr69-ca-trimmed-denoised).

 A live inference demo can be found in our spaces, [here](https://huggingface.co/spaces/projecte-aina/tts-ca-coqui-vits-multispeaker).

+The model needs our fork of [espeak-ng](https://github.com/projecte-aina/espeak-ng) to work correctly. For installation and deployment, please consult the Dockerfile of our [inference demo](https://huggingface.co/spaces/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/Dockerfile).
+
 ## Intended uses and limitations

 You can use this model to generate synthetic speech in Catalan with different voices.
@@ -33,7 +36,7 @@ You can use this model to generate synthetic speech in Catalan with different voices.
 ## How to use
 ### Usage

-
+Required libraries:

 ```bash
 pip install git+https://github.com/coqui-ai/TTS@dev#egg=TTS
@@ -70,8 +73,6 @@ wavs = synthesizer.tts(text, speaker_idx)
 ## Training
 ### Training Procedure
 ### Data preparation
-The data has been processed using the script [process_data.sh](https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/data_processing/process_data.sh), which reduces the sampling frequency of the audios, eliminates silences, adds padding and structures the data in the format accepted by the framework. You can find more information [here](https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/data_processing/README.md).
-
 ### Hyperparameter

 The model is based on VITS proposed by [Kim et al](https://arxiv.org/abs/2106.06103). The following hyperparameters were set in the coqui framework.
@@ -106,7 +107,7 @@ The model was trained for 730962 steps.
 ## Additional information

 ### Author
-
+Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center

 ### Contact information
 For further information, send an email to aina@bsc.es
@@ -119,7 +120,7 @@ Copyright (c) 2022 Text Mining Unit at Barcelona Supercomputing Center
 [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

 ### Funding
-This work was funded by the [Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en) within the framework of [Projecte AINA](https://
+This work was funded by the [Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://projecteaina.cat).


 ## Disclaimer
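Two quick sketches related to the README changes above. First, the espeak-ng requirement: the demo's Dockerfile is the authoritative reference, but building the projecte-aina fork from source generally follows the standard espeak-ng build steps (the package list and install prefix below are assumptions, not taken from the Dockerfile):

```bash
# Sketch only: build and install the projecte-aina espeak-ng fork from source.
# Build dependencies and the install prefix are assumptions; consult the
# inference demo's Dockerfile for the exact setup.
sudo apt-get update
sudo apt-get install -y git make autoconf automake libtool pkg-config gcc
git clone https://github.com/projecte-aina/espeak-ng.git
cd espeak-ng
./autogen.sh
./configure --prefix=/usr
make
sudo make install
```

Second, the usage section: besides the Python `Synthesizer` snippet referenced in the diff context (`wavs = synthesizer.tts(text, speaker_idx)`), the Coqui `tts` command line can drive the same checkpoint; the speaker id below is a placeholder to be replaced with one of the model's actual speaker names:

```bash
# Sketch: command-line synthesis with the Coqui TTS CLI.
# "ona" is a hypothetical speaker id; check the model's speaker list first.
tts --text "Bon dia, com esteu?" \
    --model_path model/best_model.pth \
    --config_path model/config.json \
    --speaker_idx ona \
    --out_path output.wav
```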
data_processing/README.md
DELETED
@@ -1,40 +0,0 @@
-# Data preparation
-
-Scripts to process [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/) datasets, to make them compatible with training of modern TTS architectures
-
-## Requirements
-`sox`, `ffmpeg`
-
-### Processing steps
-
-#### Downloads
-Download [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/)
-
-#### Variables definition
-
-Open the shell script `.../data_processing/process_data.sh` and modify the following fields:
-
-```bash
-### Festcat variables ###
-export PATH_TO_FESTCAT_SHELL='.../data_processing/festcat_processing_test.sh' # Absolute path to festcat_processing_test.sh script
-export PATH_TO_FESTCAT_PY='.../data_processing/extract_festcat.py' # Absolute path to extract_festcat.py script
-export PATH_TO_FESTCAT_DATA='.../festcat/' # Path to Festcat dataset
-export FESTCAT_FINAL_PATH='.../festcat_processed' # Path where preprocessed Festcat will be stored
-
-### Google_tts variables ###
-export PATH_TO_GOOGLE_TTS_SHELL='.../data_processing/google_tts_processing_test.sh' # Absolute path to google_tts_processing_test.sh script
-export PATH_TO_GOOGLE_TTS_PY='.../data_processing/extract_google_tts.py' # Absolute path to extract_google_tts.py script
-export PATH_TO_GOOGLE_TTS_DATA='.../google_tts' # Path to Google TTS dataset
-export GOOGLE_TTS_FINAL_PATH='.../google_tts_processed' # Path where preprocessed Google TTS will be stored
-
-### General variables ###
-export VCTK_FORMATER_PATH='.../data_processing/ca_multi2vckt.py' # Absolute path to ca_multi2vckt.py script
-export FINAL_PATH='.../multispeaker_ca_test/' # Path where preprocessed and vctk formatted datasets will be stored.
-```
-#### Run preprocessing
-
-Once the variables are correctly defined, execute the following command in the terminal:
-
-`sh <...>/data_processing/process_data.sh`
-
-The processed data in vctk format will be in the directory defined in `export FINAL_PATH='.../multispeaker_ca_test/'`.
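Even though this commit removes the data_processing scripts, the core audio pipeline they documented can be distilled into a few `ffmpeg`/`sox` calls. A minimal per-file sketch, based on the deleted festcat_processing_test.sh below (directory names are placeholders):

```bash
# Minimal sketch of the preprocessing applied per audio file by the deleted
# scripts: resample to 22.05 kHz, trim leading/trailing silence, add padding.
IN_DIR=raw_wavs          # placeholder input directory
OUT_DIR=processed_wavs   # placeholder output directory
mkdir -p "$OUT_DIR"
for f in "$IN_DIR"/*.wav; do
    t=$(basename "$f")
    # 48 kHz -> 22.05 kHz
    ffmpeg -i "$f" -ar 22050 "$OUT_DIR/tmp_22k_$t" -v error < /dev/null
    # strip silence at both ends (same sox recipe as the deleted scripts)
    sox "$OUT_DIR/tmp_22k_$t" "$OUT_DIR/tmp_sil_$t" silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse
    # add a short trailing pad, as the deleted scripts did before training
    sox "$OUT_DIR/tmp_sil_$t" "$OUT_DIR/$t" pad 0 0.058
    rm "$OUT_DIR/tmp_22k_$t" "$OUT_DIR/tmp_sil_$t"
done
```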
data_processing/ca_multi2vckt.py
DELETED
@@ -1,152 +0,0 @@
-import os
-import re
-import argparse
-from glob import glob
-from pathlib import Path
-from subprocess import call
-
-def main():
-    my_parser = argparse.ArgumentParser()
-    my_parser.add_argument('--google-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to tsv file')
-    my_parser.add_argument('--festcat-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    #my_parser.add_argument('--cv-path',
-    #                       metavar='path',
-    #                       type=str,
-    #                       help='the path to wavs file')
-    my_parser.add_argument('--final-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    args = my_parser.parse_args()
-    google_path = args.google_path
-    festcat_path = args.festcat_path
-    #common_voice_path = args.cv_path
-    target_base_path = args.final_path
-
-    google_tts_male = google_path + "/male/"
-    google_tts_female = google_path + "/female/"
-    google_tts_paths = [google_tts_male, google_tts_female]
-
-    #google_tts_paths = ["/gpfs/scratch/bsc88/bsc88858/google_tts/male/","/gpfs/scratch/bsc88/bsc88858/google_tts/female/"]
-    #festcat_path = "/gpfs/scratch/bsc88/bsc88858/festcat/"
-    #common_voice_path = "/gpfs/scratch/bsc88/bsc88858/cv-corpus-9.0-2022-04-27/ca/"
-    #target_base_path = "/gpfs/scratch/bsc88/bsc88474/data/multispeaker_ca/"
-
-    if os.path.exists(google_path):
-        print("Converting google_tts data to vctk format")
-        convert_google(google_tts_paths, target_base_path)
-    else:
-        print("Google_tts processed data not found")
-
-    if os.path.exists(festcat_path):
-        print("Converting festcat data to vctk format")
-        convert_festcat(festcat_path, target_base_path)
-    else:
-        print("Festcat processed data not found")
-
-    #convert_cv(common_voice_path, target_base_path)
-
-def convert_google(google_tts_paths, target_base_path):
-    for g_path in google_tts_paths[:1]:
-        meta_files = glob(f"{g_path}/*_*.txt")
-        for meta_file in meta_files:
-            print(meta_file)
-            for line in open(meta_file).readlines():
-                text_id, text = line.strip().split('|')
-                text.replace('¿','')
-                text.replace('¡','')
-                #speaker_id = '_'.join(text_id.split('_')[:2])
-                speaker_id = text_id.split('_')[1]
-                target_text_file = os.path.join(target_base_path, 'txt',
-                                                speaker_id, text_id+'.txt')
-                target_wav_file = os.path.join(target_base_path, 'wav',
-                                               speaker_id, text_id+'.wav')
-                source_wav_file = os.path.join(g_path, 'wavs', text_id+'.wav')
-
-                speaker_paths = [os.path.dirname(target_text_file),
-                                 os.path.dirname(target_wav_file)]
-
-                convert_meta(target_text_file, target_wav_file,
-                             source_wav_file, speaker_paths, text)
-
-def convert_meta(target_text_file,
-                 target_wav_file,
-                 source_wav_file,
-                 speaker_paths, text):
-
-    # create directories
-    for speaker_path in speaker_paths:
-        if not os.path.isdir(speaker_path):
-            os.mkdir(speaker_path)
-
-    # write text file
-    with open(target_text_file, 'w') as out:
-        out.write(text)
-
-    # copy wav file
-    try:
-        os.path.isfile(source_wav_file)
-    except:
-        raise IOError('{} does not exist'.format(source_wav_file))
-
-    cp_args = ['cp', source_wav_file, target_wav_file]
-    if not os.path.isfile(target_wav_file):
-        #print(' '.join(cp_args))
-        call(cp_args)
-
-def convert_festcat(festcat_path, target_base_path):
-    meta_files = glob(f"{festcat_path}/*/*_train.txt")
-    for meta_file in meta_files:
-        speaker_name = meta_file.split(os.sep)[-2]
-        print(meta_file)
-        for line in open(meta_file).readlines():
-            if '[' not in line:
-                text_id, text = line.strip().split('|')
-                text.replace('¿','')
-                text.replace('¡','')
-                #speaker_id = '_'.join(text_id.split('_')[:3])
-                speaker_id = speaker_name
-                target_text_file = os.path.join(target_base_path, 'txt',
-                                                speaker_id, text_id+'.txt')
-                target_wav_file = os.path.join(target_base_path, 'wav',
-                                               speaker_id, text_id+'.wav')
-                source_wav_file = os.path.join(festcat_path, speaker_name,
-                                               'wavs', text_id+'.wav')
-
-                speaker_paths = [os.path.dirname(target_text_file),
-                                 os.path.dirname(target_wav_file)]
-
-                convert_meta(target_text_file, target_wav_file,
-                             source_wav_file, speaker_paths, text)
-            else:
-                print('line: {} skipped'.format(line))
-
-def convert_cv(common_voice_path, target_base_path):
-    meta_files = glob(f"{common_voice_path}/*.txt")
-    for meta_file in meta_files:
-        print(meta_file)
-        speaker_id = meta_file.split(os.sep)[-1].replace("ca_","").replace(".txt","")
-        for line in open(meta_file).readlines():
-            text_id, text = line.strip().split('|')
-
-            target_text_file = os.path.join(target_base_path, 'txt',
-                                            speaker_id, text_id+'.txt')
-            target_wav_file = os.path.join(target_base_path, 'wav',
-                                           speaker_id, text_id+'.wav')
-            source_wav_file = os.path.join(common_voice_path,
-                                           'wavs', text_id+'.wav')
-
-            speaker_paths = [os.path.dirname(target_text_file),
-                             os.path.dirname(target_wav_file)]
-
-            convert_meta(target_text_file, target_wav_file,
-                         source_wav_file, speaker_paths, text)
-
-if __name__ == "__main__":
-    main()
data_processing/extract_festcat.py
DELETED
@@ -1,139 +0,0 @@
-import os
-import re
-import json
-import subprocess
-import argparse
-import logging
-
-logger = logging.getLogger(__name__)
-
-def main():
-    my_parser = argparse.ArgumentParser()
-    my_parser.add_argument('--utterance-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to utterance file')
-    my_parser.add_argument('--wavs-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    my_parser.add_argument('--locutors',
-                           metavar='N',
-                           type=str,
-                           help='list of speakers names/id separated with commas')
-    args = my_parser.parse_args()
-    locutors = args.locutors
-    locutors = locutors.replace(" ", "");
-    locutors = locutors.split(",")
-    utterance_path = args.utterance_path
-    wavs_path = args.wavs_path
-
-    for locutor in locutors:
-        # get durations
-        durations = get_durations_dict(wavs_path + '%s_sil_stats.csv'%locutor)
-        aggregate_duration = 0
-        rejected_duration = 0
-        large_duration = 0
-        total_duration = 0
-        path = 'upc_ca_%s_utt/utt'%locutor
-        path = utterance_path + path
-
-        files = []
-        long_files = []
-        for filename in os.listdir(path):
-            sentence = get_sentence(os.path.join(path, filename))
-            audio_filename = filename.replace('.utt','.wav') # upc_ca_pep_203479.wav
-            if sentence:
-                target_path = 'upc_ca_%s_wav_22k_sil_pad'%locutor
-                target_path = wavs_path + target_path
-                source_filename = 'upc_ca_%s_wav_22k_sil/'%locutor+audio_filename
-                source_filename = wavs_path + source_filename
-                total_duration += durations[audio_filename]
-
-                if os.path.isfile(source_filename):
-                    if durations[audio_filename] < 10.0:
-                        aggregate_duration += durations[audio_filename]
-                        files.append((os.path.join(target_path,audio_filename), sentence))
-                        #subprocess.call(['cp',source_filename, target_filename])
-                    else:
-                        long_files.append((audio_filename, sentence))
-                        large_duration += durations[audio_filename]
-                else:
-                    print(audio_filename)
-            else:
-                rejected_duration += durations[audio_filename]
-        out(args, locutor, files)
-        out_long(args, locutor, long_files)
-        out_long_json(args, locutor, long_files)
-        print(locutor, aggregate_duration/3600, 'hours')
-        print(locutor, 'rejected due to duration', large_duration/3600, 'hours')
-        print(locutor, 'rejected', rejected_duration/60, 'minutes')
-        print(locutor, total_duration, aggregate_duration+rejected_duration+large_duration)
-
-def get_durations_dict(filename):
-    durations = {}
-
-    for line in open(filename).readlines():
-        d = line.split(',')
-        durations[d[0].split('/')[-1]] = float(d[1])
-    return durations
-
-def get_sentence(filename):
-    utt_all = open(filename, encoding = "ISO-8859-1").read()
-    m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
-    sentence = m.groups()[1]
-    # delete interword dashes
-    sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
-    if not re.search('\d', sentence):
-        return sentence
-    else:
-        #print(filename, sentence)
-        return None
-
-def out(args, locutor, files):
-
-    outname_length = [('upc_%s_test.txt'%locutor,0),
-                      ('upc_%s_val.txt'%locutor,0),
-                      ('upc_%s_train.txt'%locutor,len(files))]
-    l_sum = sum([el[1] for el in outname_length])
-    if len(files) != l_sum:
-        msg = 'train vs test val distribution wrong: %i'%l_sum
-        raise ValueError('msg')
-
-    for fout, l in outname_length:
-        open((args.wavs_path + fout), mode= 'a').close()
-        logger.warning(f"fout: {fout}")
-        logger.warning(f"l: {l}")
-        logger.warning(f"Enable l: {len(files)-100}")
-        logger.warning(f"Files: {files}")
-        with open((args.wavs_path + fout), 'w') as out:
-            for i in range(l):
-                f, sentence = files.pop()
-                out.write('%s|%s\n'%(f.split("/")[-1].split(".")[-2],sentence))
-
-def out_long(args, locutor, files):
-    outname = '%s_longsentences.csv'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    with open(outname_path, 'w') as out:
-        for audio, text in files:
-            out.write('%s,"%s"\n'%(audio, text))
-
-def out_long_json(args, locutor, files):
-    outname = '%s_longsentences.json'%locutor
-    source = args.wavs_path +'upc_ca_%s_wav_22k_sil/'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    interventions = []
-    for audio, text in files:
-        intervention = {}
-        intervention['text'] = [(locutor, text)]
-        intervention['urls'] = [(locutor, os.path.join(source,audio))]
-        interventions.append(intervention)
-
-    with open(outname_path, 'w') as out:
-        json.dump({'session': interventions}, out, indent=2)
-
-if __name__ == "__main__":
-    main()
-
data_processing/extract_google_tts.py
DELETED
@@ -1,168 +0,0 @@
-import os
-import re
-import json
-import argparse
-import logging
-import csv
-import numpy as np
-
-logger = logging.getLogger(__name__)
-
-def main():
-    my_parser = argparse.ArgumentParser()
-    my_parser.add_argument('--tsv-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to tsv file')
-    my_parser.add_argument('--wavs-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    my_parser.add_argument('--locutors',
-                           metavar='N',
-                           type=str,
-                           help='list of speakers names/id separated with commas')
-    args = my_parser.parse_args()
-    locutors = args.locutors
-    locutors = locutors.replace(" ", "");
-    locutors = locutors.split(",")
-    tsv_path = args.tsv_path
-    wavs_path = args.wavs_path
-
-    for locutor in locutors:
-        # get durations
-        durations = get_durations_dict(wavs_path + '%s_sil_stats.csv'%locutor)
-        aggregate_duration = 0
-        rejected_duration = 0
-        large_duration = 0
-        total_duration = 0
-        tsv_name = "line_index_%s.tsv"%locutor
-        tsv_path = tsv_path + tsv_name
-
-        tsv_file = open(tsv_path)
-        read_tsv = csv.reader(tsv_file, delimiter="\t")
-        files = []
-        long_files = []
-        for row in read_tsv:
-            audio_filename = row[0] + ".wav"
-            #logger.warning(f"Audio_filename {audio_filename}")
-            sentence = row[-1]
-            if sentence:
-                target_path = 'ca_es_%s_22k_sil_pad'%locutor
-                target_path = wavs_path + target_path
-                source_filename = 'ca_es_%s_22k_sil/'%locutor+audio_filename ###
-                source_filename = wavs_path + source_filename
-                #logger.warning(f"source_filename {source_filename}")
-                total_duration += durations[audio_filename]
-                if os.path.isfile(source_filename):
-                    if durations[audio_filename] < 10.0:
-                        aggregate_duration += durations[audio_filename]
-                        files.append((os.path.join(target_path,audio_filename), sentence))
-                        #subprocess.call(['cp',source_filename, target_filename])
-                    else:
-                        long_files.append((audio_filename, sentence))
-                        large_duration += durations[audio_filename]
-                else:
-                    print(audio_filename)
-            else:
-                rejected_duration += durations[audio_filename]
-
-        speakers_id = find_speakers_id(wavs_path + '%s_sil_stats.csv'%locutor)
-        for id in speakers_id:
-            speaker_file = files_spliter(files = files, speaker_id = id)
-            if len(speaker_file) == 0:
-                continue
-            else:
-                out(args, speaker_id = id, files = speaker_file)
-                #print(f"mv {wavs_path}ca_{id}_test.txt {wavs_path}{locutor}")
-                #os.system(f"mv {wavs_path}ca_{id}_test.txt {wavs_path}{locutor}")
-                #os.system(f"mv {wavs_path}ca_{id}_val.txt {wavs_path}{locutor}")
-                #os.system(f"mv {wavs_path}ca_{id}_train.txt {wavs_path}{locutor}")
-        #out(args, locutor, files)
-        out_long(args, locutor, long_files)
-        out_long_json(args, locutor, long_files)
-        print(locutor, aggregate_duration/3600, 'hours')
-        print(locutor, 'rejected due to duration', large_duration/3600, 'hours')
-        print(locutor, 'rejected', rejected_duration/60, 'minutes')
-        print(locutor, total_duration, aggregate_duration+rejected_duration+large_duration)
-
-def get_durations_dict(filename):
-    durations = {}
-    for line in open(filename).readlines():
-        d = line.split(',')
-        durations[d[0].split('/')[-1]] = float(d[1])
-    return durations
-
-def get_sentence(filename):
-    utt_all = open(filename, encoding = "ISO-8859-1").read()
-    m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
-    sentence = m.groups()[1]
-    # delete interword dashes
-    sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
-    if not re.search('\d', sentence):
-        return sentence
-    else:
-        print(filename, sentence)
-        return None
-
-def out(args, speaker_id, files):
-    outname_length = [('ca_%s_test.txt'%speaker_id,0),
-                      ('ca_%s_val.txt'%speaker_id,0),
-                      ('ca_%s_train.txt'%speaker_id,len(files))]
-    l_sum = sum([el[1] for el in outname_length])
-    if len(files) != l_sum:
-        msg = 'train vs test val distribution wrong: %i'%l_sum
-        raise ValueError('msg')
-
-    for fout, l in outname_length:
-        open((args.wavs_path + fout), mode= 'a').close()
-        with open((args.wavs_path + fout), 'w') as out:
-            for i in range(l):
-                f, sentence = files.pop()
-                out.write('%s|%s\n'%(f.split("/")[-1].split(".")[-2],sentence))
-    print(len(files))
-
-def out_long(args, locutor, files):
-    outname = '%s_longsentences.csv'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    with open(outname_path, 'w') as out:
-        for audio, text in files:
-            out.write('%s,"%s"\n'%(audio, text))
-
-def out_long_json(args, locutor, files):
-    outname = '%s_longsentences.json'%locutor
-    source = args.wavs_path +'ca_es_%s_22k_sil/'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    interventions = []
-    for audio, text in files:
-        intervention = {}
-        intervention['text'] = [(locutor, text)]
-        intervention['urls'] = [(locutor, os.path.join(source,audio))]
-        interventions.append(intervention)
-
-    with open(outname_path, 'w') as out:
-        json.dump({'session': interventions}, out, indent=2)
-
-def find_speakers_id(path_tsv):
-    durations = {}
-    for line in open(path_tsv).readlines():
-        d = line.split(',')
-        durations[d[0].split('/')[-1]] = float(d[1])
-    keysList = list(durations.keys())
-    for index in range(len(keysList)):
-        keysList[index] = keysList[index].split("_")[1]
-    keysList = np.ndarray.tolist(np.unique(np.array(keysList)))
-    return keysList
-
-def files_spliter(files, speaker_id):
-    out_file = []
-    for element in files:
-        if element[0].split("/")[-1].split("_")[1] == speaker_id:
-            out_file.append(element)
-    return out_file
-
-if __name__ == "__main__":
-    main()
-
data_processing/festcat_processing_test.sh
DELETED
@@ -1,152 +0,0 @@
-#!/bin/sh
-
-
-export FINAL_PATH=$1
-export SOURCE_PATH=$2
-export EXTRACT_PATH=$3
-
-
-module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996
-
-for name in bet eli eva jan mar ona pau pep pol teo uri
-do
-  echo "Processing $name data"
-  export SPEAKER_NAME=$name
-  export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
-  export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"
-
-  if [ -d "${FINAL_PATH}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}
-    echo "Crating: ${FINAL_PATH} "
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME} "
-  fi
-
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav "
-  fi
-
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${SOURCE_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_raw/recordings/*.raw; do
-      t=${f%.raw}.wav; g=${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/${t##*/}; sox -t raw -r 48k -e signed -b 16 -c 1 $f $g;
-      printf "\r Converiting .raw audios to .wav ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Already converted to .wav"
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/*.wav; do
-      t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/$t -v error < /dev/null;
-      printf "\r Converiting audios of 48kHz to 22kHz ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done;
-  else
-    echo "Already converted to 22kHz file"
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
-      printf "\r Filtering silence ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Silence already eliminated"
-  fi
-
-  if [ -f "${OUTPUT_CSV}" ]; then
-    ### Take action if $DIR exists ###
-    echo "${OUTPUT_CSV} already exists!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating ${OUTPUT_CSV}"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
-      d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
-      echo $f,$d;
-    done >> ${OUTPUT_CSV}
-  fi
-
-  if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/upc_${SPEAKER_NAME}_train.txt" ]; then
-    ### Take action if $DIR exists ###
-    echo "Splits already created!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating splits..."
-    python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --utterance-path ${UTTERANCE_PATH} --locutors ${SPEAKER_NAME}
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil_pad" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil_pad already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
-      printf "\r Adding pad ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Pad already added!"
-  fi
-
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav
-
-done
-echo "Done!"
-
-
-
-
data_processing/google_tts_processing_test.sh
DELETED
@@ -1,124 +0,0 @@
-#!/bin/sh
-
-
-export FINAL_PATH=$1
-export SOURCE_PATH=$2
-export EXTRACT_PATH=$3
-
-
-
-module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996
-
-for name in male female
-do
-  export SPEAKER_NAME=$name
-  export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
-  export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"
-
-  if [ -d "${FINAL_PATH}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}
-    echo "Crating: ${FINAL_PATH} "
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME} "
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${SOURCE_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}/*.wav; do
-      t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/$t -v error < /dev/null;
-      printf "\r Converiting audios of 48kHz to 22kHz ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done;
-  else
-    echo "Already converted to 22kHz file"
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
-      printf "\r Filtering silence ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Silence has already been filtered!"
-  fi
-
-  if [ -f "${OUTPUT_CSV}" ]; then
-    ### Take action if $DIR exists ###
-    echo "${OUTPUT_CSV} already exists!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating ${OUTPUT_CSV}"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
-      d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
-      echo $f,$d;
-    done >> ${OUTPUT_CSV}
-  fi
-
-  if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/ca_01591_train.txt" ]; then
-    ### Take action if $DIR exists ###
-    echo "Splits already created!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating splits..."
-    python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --tsv-path ${SOURCE_PATH}/${SPEAKER_NAME}/ --locutors ${SPEAKER_NAME}
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/wavs" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
-      printf "\r Adding pad ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Pad already added!"
-  fi
-
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k
-
-done
-echo "Done!"
data_processing/process_data.sh
DELETED
@@ -1,56 +0,0 @@
-#!/bin/bash
-
-### Festcat variables ###
-export PATH_TO_FESTCAT_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/festcat_processing_test.sh'
-export PATH_TO_FESTCAT_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_festcat.py'
-export PATH_TO_FESTCAT_DATA='/gpfs/scratch/bsc88/bsc88858/festcat/'
-export FESTCAT_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/festcat_processed'
-
-### Google_tts variables ###
-export PATH_TO_GOOGLE_TTS_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/google_tts_processing_test.sh'
-export PATH_TO_GOOGLE_TTS_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_google_tts.py'
-export PATH_TO_GOOGLE_TTS_DATA='/gpfs/scratch/bsc88/bsc88858/google_tts'
-export GOOGLE_TTS_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/google_tts_processed'
-
-### General variables ###
-export VCTK_FORMATER_PATH='/gpfs/scratch/bsc88/bsc88858/data_processing/ca_multi2vckt.py'
-export FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/multispeaker_ca_test/'
-
-
-if [ -d "${FESTCAT_FINAL_PATH}" ]; then
-  ### Take action if $DIR exists ###
-  echo "Path ${FESTCAT_FINAL_PATH} already exists"
-else
-  ### Control will jump here if $DIR does NOT exists ###
-  if [ -d "${PATH_TO_FESTCAT_DATA}" ]; then
-    source ${PATH_TO_FESTCAT_SHELL} ${FESTCAT_FINAL_PATH} ${PATH_TO_FESTCAT_DATA} ${PATH_TO_FESTCAT_PY}
-  else
-    echo "Fescat data not found!"
-  fi
-fi
-
-if [ -d "${GOOGLE_TTS_FINAL_PATH}" ]; then
-  ### Take action if $DIR exists ###
-  echo "Path ${GOOGLE_TTS_FINAL_PATH} already exists"
-else
-  ### Control will jump here if $DIR does NOT exists ###
-  if [ -d "${PATH_TO_GOOGLE_TTS_DATA}" ]; then
-    source ${PATH_TO_GOOGLE_TTS_SHELL} ${GOOGLE_TTS_FINAL_PATH} ${PATH_TO_GOOGLE_TTS_DATA} ${PATH_TO_GOOGLE_TTS_PY}
-  else
-    echo "Google TTS data not found!"
-  fi
-fi
-
-if [ -d "${FINAL_PATH}" ]; then
-  ### Take action if $DIR exists ###
-  echo "Path ${FINAL_PATH} already created"
-else
-  ### Control will jump here if $DIR does NOT exists ###
-  mkdir ${FINAL_PATH}
-  mkdir ${FINAL_PATH}/txt/
-  mkdir ${FINAL_PATH}/wav/
-  echo "Crating: ${FINAL_PATH}"
-  python ${VCTK_FORMATER_PATH} --google-path ${GOOGLE_TTS_FINAL_PATH} --festcat-path ${FESTCAT_FINAL_PATH} --final-path ${FINAL_PATH}
-fi
-
-echo "Done!"
model/best_model.pth
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7281afd683f92a46feb9068f5dcd96038b0b64b453deee25d147064b34e2dbcf
+size 1040801013
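Note that the change above only updates the Git LFS pointer (oid and size) for best_model.pth; the roughly 1 GB checkpoint itself lives in LFS storage. A minimal sketch for fetching it locally, assuming git-lfs is installed:

```bash
# Sketch: clone the model repository and pull the LFS-tracked checkpoint.
git lfs install
git clone https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker
cd tts-ca-coqui-vits-multispeaker
git lfs pull                      # downloads model/best_model.pth (~1 GB)
ls -lh model/best_model.pth model/config.json
```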
model/config.json
CHANGED
Binary files a/model/config.json and b/model/config.json differ