new model added
Browse files
- README.md +9 -8
- data_processing/README.md +0 -40
- data_processing/ca_multi2vckt.py +0 -152
- data_processing/extract_festcat.py +0 -139
- data_processing/extract_google_tts.py +0 -168
- data_processing/festcat_processing_test.sh +0 -152
- data_processing/google_tts_processing_test.sh +0 -124
- data_processing/process_data.sh +0 -56
- model/best_model.pth +2 -2
- model/config.json +0 -0
README.md
CHANGED
@@ -14,18 +14,21 @@ tags:
 - pytorch

 datasets:
-- mozilla-foundation/
--
+- mozilla-foundation/common_voice_12_0
+- projecte-aina/festcat_trimmed_denoised
+- projecte-aina/openslr-slr69-ca-trimmed-denoised

 ---

 # Aina Project's Catalan multi-speaker text-to-speech model
 ## Model description

-This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of 3 datasets: [Festcat](http://festcat.talp.cat/devel.php),
+This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of 3 datasets: [Festcat](http://festcat.talp.cat/devel.php), [OpenSLR69](http://openslr.org/69/) and [Common Voice v12](https://commonvoice.mozilla.org/ca). For training, we used 487 hours of recordings from 255 speakers. All of the data except Common Voice was trimmed and denoised; the processed versions are published as the separate datasets [festcat_trimmed_denoised](https://huggingface.co/datasets/projecte-aina/festcat_trimmed_denoised) and [openslr69_trimmed_denoised](https://huggingface.co/datasets/projecte-aina/openslr-slr69-ca-trimmed-denoised).

 A live inference demo can be found in our spaces, [here](https://huggingface.co/spaces/projecte-aina/tts-ca-coqui-vits-multispeaker).

+The model needs our fork of [espeak-ng](https://github.com/projecte-aina/espeak-ng) to work correctly. For installation and deployment, please consult the Dockerfile of our [inference demo](https://huggingface.co/spaces/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/Dockerfile).
+
 ## Intended uses and limitations

 You can use this model to generate synthetic speech in Catalan with different voices.
@@ -33,7 +36,7 @@ You can use this model to generate synthetic speech in Catalan with different voices.
 ## How to use
 ### Usage

-
+Required libraries:

 ```bash
 pip install git+https://github.com/coqui-ai/TTS@dev#egg=TTS
@@ -70,8 +73,6 @@ wavs = synthesizer.tts(text, speaker_idx)
 ## Training
 ### Training Procedure
 ### Data preparation
-The data has been processed using the script [process_data.sh](https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/data_processing/process_data.sh), which reduces the sampling frequency of the audios, eliminates silences, adds padding and structures the data in the format accepted by the framework. You can find more information [here](https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/data_processing/README.md).
-
 ### Hyperparameter

 The model is based on VITS proposed by [Kim et al](https://arxiv.org/abs/2106.06103). The following hyperparameters were set in the coqui framework.
@@ -106,7 +107,7 @@ The model was trained for 730962 steps.
 ## Additional information

 ### Author
-
+Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center

 ### Contact information
 For further information, send an email to aina@bsc.es
@@ -119,7 +120,7 @@ Copyright (c) 2022 Text Mining Unit at Barcelona Supercomputing Center
 [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

 ### Funding
-This work was funded by the [Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en) within the framework of [Projecte AINA](https://
+This work was funded by the [Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://projecteaina.cat).


 ## Disclaimer
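Two quick sketches related to the README changes above. First, the espeak-ng requirement: the demo's Dockerfile is the authoritative reference, but building the projecte-aina fork from source generally follows the standard espeak-ng build steps (the package list and install prefix below are assumptions, not taken from the Dockerfile):

```bash
# Sketch only: build and install the projecte-aina espeak-ng fork from source.
# Build dependencies and the install prefix are assumptions; consult the
# inference demo's Dockerfile for the exact setup.
sudo apt-get update
sudo apt-get install -y git make autoconf automake libtool pkg-config gcc
git clone https://github.com/projecte-aina/espeak-ng.git
cd espeak-ng
./autogen.sh
./configure --prefix=/usr
make
sudo make install
```

Second, the usage section: besides the Python `Synthesizer` snippet referenced in the diff context (`wavs = synthesizer.tts(text, speaker_idx)`), the Coqui `tts` command line can drive the same checkpoint; the speaker id below is a placeholder to be replaced with one of the model's actual speaker names:

```bash
# Sketch: command-line synthesis with the Coqui TTS CLI.
# "ona" is a hypothetical speaker id; check the model's speaker list first.
tts --text "Bon dia, com esteu?" \
    --model_path model/best_model.pth \
    --config_path model/config.json \
    --speaker_idx ona \
    --out_path output.wav
```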
data_processing/README.md
DELETED
@@ -1,40 +0,0 @@
-# Data preparation
-
-Scripts to process [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/) datasets, to make them compatible with training of modern TTS architectures
-
-## Requirements
-`sox`, `ffmpeg`
-
-### Processing steps
-
-#### Downloads
-Download [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/)
-
-#### Variables definition
-
-Open the shell script `.../data_processing/process_data.sh` and modify the following fields:
-
-```bash
-### Festcat variables ###
-export PATH_TO_FESTCAT_SHELL='.../data_processing/festcat_processing_test.sh' # Absolute path to festcat_processing_test.sh script
-export PATH_TO_FESTCAT_PY='.../data_processing/extract_festcat.py' # Absolute path to extract_festcat.py script
-export PATH_TO_FESTCAT_DATA='.../festcat/' # Path to Festcat dataset
-export FESTCAT_FINAL_PATH='.../festcat_processed' # Path where preprocessed Festcat will be stored
-
-### Google_tts variables ###
-export PATH_TO_GOOGLE_TTS_SHELL='.../data_processing/google_tts_processing_test.sh' # Absolute path to google_tts_processing_test.sh script
-export PATH_TO_GOOGLE_TTS_PY='.../data_processing/extract_google_tts.py' # Absolute path to extract_google_tts.py script
-export PATH_TO_GOOGLE_TTS_DATA='.../google_tts' # Path to Google TTS dataset
-export GOOGLE_TTS_FINAL_PATH='.../google_tts_processed' # Path where preprocessed Google TTS will be stored
-
-### General variables ###
-export VCTK_FORMATER_PATH='.../data_processing/ca_multi2vckt.py' # Absolute path to ca_multi2vckt.py script
-export FINAL_PATH='.../multispeaker_ca_test/' # Path where preprocessed and vctk formatted datasets will be stored.
-```
-#### Run preprocessing
-
-Once the variables are correctly defined, execute the following command in the terminal:
-
-`sh <...>/data_processing/process_data.sh`
-
-The processed data in vctk format will be in the directory defined in `export FINAL_PATH='.../multispeaker_ca_test/'`.
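Even though this commit removes the data_processing scripts, the core audio pipeline they documented can be distilled into a few `ffmpeg`/`sox` calls. A minimal per-file sketch, based on the deleted festcat_processing_test.sh below (directory names are placeholders):

```bash
# Minimal sketch of the preprocessing applied per audio file by the deleted
# scripts: resample to 22.05 kHz, trim leading/trailing silence, add padding.
IN_DIR=raw_wavs          # placeholder input directory
OUT_DIR=processed_wavs   # placeholder output directory
mkdir -p "$OUT_DIR"
for f in "$IN_DIR"/*.wav; do
    t=$(basename "$f")
    # 48 kHz -> 22.05 kHz
    ffmpeg -i "$f" -ar 22050 "$OUT_DIR/tmp_22k_$t" -v error < /dev/null
    # strip silence at both ends (same sox recipe as the deleted scripts)
    sox "$OUT_DIR/tmp_22k_$t" "$OUT_DIR/tmp_sil_$t" silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse
    # add a short trailing pad, as the deleted scripts did before training
    sox "$OUT_DIR/tmp_sil_$t" "$OUT_DIR/$t" pad 0 0.058
    rm "$OUT_DIR/tmp_22k_$t" "$OUT_DIR/tmp_sil_$t"
done
```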
data_processing/ca_multi2vckt.py
DELETED
@@ -1,152 +0,0 @@
-import os
-import re
-import argparse
-from glob import glob
-from pathlib import Path
-from subprocess import call
-
-def main():
-    my_parser = argparse.ArgumentParser()
-    my_parser.add_argument('--google-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to tsv file')
-    my_parser.add_argument('--festcat-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    #my_parser.add_argument('--cv-path',
-    #                       metavar='path',
-    #                       type=str,
-    #                       help='the path to wavs file')
-    my_parser.add_argument('--final-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    args = my_parser.parse_args()
-    google_path = args.google_path
-    festcat_path = args.festcat_path
-    #common_voice_path = args.cv_path
-    target_base_path = args.final_path
-
-    google_tts_male = google_path + "/male/"
-    google_tts_female = google_path + "/female/"
-    google_tts_paths = [google_tts_male, google_tts_female]
-
-    #google_tts_paths = ["/gpfs/scratch/bsc88/bsc88858/google_tts/male/","/gpfs/scratch/bsc88/bsc88858/google_tts/female/"]
-    #festcat_path = "/gpfs/scratch/bsc88/bsc88858/festcat/"
-    #common_voice_path = "/gpfs/scratch/bsc88/bsc88858/cv-corpus-9.0-2022-04-27/ca/"
-    #target_base_path = "/gpfs/scratch/bsc88/bsc88474/data/multispeaker_ca/"
-
-    if os.path.exists(google_path):
-        print("Converting google_tts data to vctk format")
-        convert_google(google_tts_paths, target_base_path)
-    else:
-        print("Google_tts processed data not found")
-
-    if os.path.exists(festcat_path):
-        print("Converting festcat data to vctk format")
-        convert_festcat(festcat_path, target_base_path)
-    else:
-        print("Festcat processed data not found")
-
-    #convert_cv(common_voice_path, target_base_path)
-
-def convert_google(google_tts_paths, target_base_path):
-    for g_path in google_tts_paths[:1]:
-        meta_files = glob(f"{g_path}/*_*.txt")
-        for meta_file in meta_files:
-            print(meta_file)
-            for line in open(meta_file).readlines():
-                text_id, text = line.strip().split('|')
-                text.replace('¿','')
-                text.replace('¡','')
-                #speaker_id = '_'.join(text_id.split('_')[:2])
-                speaker_id = text_id.split('_')[1]
-                target_text_file = os.path.join(target_base_path, 'txt',
-                                                speaker_id, text_id+'.txt')
-                target_wav_file = os.path.join(target_base_path, 'wav',
-                                               speaker_id, text_id+'.wav')
-                source_wav_file = os.path.join(g_path, 'wavs', text_id+'.wav')
-
-                speaker_paths = [os.path.dirname(target_text_file),
-                                 os.path.dirname(target_wav_file)]
-
-                convert_meta(target_text_file, target_wav_file,
-                             source_wav_file, speaker_paths, text)
-
-def convert_meta(target_text_file,
-                 target_wav_file,
-                 source_wav_file,
-                 speaker_paths, text):
-
-    # create directories
-    for speaker_path in speaker_paths:
-        if not os.path.isdir(speaker_path):
-            os.mkdir(speaker_path)
-
-    # write text file
-    with open(target_text_file, 'w') as out:
-        out.write(text)
-
-    # copy wav file
-    try:
-        os.path.isfile(source_wav_file)
-    except:
-        raise IOError('{} does not exist'.format(source_wav_file))
-
-    cp_args = ['cp', source_wav_file, target_wav_file]
-    if not os.path.isfile(target_wav_file):
-        #print(' '.join(cp_args))
-        call(cp_args)
-
-def convert_festcat(festcat_path, target_base_path):
-    meta_files = glob(f"{festcat_path}/*/*_train.txt")
-    for meta_file in meta_files:
-        speaker_name = meta_file.split(os.sep)[-2]
-        print(meta_file)
-        for line in open(meta_file).readlines():
-            if '[' not in line:
-                text_id, text = line.strip().split('|')
-                text.replace('¿','')
-                text.replace('¡','')
-                #speaker_id = '_'.join(text_id.split('_')[:3])
-                speaker_id = speaker_name
-                target_text_file = os.path.join(target_base_path, 'txt',
-                                                speaker_id, text_id+'.txt')
-                target_wav_file = os.path.join(target_base_path, 'wav',
-                                               speaker_id, text_id+'.wav')
-                source_wav_file = os.path.join(festcat_path, speaker_name,
-                                               'wavs', text_id+'.wav')
-
-                speaker_paths = [os.path.dirname(target_text_file),
-                                 os.path.dirname(target_wav_file)]
-
-                convert_meta(target_text_file, target_wav_file,
-                             source_wav_file, speaker_paths, text)
-            else:
-                print('line: {} skipped'.format(line))
-
-def convert_cv(common_voice_path, target_base_path):
-    meta_files = glob(f"{common_voice_path}/*.txt")
-    for meta_file in meta_files:
-        print(meta_file)
-        speaker_id = meta_file.split(os.sep)[-1].replace("ca_","").replace(".txt","")
-        for line in open(meta_file).readlines():
-            text_id, text = line.strip().split('|')
-
-            target_text_file = os.path.join(target_base_path, 'txt',
-                                            speaker_id, text_id+'.txt')
-            target_wav_file = os.path.join(target_base_path, 'wav',
-                                           speaker_id, text_id+'.wav')
-            source_wav_file = os.path.join(common_voice_path,
-                                           'wavs', text_id+'.wav')
-
-            speaker_paths = [os.path.dirname(target_text_file),
-                             os.path.dirname(target_wav_file)]
-
-            convert_meta(target_text_file, target_wav_file,
-                         source_wav_file, speaker_paths, text)
-
-if __name__ == "__main__":
-    main()
data_processing/extract_festcat.py
DELETED
@@ -1,139 +0,0 @@
-import os
-import re
-import json
-import subprocess
-import argparse
-import logging
-
-logger = logging.getLogger(__name__)
-
-def main():
-    my_parser = argparse.ArgumentParser()
-    my_parser.add_argument('--utterance-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to utterance file')
-    my_parser.add_argument('--wavs-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    my_parser.add_argument('--locutors',
-                           metavar='N',
-                           type=str,
-                           help='list of speakers names/id separated with commas')
-    args = my_parser.parse_args()
-    locutors = args.locutors
-    locutors = locutors.replace(" ", "");
-    locutors = locutors.split(",")
-    utterance_path = args.utterance_path
-    wavs_path = args.wavs_path
-
-    for locutor in locutors:
-        # get durations
-        durations = get_durations_dict(wavs_path + '%s_sil_stats.csv'%locutor)
-        aggregate_duration = 0
-        rejected_duration = 0
-        large_duration = 0
-        total_duration = 0
-        path = 'upc_ca_%s_utt/utt'%locutor
-        path = utterance_path + path
-
-        files = []
-        long_files = []
-        for filename in os.listdir(path):
-            sentence = get_sentence(os.path.join(path, filename))
-            audio_filename = filename.replace('.utt','.wav') # upc_ca_pep_203479.wav
-            if sentence:
-                target_path = 'upc_ca_%s_wav_22k_sil_pad'%locutor
-                target_path = wavs_path + target_path
-                source_filename = 'upc_ca_%s_wav_22k_sil/'%locutor+audio_filename
-                source_filename = wavs_path + source_filename
-                total_duration += durations[audio_filename]
-
-                if os.path.isfile(source_filename):
-                    if durations[audio_filename] < 10.0:
-                        aggregate_duration += durations[audio_filename]
-                        files.append((os.path.join(target_path,audio_filename), sentence))
-                        #subprocess.call(['cp',source_filename, target_filename])
-                    else:
-                        long_files.append((audio_filename, sentence))
-                        large_duration += durations[audio_filename]
-                else:
-                    print(audio_filename)
-            else:
-                rejected_duration += durations[audio_filename]
-        out(args, locutor, files)
-        out_long(args, locutor, long_files)
-        out_long_json(args, locutor, long_files)
-        print(locutor, aggregate_duration/3600, 'hours')
-        print(locutor, 'rejected due to duration', large_duration/3600, 'hours')
-        print(locutor, 'rejected', rejected_duration/60, 'minutes')
-        print(locutor, total_duration, aggregate_duration+rejected_duration+large_duration)
-
-def get_durations_dict(filename):
-    durations = {}
-
-    for line in open(filename).readlines():
-        d = line.split(',')
-        durations[d[0].split('/')[-1]] = float(d[1])
-    return durations
-
-def get_sentence(filename):
-    utt_all = open(filename, encoding = "ISO-8859-1").read()
-    m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
-    sentence = m.groups()[1]
-    # delete interword dashes
-    sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
-    if not re.search('\d', sentence):
-        return sentence
-    else:
-        #print(filename, sentence)
-        return None
-
-def out(args, locutor, files):
-
-    outname_length = [('upc_%s_test.txt'%locutor,0),
-                      ('upc_%s_val.txt'%locutor,0),
-                      ('upc_%s_train.txt'%locutor,len(files))]
-    l_sum = sum([el[1] for el in outname_length])
-    if len(files) != l_sum:
-        msg = 'train vs test val distribution wrong: %i'%l_sum
-        raise ValueError('msg')
-
-    for fout, l in outname_length:
-        open((args.wavs_path + fout), mode= 'a').close()
-        logger.warning(f"fout: {fout}")
-        logger.warning(f"l: {l}")
-        logger.warning(f"Enable l: {len(files)-100}")
-        logger.warning(f"Files: {files}")
-        with open((args.wavs_path + fout), 'w') as out:
-            for i in range(l):
-                f, sentence = files.pop()
-                out.write('%s|%s\n'%(f.split("/")[-1].split(".")[-2],sentence))
-
-def out_long(args, locutor, files):
-    outname = '%s_longsentences.csv'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    with open(outname_path, 'w') as out:
-        for audio, text in files:
-            out.write('%s,"%s"\n'%(audio, text))
-
-def out_long_json(args, locutor, files):
-    outname = '%s_longsentences.json'%locutor
-    source = args.wavs_path +'upc_ca_%s_wav_22k_sil/'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    interventions = []
-    for audio, text in files:
-        intervention = {}
-        intervention['text'] = [(locutor, text)]
-        intervention['urls'] = [(locutor, os.path.join(source,audio))]
-        interventions.append(intervention)
-
-    with open(outname_path, 'w') as out:
-        json.dump({'session': interventions}, out, indent=2)
-
-if __name__ == "__main__":
-    main()
-
data_processing/extract_google_tts.py
DELETED
@@ -1,168 +0,0 @@
-import os
-import re
-import json
-import argparse
-import logging
-import csv
-import numpy as np
-
-logger = logging.getLogger(__name__)
-
-def main():
-    my_parser = argparse.ArgumentParser()
-    my_parser.add_argument('--tsv-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to tsv file')
-    my_parser.add_argument('--wavs-path',
-                           metavar='path',
-                           type=str,
-                           help='the path to wavs file')
-    my_parser.add_argument('--locutors',
-                           metavar='N',
-                           type=str,
-                           help='list of speakers names/id separated with commas')
-    args = my_parser.parse_args()
-    locutors = args.locutors
-    locutors = locutors.replace(" ", "");
-    locutors = locutors.split(",")
-    tsv_path = args.tsv_path
-    wavs_path = args.wavs_path
-
-    for locutor in locutors:
-        # get durations
-        durations = get_durations_dict(wavs_path + '%s_sil_stats.csv'%locutor)
-        aggregate_duration = 0
-        rejected_duration = 0
-        large_duration = 0
-        total_duration = 0
-        tsv_name = "line_index_%s.tsv"%locutor
-        tsv_path = tsv_path + tsv_name
-
-        tsv_file = open(tsv_path)
-        read_tsv = csv.reader(tsv_file, delimiter="\t")
-        files = []
-        long_files = []
-        for row in read_tsv:
-            audio_filename = row[0] + ".wav"
-            #logger.warning(f"Audio_filename {audio_filename}")
-            sentence = row[-1]
-            if sentence:
-                target_path = 'ca_es_%s_22k_sil_pad'%locutor
-                target_path = wavs_path + target_path
-                source_filename = 'ca_es_%s_22k_sil/'%locutor+audio_filename ###
-                source_filename = wavs_path + source_filename
-                #logger.warning(f"source_filename {source_filename}")
-                total_duration += durations[audio_filename]
-                if os.path.isfile(source_filename):
-                    if durations[audio_filename] < 10.0:
-                        aggregate_duration += durations[audio_filename]
-                        files.append((os.path.join(target_path,audio_filename), sentence))
-                        #subprocess.call(['cp',source_filename, target_filename])
-                    else:
-                        long_files.append((audio_filename, sentence))
-                        large_duration += durations[audio_filename]
-                else:
-                    print(audio_filename)
-            else:
-                rejected_duration += durations[audio_filename]
-
-        speakers_id = find_speakers_id(wavs_path + '%s_sil_stats.csv'%locutor)
-        for id in speakers_id:
-            speaker_file = files_spliter(files = files, speaker_id = id)
-            if len(speaker_file) == 0:
-                continue
-            else:
-                out(args, speaker_id = id, files = speaker_file)
-                #print(f"mv {wavs_path}ca_{id}_test.txt {wavs_path}{locutor}")
-                #os.system(f"mv {wavs_path}ca_{id}_test.txt {wavs_path}{locutor}")
-                #os.system(f"mv {wavs_path}ca_{id}_val.txt {wavs_path}{locutor}")
-                #os.system(f"mv {wavs_path}ca_{id}_train.txt {wavs_path}{locutor}")
-        #out(args, locutor, files)
-        out_long(args, locutor, long_files)
-        out_long_json(args, locutor, long_files)
-        print(locutor, aggregate_duration/3600, 'hours')
-        print(locutor, 'rejected due to duration', large_duration/3600, 'hours')
-        print(locutor, 'rejected', rejected_duration/60, 'minutes')
-        print(locutor, total_duration, aggregate_duration+rejected_duration+large_duration)
-
-def get_durations_dict(filename):
-    durations = {}
-    for line in open(filename).readlines():
-        d = line.split(',')
-        durations[d[0].split('/')[-1]] = float(d[1])
-    return durations
-
-def get_sentence(filename):
-    utt_all = open(filename, encoding = "ISO-8859-1").read()
-    m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
-    sentence = m.groups()[1]
-    # delete interword dashes
-    sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
-    if not re.search('\d', sentence):
-        return sentence
-    else:
-        print(filename, sentence)
-        return None
-
-def out(args, speaker_id, files):
-    outname_length = [('ca_%s_test.txt'%speaker_id,0),
-                      ('ca_%s_val.txt'%speaker_id,0),
-                      ('ca_%s_train.txt'%speaker_id,len(files))]
-    l_sum = sum([el[1] for el in outname_length])
-    if len(files) != l_sum:
-        msg = 'train vs test val distribution wrong: %i'%l_sum
-        raise ValueError('msg')
-
-    for fout, l in outname_length:
-        open((args.wavs_path + fout), mode= 'a').close()
-        with open((args.wavs_path + fout), 'w') as out:
-            for i in range(l):
-                f, sentence = files.pop()
-                out.write('%s|%s\n'%(f.split("/")[-1].split(".")[-2],sentence))
-    print(len(files))
-
-def out_long(args, locutor, files):
-    outname = '%s_longsentences.csv'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    with open(outname_path, 'w') as out:
-        for audio, text in files:
-            out.write('%s,"%s"\n'%(audio, text))
-
-def out_long_json(args, locutor, files):
-    outname = '%s_longsentences.json'%locutor
-    source = args.wavs_path +'ca_es_%s_22k_sil/'%locutor
-    outname_path = args.wavs_path + outname
-    open(outname_path, mode= 'a').close()
-    interventions = []
-    for audio, text in files:
-        intervention = {}
-        intervention['text'] = [(locutor, text)]
-        intervention['urls'] = [(locutor, os.path.join(source,audio))]
-        interventions.append(intervention)
-
-    with open(outname_path, 'w') as out:
-        json.dump({'session': interventions}, out, indent=2)
-
-def find_speakers_id(path_tsv):
-    durations = {}
-    for line in open(path_tsv).readlines():
-        d = line.split(',')
-        durations[d[0].split('/')[-1]] = float(d[1])
-    keysList = list(durations.keys())
-    for index in range(len(keysList)):
-        keysList[index] = keysList[index].split("_")[1]
-    keysList = np.ndarray.tolist(np.unique(np.array(keysList)))
-    return keysList
-
-def files_spliter(files, speaker_id):
-    out_file = []
-    for element in files:
-        if element[0].split("/")[-1].split("_")[1] == speaker_id:
-            out_file.append(element)
-    return out_file
-
-if __name__ == "__main__":
-    main()
-
data_processing/festcat_processing_test.sh
DELETED
@@ -1,152 +0,0 @@
-#!/bin/sh
-
-
-export FINAL_PATH=$1
-export SOURCE_PATH=$2
-export EXTRACT_PATH=$3
-
-
-module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996
-
-for name in bet eli eva jan mar ona pau pep pol teo uri
-do
-  echo "Processing $name data"
-  export SPEAKER_NAME=$name
-  export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
-  export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"
-
-  if [ -d "${FINAL_PATH}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}
-    echo "Crating: ${FINAL_PATH} "
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME} "
-  fi
-
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav "
-  fi
-
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${SOURCE_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_raw/recordings/*.raw; do
-      t=${f%.raw}.wav; g=${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/${t##*/}; sox -t raw -r 48k -e signed -b 16 -c 1 $f $g;
-      printf "\r Converiting .raw audios to .wav ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Already converted to .wav"
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/*.wav; do
-      t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/$t -v error < /dev/null;
-      printf "\r Converiting audios of 48kHz to 22kHz ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done;
-  else
-    echo "Already converted to 22kHz file"
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
-      printf "\r Filtering silence ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Silence already eliminated"
-  fi
-
-  if [ -f "${OUTPUT_CSV}" ]; then
-    ### Take action if $DIR exists ###
-    echo "${OUTPUT_CSV} already exists!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating ${OUTPUT_CSV}"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
-      d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
-      echo $f,$d;
-    done >> ${OUTPUT_CSV}
-  fi
-
-  if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/upc_${SPEAKER_NAME}_train.txt" ]; then
-    ### Take action if $DIR exists ###
-    echo "Splits already created!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating splits..."
-    python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --utterance-path ${UTTERANCE_PATH} --locutors ${SPEAKER_NAME}
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil_pad" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil_pad already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
-      printf "\r Adding pad ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Pad already added!"
-  fi
-
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav
-
-done
-echo "Done!"
-
-
-
-
data_processing/google_tts_processing_test.sh
DELETED
@@ -1,124 +0,0 @@
-#!/bin/sh
-
-
-export FINAL_PATH=$1
-export SOURCE_PATH=$2
-export EXTRACT_PATH=$3
-
-
-
-module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996
-
-for name in male female
-do
-  export SPEAKER_NAME=$name
-  export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
-  export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"
-
-  if [ -d "${FINAL_PATH}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}
-    echo "Crating: ${FINAL_PATH} "
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME} "
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${SOURCE_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}/*.wav; do
-      t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/$t -v error < /dev/null;
-      printf "\r Converiting audios of 48kHz to 22kHz ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done;
-  else
-    echo "Already converted to 22kHz file"
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil already created"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil "
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
-      printf "\r Filtering silence ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Silence has already been filtered!"
-  fi
-
-  if [ -f "${OUTPUT_CSV}" ]; then
-    ### Take action if $DIR exists ###
-    echo "${OUTPUT_CSV} already exists!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating ${OUTPUT_CSV}"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
-      d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
-      echo $f,$d;
-    done >> ${OUTPUT_CSV}
-  fi
-
-  if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/ca_01591_train.txt" ]; then
-    ### Take action if $DIR exists ###
-    echo "Splits already created!"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    echo "Crating splits..."
-    python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --tsv-path ${SOURCE_PATH}/${SPEAKER_NAME}/ --locutors ${SPEAKER_NAME}
-  fi
-
-  if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/wavs" ]; then
-    ### Take action if $DIR exists ###
-    echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
-  else
-    ### Control will jump here if $DIR does NOT exists ###
-    mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
-    echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
-  fi
-
-  if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
-    i=1
-    sp="/-\|"
-    for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
-      t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
-      printf "\r Adding pad ${sp:i++%${#sp}:1}"
-      sleep 0.05
-    done
-  else
-    echo "Pad already added!"
-  fi
-
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
-  rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k
-
-done
-echo "Done!"
data_processing/process_data.sh
DELETED
@@ -1,56 +0,0 @@
-#!/bin/bash
-
-### Festcat variables ###
-export PATH_TO_FESTCAT_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/festcat_processing_test.sh'
-export PATH_TO_FESTCAT_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_festcat.py'
-export PATH_TO_FESTCAT_DATA='/gpfs/scratch/bsc88/bsc88858/festcat/'
-export FESTCAT_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/festcat_processed'
-
-### Google_tts variables ###
-export PATH_TO_GOOGLE_TTS_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/google_tts_processing_test.sh'
-export PATH_TO_GOOGLE_TTS_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_google_tts.py'
-export PATH_TO_GOOGLE_TTS_DATA='/gpfs/scratch/bsc88/bsc88858/google_tts'
-export GOOGLE_TTS_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/google_tts_processed'
-
-### General variables ###
-export VCTK_FORMATER_PATH='/gpfs/scratch/bsc88/bsc88858/data_processing/ca_multi2vckt.py'
-export FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/multispeaker_ca_test/'
-
-
-if [ -d "${FESTCAT_FINAL_PATH}" ]; then
-  ### Take action if $DIR exists ###
-  echo "Path ${FESTCAT_FINAL_PATH} already exists"
-else
-  ### Control will jump here if $DIR does NOT exists ###
-  if [ -d "${PATH_TO_FESTCAT_DATA}" ]; then
-    source ${PATH_TO_FESTCAT_SHELL} ${FESTCAT_FINAL_PATH} ${PATH_TO_FESTCAT_DATA} ${PATH_TO_FESTCAT_PY}
-  else
-    echo "Fescat data not found!"
-  fi
-fi
-
-if [ -d "${GOOGLE_TTS_FINAL_PATH}" ]; then
-  ### Take action if $DIR exists ###
-  echo "Path ${GOOGLE_TTS_FINAL_PATH} already exists"
-else
-  ### Control will jump here if $DIR does NOT exists ###
-  if [ -d "${PATH_TO_GOOGLE_TTS_DATA}" ]; then
-    source ${PATH_TO_GOOGLE_TTS_SHELL} ${GOOGLE_TTS_FINAL_PATH} ${PATH_TO_GOOGLE_TTS_DATA} ${PATH_TO_GOOGLE_TTS_PY}
-  else
-    echo "Google TTS data not found!"
-  fi
-fi
-
-if [ -d "${FINAL_PATH}" ]; then
-  ### Take action if $DIR exists ###
-  echo "Path ${FINAL_PATH} already created"
-else
-  ### Control will jump here if $DIR does NOT exists ###
-  mkdir ${FINAL_PATH}
-  mkdir ${FINAL_PATH}/txt/
-  mkdir ${FINAL_PATH}/wav/
-  echo "Crating: ${FINAL_PATH}"
-  python ${VCTK_FORMATER_PATH} --google-path ${GOOGLE_TTS_FINAL_PATH} --festcat-path ${FESTCAT_FINAL_PATH} --final-path ${FINAL_PATH}
-fi
-
-echo "Done!"
model/best_model.pth
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7281afd683f92a46feb9068f5dcd96038b0b64b453deee25d147064b34e2dbcf
+size 1040801013
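Note that the change above only updates the Git LFS pointer (oid and size) for best_model.pth; the roughly 1 GB checkpoint itself lives in LFS storage. A minimal sketch for fetching it locally, assuming git-lfs is installed:

```bash
# Sketch: clone the model repository and pull the LFS-tracked checkpoint.
git lfs install
git clone https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker
cd tts-ca-coqui-vits-multispeaker
git lfs pull                      # downloads model/best_model.pth (~1 GB)
ls -lh model/best_model.pth model/config.json
```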
model/config.json
CHANGED
Binary files a/model/config.json and b/model/config.json differ