---
language:
- eo
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_13_0
- generated_from_trainer
metrics:
- wer
model-index:
- name: wav2vec2-common_voice_13_0-eo-3
results: []
---
# wav2vec2-common_voice_13_0-eo-3, an Esperanto speech recognizer
This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the [mozilla-foundation/common_voice_13_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0) Esperanto dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2191
- Cer: 0.0208
- Wer: 0.0687
The first 10 samples in the test set:
| Actual<br>Predicted | CER |
|:--------------------|:----|
| `la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo`<br>`la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo` | 0.0 |
| `en la sekva jaro li ricevis premion`<br>`en la sekva jaro li ricevis prenion` | 0.0286 |
| `ŝi studis historion ĉe la universitato de brita kolumbio`<br>`ŝi studis historion ĉe la universitato de brita kolumbio` | 0.0 |
| `larĝaj ŝtupoj kuras al la fasado`<br>`larĝaj ŝtupoj kuras al la fasado` | 0.0 |
| `la municipo ĝuas duan epokon de etendo kaj disvolviĝo`<br>`la municipo ĝuas duonepokon de tendo kaj disvolviĝo` | 0.0566 |
| `li estis ankaŭ katedrestro kaj dekano`<br>`li estis ankaŭ katedresto kaj dekano` | 0.0270 |
| `librovendejo apartenas al la muzeo`<br>`librovendejo apartenas al la muzeo` | 0.0 |
| `ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵaro de arbaroj`<br>`ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵo de arbaroj` | 0.0270 |
| `unue ili estas ruĝaj poste brunaj`<br>`unue ili estas ruĝaj poste brunaj` | 0.0 |
| `la loĝantaro laboras en la proksima ĉefurbo`<br>`la loĝantaro laboras en la proksima ĉefurbo` | 0.0 |
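For reference, the CER here is character-level Levenshtein distance divided by the reference length. A minimal pure-Python check of the second row (a single `m`→`n` substitution in `premion`):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein edit distance over reference length."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            # Deletion, insertion, or substitution (no cost if characters match).
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1] / max(len(ref), 1)

score = cer(
    "en la sekva jaro li ricevis premion",
    "en la sekva jaro li ricevis prenion",
)
# One substitution over 35 reference characters -> 1/35 ≈ 0.0286
```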
## Model description
See [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).
## Intended uses & limitations
Speech recognition for Esperanto. The base model was pretrained and fine-tuned on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
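Any input at another rate must therefore be resampled before inference. A minimal numpy-only sketch, with linear interpolation as a crude stand-in for a proper resampler such as `torchaudio.functional.resample`:

```python
import numpy as np

def ensure_16khz(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Resample a mono waveform to 16 kHz via linear interpolation (sketch only)."""
    if orig_sr == target_sr:
        return audio
    n_out = int(len(audio) * target_sr / orig_sr)
    old_t = np.arange(len(audio)) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, audio)

# One second of a 440 Hz tone at 48 kHz becomes 16,000 samples.
tone = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)
resampled = ensure_16khz(tone, 48_000)
```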
## Training and evaluation data
The training split was set to `train[:15000]` while the eval split was set to `validation[:1500]`.
## Training procedure
I used [`run_speech_recognition_ctc.py`](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition) with the following `train.json` file passed to it:
```json
{
  "dataset_name": "mozilla-foundation/common_voice_13_0",
  "model_name_or_path": "facebook/wav2vec2-large-xlsr-53",
  "dataset_config_name": "eo",
  "output_dir": "./wav2vec2-common_voice_13_0-eo-3",
  "train_split_name": "train[:15000]",
  "eval_split_name": "validation[:1500]",
  "eval_metrics": ["cer", "wer"],
  "overwrite_output_dir": true,
  "preprocessing_num_workers": 8,
  "num_train_epochs": 100,
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 4,
  "gradient_checkpointing": true,
  "learning_rate": 3e-5,
  "warmup_steps": 500,
  "evaluation_strategy": "steps",
  "text_column_name": "sentence",
  "length_column_name": "input_length",
  "save_steps": 1000,
  "eval_steps": 1000,
  "layerdrop": 0.1,
  "save_total_limit": 3,
  "freeze_feature_encoder": true,
  "chars_to_ignore": "-!\"'(),.:;=?_`¨«¸»ʼ‑–—‘’“”„…‹›♫?",
  "chars_to_substitute": {
    "przy": "pŝe",
    "byn": "bin",
    "cx": "ĉ",
    "sx": "ŝ",
    "fi": "fi",
    "fl": "fl",
    "ǔ": "ŭ",
    "ñ": "nj",
    "á": "a",
    "é": "e",
    "ü": "ŭ",
    "y": "j",
    "qu": "ku"
  },
  "fp16": true,
  "group_by_length": true,
  "push_to_hub": true,
  "do_train": true,
  "do_eval": true
}
```
I went through the dataset to find non-speech characters, which were placed in `chars_to_ignore`. In addition, there were character sequences that could be transcribed to Esperanto phonemes; these were placed as a dictionary in `chars_to_substitute`. This required adding a corresponding argument to the script:
```py
def dict_field(default=None, metadata=None):
    return field(default_factory=lambda: default, metadata=metadata)


@dataclass
class DataTrainingArguments:
    ...
    chars_to_substitute: Optional[Dict[str, str]] = dict_field(
        default=None,
        metadata={"help": "A dict of characters to replace."},
    )
```
Then I copied `remove_special_characters` to do the actual substitution:
```py
def remove_special_characters(batch):
    text = batch[text_column_name]
    if chars_to_ignore_regex is not None:
        text = re.sub(chars_to_ignore_regex, "", batch[text_column_name])
    batch["target_text"] = text.lower() + " "
    return batch


def substitute_characters(batch):
    text: str = batch["target_text"]
    if data_args.chars_to_substitute is not None:
        for k, v in data_args.chars_to_substitute.items():
            text = text.replace(k, v)  # str.replace returns a new string
    batch["target_text"] = text.lower()
    return batch


with training_args.main_process_first(desc="dataset map special characters removal"):
    raw_datasets = raw_datasets.map(
        remove_special_characters,
        remove_columns=[text_column_name],
        desc="remove special characters from datasets",
    )

with training_args.main_process_first(desc="dataset map special characters substitute"):
    raw_datasets = raw_datasets.map(
        substitute_characters,
        desc="substitute special characters in datasets",
    )
```
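Putting the two map functions together, the pipeline behaves roughly like this on a single sentence (a standalone sketch; both character sets here are abbreviated subsets of the real configuration):

```python
import re

chars_to_ignore = "-!\"'(),.:;=?_`«»…"          # abbreviated subset
chars_to_substitute = {"cx": "ĉ", "sx": "ŝ", "y": "j"}  # abbreviated subset

def clean(text: str) -> str:
    # Step 1: strip non-speech characters (as in remove_special_characters).
    text = re.sub(f"[{re.escape(chars_to_ignore)}]", "", text)
    text = text.lower()
    # Step 2: map character sequences to Esperanto letters
    # (as in substitute_characters; note the reassignment on each replace).
    for k, v in chars_to_substitute.items():
        text = text.replace(k, v)
    return text

cleaned = clean("Cxu vi parolas Esperanton?")
# cleaned == "ĉu vi parolas esperanton"
```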
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- layerdrop: 0.1
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 100
- mixed_precision_training: Native AMP
### Training results
| Training Loss | Epoch | Step | Cer | Validation Loss | Wer |
|:-------------:|:-----:|:-----:|:------:|:---------------:|:------:|
| 2.6416 | 2.13 | 1000 | 0.1541 | 0.8599 | 0.6449 |
| 0.2633 | 4.27 | 2000 | 0.0335 | 0.1897 | 0.1431 |
| 0.1739 | 6.4 | 3000 | 0.0289 | 0.1732 | 0.1145 |
| 0.1378 | 8.53 | 4000 | 0.0276 | 0.1729 | 0.1066 |
| 0.1172 | 10.67 | 5000 | 0.0268 | 0.1773 | 0.1019 |
| 0.1049 | 12.8 | 6000 | 0.0255 | 0.1701 | 0.0937 |
| 0.0951 | 14.93 | 7000 | 0.0253 | 0.1718 | 0.0933 |
| 0.0851 | 17.07 | 8000 | 0.0239 | 0.1787 | 0.0834 |
| 0.0809 | 19.2 | 9000 | 0.0235 | 0.1802 | 0.0835 |
| 0.0756 | 21.33 | 10000 | 0.0239 | 0.1784 | 0.0855 |
| 0.0708 | 23.47 | 11000 | 0.0235 | 0.1748 | 0.0824 |
| 0.0657 | 25.6 | 12000 | 0.0228 | 0.1830 | 0.0796 |
| 0.0605 | 27.73 | 13000 | 0.0230 | 0.1896 | 0.0798 |
| 0.0583 | 29.87 | 14000 | 0.0224 | 0.1889 | 0.0778 |
| 0.0608 | 32.0 | 15000 | 0.0223 | 0.1849 | 0.0757 |
| 0.0556 | 34.13 | 16000 | 0.0223 | 0.1872 | 0.0767 |
| 0.0534 | 36.27 | 17000 | 0.0221 | 0.1893 | 0.0751 |
| 0.0523 | 38.4 | 18000 | 0.0218 | 0.1925 | 0.0729 |
| 0.0494 | 40.53 | 19000 | 0.0221 | 0.1957 | 0.0745 |
| 0.0475 | 42.67 | 20000 | 0.0217 | 0.1961 | 0.0740 |
| 0.048 | 44.8 | 21000 | 0.0214 | 0.1957 | 0.0714 |
| 0.0459 | 46.93 | 22000 | 0.0215 | 0.1968 | 0.0717 |
| 0.0435 | 49.07 | 23000 | 0.0217 | 0.2008 | 0.0717 |
| 0.0428 | 51.2 | 24000 | 0.0212 | 0.1991 | 0.0696 |
| 0.0418 | 53.33 | 25000 | 0.0215 | 0.2034 | 0.0714 |
| 0.0404 | 55.47 | 26000 | 0.0210 | 0.2014 | 0.0684 |
| 0.0394 | 57.6 | 27000 | 0.0210 | 0.2050 | 0.0681 |
| 0.0399 | 59.73 | 28000 | 0.0211 | 0.2039 | 0.0700 |
| 0.0389 | 61.87 | 29000 | 0.0214 | 0.2091 | 0.0694 |
| 0.038 | 64.0 | 30000 | 0.0210 | 0.2100 | 0.0702 |
| 0.0361 | 66.13 | 31000 | 0.0215 | 0.2119 | 0.0703 |
| 0.0359 | 68.27 | 32000 | 0.0213 | 0.2108 | 0.0714 |
| 0.0354 | 70.4 | 33000 | 0.0211 | 0.2120 | 0.0699 |
| 0.0364 | 72.53 | 34000 | 0.0211 | 0.2128 | 0.0688 |
| 0.0361 | 74.67 | 35000 | 0.0212 | 0.2134 | 0.0694 |
| 0.0332 | 76.8 | 36000 | 0.0210 | 0.2176 | 0.0698 |
| 0.0341 | 78.93 | 37000 | 0.0208 | 0.2170 | 0.0688 |
| 0.032 | 81.07 | 38000 | 0.0209 | 0.2157 | 0.0686 |
| 0.0318 | 83.33 | 39000 | 0.0209 | 0.2166 | 0.0685 |
| 0.0325 | 85.47 | 40000 | 0.0209 | 0.2172 | 0.0687 |
| 0.0316 | 87.6 | 41000 | 0.0208 | 0.2181 | 0.0678 |
| 0.0302 | 89.73 | 42000 | 0.0208 | 0.2171 | 0.0679 |
| 0.0318 | 91.87 | 43000 | 0.0211 | 0.2179 | 0.0702 |
| 0.0314 | 94.0 | 44000 | 0.0208 | 0.2186 | 0.0690 |
| 0.0309 | 96.13 | 45000 | 0.0210 | 0.2193 | 0.0696 |
| 0.031 | 98.27 | 46000 | 0.0208 | 0.2191 | 0.0686 |
### Framework versions
- Transformers 4.29.1
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3
## Discussion
### Nans and Infs
While debugging other training sessions where more data from the Esperanto Common Voice dataset was used -- some loss calculations were returning either `inf` or `nan` -- I found that some of the training set trained with this model had surprisingly high CER. Some examples:
| file | Actual<br>---<br>Predicted | CER | Comment |
|:-----|:--------------------|:----|:--------|
|common_voice_eo_25365027.mp3 | en la hansaj agentejoj komercistoj el la regiono renkontis kolegojn el aliaj regionoj<br>---<br>a taaj keo eoj eejn kigos eegoj eioeegiooj| 0.61 | No audio |
|common_voice_eo_25365472.mp3 | ili vendas armilojn kaj teknologiojn al la fanatikuloj por gajni monon monon monon<br>---<br>ila mamato aiil ajn kno ion a a aotigojn pu aiooo aj knon | 0.55 | Barely any audio, distorted |
|common_voice_eo_25365836.mp3 | industria apliko estas la kreado de modifitaj bakterioj kiuj produktas deziratan kemian substancon<br>---<br>iiti sieetas la eeadooddddooiooaotooeioj aiicenon | 0.67 | Barely any audio, distorted |
|2600 | ili akiras plenkreskan plumaron nur en la kvina jaro<br>---<br>ili aaros peetaj patato a a sia ro | 0.52 | It's literally someone saying 'injabum'. Thanks, troll. |
|7333 | poste sekvas difinoj de la termino<br>---<br>po | 0.94 | No audio |
|7334 | li gvidis multajn kursojn laŭ la csehmetodo<br>---<br>po | 0.98 | No audio |
|7429 | tamen pro la rekonstruo de kluzoj ne eblas trapasi komplete<br>---<br>po | 0.97 | No audio |
|11662 | lingvotesto estas postulata ekzemple por akceptiĝo en anglalingvaj altlernejoj<br>---<br>linkonteto estastitot etateerteito en pootaeaje lgijoj | 0.58 | No audio |
Some of these examples have no audio at all. All of these files are completely useless for training and should be removed from the training set.
You can see that the model tries to hallucinate the target when there is little or no audio, which is terrible for faithfully reporting what was said. I would also like some measure of confidence, so that only transcriptions with relatively high confidence are kept, but I can't find a way to obtain a confidence value.
The Common Voice dataset also records upvotes and downvotes. Of the high-CER sentences above, all had 2 upvotes; some had 0 downvotes and some had 1. So we cannot rely on votes to detect quality.
So what to do?
### Alternative 1
Despite these zero- and low-quality files, training seems to work OK. However, we still need to handle losses that become `nan` or `inf`, because they ruin the mean loss calculation.
By running `run_speech_recognition_ctc` with `do_train=false` and `model_name_or_path="xekri/wav2vec2-common_voice_13_0-eo-3"`, setting `eval_split_name` to `test`, `validation`, or `train` in turn, and modifying `trainer.py` as follows, I can check whether any losses are `nan` or `inf`:
```py
# To be JSON-serializable, we need to remove numpy types or zero-d tensors
metrics = denumpify_detensorize(metrics)
if all_losses is not None:
    # np.where returns a tuple of index arrays; take its first element
    loss_nan = np.where(np.isnan(all_losses))[0]
    if len(loss_nan) != 0:
        print(f'LOSSES ARE NAN: {loss_nan}')
    loss_inf = np.where(np.isinf(all_losses))[0]
    if len(loss_inf) != 0:
        print(f'LOSSES ARE INF: {loss_inf}')
    metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
```
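A detail worth noting: `np.where` on a 1-D array returns a one-element tuple of index arrays, so the indices themselves must be taken from its first element before testing the length. A standalone illustration of the check:

```python
import numpy as np

# Toy loss vector with one nan and one inf entry.
all_losses = np.array([0.21, np.nan, 0.19, np.inf, 0.22])
nan_idx = np.where(np.isnan(all_losses))[0]
inf_idx = np.where(np.isinf(all_losses))[0]
# nan_idx -> array([1]), inf_idx -> array([3])
```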
Doing this shows that of the 14913 examples in `test`, the following example results in `inf` loss:
`common_voice_eo_25167318.mp3`
The audio in this clip is severely garbled. It should absolutely be filtered out of the test set.
No `validation` samples result in `inf` or `nan`.
The following 18 out of 143984 examples in `train` result in `inf` loss:
```txt
common_voice_eo_25467641.mp3
common_voice_eo_25467723.mp3
common_voice_eo_25467791.mp3
common_voice_eo_25467820.mp3
common_voice_eo_25467943.mp3
common_voice_eo_25478612.mp3
common_voice_eo_25478623.mp3
common_voice_eo_25478631.mp3
common_voice_eo_25478756.mp3
common_voice_eo_25478762.mp3
common_voice_eo_25478768.mp3
common_voice_eo_25478769.mp3
common_voice_eo_25479150.mp3
common_voice_eo_25479203.mp3
common_voice_eo_25479229.mp3
common_voice_eo_25517673.mp3
common_voice_eo_25517677.mp3
common_voice_eo_25527739.mp3
```
Those files have no audio.
### Alternative 2
Another possibility is just to go through the audio files and throw away any where the peak audio isn't above some threshold.
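A sketch of that check (the 0.01 threshold is a made-up starting point and would need tuning against real clips):

```python
import numpy as np

def has_audio(waveform: np.ndarray, peak_threshold: float = 0.01) -> bool:
    """Keep a clip only if its peak absolute amplitude clears the threshold."""
    return float(np.max(np.abs(waveform))) >= peak_threshold

silent_ok = has_audio(np.zeros(16_000))                           # silent clip
tone_ok = has_audio(0.3 * np.sin(np.linspace(0.0, 100.0, 16_000)))  # audible tone
```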
### Alternative 3
Since this model seems to work well enough, I could run inference on all samples and discard those where the CER (as determined by this model) is too high, say above 0.5, then use the filtered examples to train another model. These high-CER examples are:
#### Test set
71 of 14913 examples in the test set show high CER.
```txt
common_voice_eo_25214319.mp3
common_voice_eo_25006596.mp3
common_voice_eo_27472721.mp3
common_voice_eo_27715088.mp3
common_voice_eo_27715091.mp3
common_voice_eo_26677019.mp3
common_voice_eo_26677023.mp3
common_voice_eo_20555291.mp3
common_voice_eo_25001942.mp3
common_voice_eo_25457354.mp3
common_voice_eo_25457355.mp3
common_voice_eo_25457365.mp3
common_voice_eo_25457373.mp3
common_voice_eo_25457396.mp3
common_voice_eo_25457397.mp3
common_voice_eo_25457409.mp3
common_voice_eo_25457410.mp3
common_voice_eo_25457412.mp3
common_voice_eo_25457442.mp3
common_voice_eo_25457444.mp3
common_voice_eo_25457445.mp3
common_voice_eo_25457577.mp3
common_voice_eo_25457578.mp3
common_voice_eo_28064453.mp3
common_voice_eo_25047803.mp3
common_voice_eo_25048418.mp3
common_voice_eo_25048419.mp3
common_voice_eo_25048421.mp3
common_voice_eo_25048423.mp3
common_voice_eo_25048428.mp3
common_voice_eo_25048574.mp3
common_voice_eo_25885643.mp3
common_voice_eo_25885645.mp3
common_voice_eo_26794882.mp3
common_voice_eo_27356529.mp3
common_voice_eo_25012640.mp3
common_voice_eo_25303457.mp3
common_voice_eo_18153931.mp3
common_voice_eo_18776206.mp3
common_voice_eo_18776208.mp3
common_voice_eo_18776219.mp3
common_voice_eo_18776220.mp3
common_voice_eo_18776222.mp3
common_voice_eo_18776223.mp3
common_voice_eo_18776236.mp3
common_voice_eo_18776238.mp3
common_voice_eo_18776244.mp3
common_voice_eo_18776248.mp3
common_voice_eo_18776285.mp3
common_voice_eo_18776287.mp3
common_voice_eo_18776297.mp3
common_voice_eo_18776298.mp3
common_voice_eo_25047998.mp3
common_voice_eo_25047999.mp3
common_voice_eo_25048000.mp3
common_voice_eo_25048001.mp3
common_voice_eo_25048002.mp3
common_voice_eo_25053113.mp3
common_voice_eo_25068355.mp3
common_voice_eo_25333056.mp3
common_voice_eo_25371639.mp3
common_voice_eo_25371640.mp3
common_voice_eo_25371641.mp3
common_voice_eo_25371642.mp3
common_voice_eo_25371643.mp3
common_voice_eo_22441946.mp3
common_voice_eo_26622121.mp3
common_voice_eo_25167318.mp3
common_voice_eo_25252685.mp3
common_voice_eo_25252698.mp3
common_voice_eo_25518636.mp3
```
Note on two of the examples: We know that _saluton kiel vi fartas_ ("Hello, how are you") and _atendu momenton_ ("Wait a moment") are a good start in learning Esperanto, but if that's not the text you were asked to record, you're not really helping.
#### Validation set
17 of 14909 examples in the validation set show high CER.
```txt
common_voice_eo_25392669.mp3
common_voice_eo_25392674.mp3
common_voice_eo_25392675.mp3
common_voice_eo_25392676.mp3
common_voice_eo_25392678.mp3
common_voice_eo_25392693.mp3
common_voice_eo_25392694.mp3
common_voice_eo_25392695.mp3
common_voice_eo_25392697.mp3
common_voice_eo_25392701.mp3
common_voice_eo_25392702.mp3
common_voice_eo_25392708.mp3
common_voice_eo_25392709.mp3
common_voice_eo_25408881.mp3
common_voice_eo_25408882.mp3
common_voice_eo_25408885.mp3
common_voice_eo_27380623.mp3
```
I didn't include some that had high CER only because of hallucinations during a one-word recording with lots of silence before and after; the recordings themselves are fine.
#### Training set
135 of 143984 examples yielded high CER. I removed some from this list that had high CER but sounded fine.
```txt
common_voice_eo_25365027.mp3
common_voice_eo_25365472.mp3
common_voice_eo_25365480.mp3
common_voice_eo_25365532.mp3
common_voice_eo_25365695.mp3
common_voice_eo_25365744.mp3
common_voice_eo_25365804.mp3
common_voice_eo_25365836.mp3
common_voice_eo_25365855.mp3
common_voice_eo_25372587.mp3
common_voice_eo_25401060.mp3
common_voice_eo_25430837.mp3
common_voice_eo_25444509.mp3
common_voice_eo_25240777.mp3
common_voice_eo_24942754.mp3
common_voice_eo_24942755.mp3
common_voice_eo_24990372.mp3
common_voice_eo_24990385.mp3
common_voice_eo_24990390.mp3
common_voice_eo_24990397.mp3
common_voice_eo_24990413.mp3
common_voice_eo_24990427.mp3
common_voice_eo_24990429.mp3
common_voice_eo_24990435.mp3
common_voice_eo_24990441.mp3
common_voice_eo_24990454.mp3
common_voice_eo_24990457.mp3
common_voice_eo_24990459.mp3
common_voice_eo_24990490.mp3
common_voice_eo_25529345.mp3
common_voice_eo_25648750.mp3
common_voice_eo_28670472.mp3
common_voice_eo_27931966.mp3
common_voice_eo_28252265.mp3
common_voice_eo_25454951.mp3
common_voice_eo_25927616.mp3
common_voice_eo_25153203.mp3
common_voice_eo_25238543.mp3
common_voice_eo_25284237.mp3
common_voice_eo_25460131.mp3
common_voice_eo_25460185.mp3
common_voice_eo_25460186.mp3
common_voice_eo_25460188.mp3
common_voice_eo_25460189.mp3
common_voice_eo_25446723.mp3
common_voice_eo_26025150.mp3
common_voice_eo_26640189.mp3
common_voice_eo_26888468.mp3
common_voice_eo_24844824.mp3
common_voice_eo_25022506.mp3
common_voice_eo_25022507.mp3
common_voice_eo_25022516.mp3
common_voice_eo_25032858.mp3
common_voice_eo_25032859.mp3
common_voice_eo_25032865.mp3
common_voice_eo_25243988.mp3
common_voice_eo_25244009.mp3
common_voice_eo_25266094.mp3
common_voice_eo_25266141.mp3
common_voice_eo_25285278.mp3
common_voice_eo_25286768.mp3
common_voice_eo_25457171.mp3
common_voice_eo_25467641.mp3
common_voice_eo_25467723.mp3
common_voice_eo_25467791.mp3
common_voice_eo_25467820.mp3
common_voice_eo_25467943.mp3
common_voice_eo_25478612.mp3
common_voice_eo_25478623.mp3
common_voice_eo_25478631.mp3
common_voice_eo_25478756.mp3
common_voice_eo_25478762.mp3
common_voice_eo_25478768.mp3
common_voice_eo_25478769.mp3
common_voice_eo_25479150.mp3
common_voice_eo_25479203.mp3
common_voice_eo_25479229.mp3
common_voice_eo_25517673.mp3
common_voice_eo_25517677.mp3
common_voice_eo_25527739.mp3
common_voice_eo_25975149.mp3
common_voice_eo_26193748.mp3
common_voice_eo_28401039.mp3
common_voice_eo_28421315.mp3
common_voice_eo_28937347.mp3
common_voice_eo_24890414.mp3
common_voice_eo_25294479.mp3
common_voice_eo_25438966.mp3
common_voice_eo_28855568.mp3
common_voice_eo_29011007.mp3
common_voice_eo_24599888.mp3
common_voice_eo_26964252.mp3
common_voice_eo_26964496.mp3
common_voice_eo_26964510.mp3
common_voice_eo_25432789.mp3
common_voice_eo_26688158.mp3
common_voice_eo_28516354.mp3
common_voice_eo_24790865.mp3
common_voice_eo_24790897.mp3
common_voice_eo_24790898.mp3
common_voice_eo_24790899.mp3
common_voice_eo_24790900.mp3
common_voice_eo_25362713.mp3
common_voice_eo_27585084.mp3
common_voice_eo_24813131.mp3
common_voice_eo_25035262.mp3
common_voice_eo_26000289.mp3
common_voice_eo_26003943.mp3
common_voice_eo_26283983.mp3
common_voice_eo_28708931.mp3
common_voice_eo_28037217.mp3
common_voice_eo_29273106.mp3
common_voice_eo_26006657.mp3
common_voice_eo_25399924.mp3
common_voice_eo_27982431.mp3
common_voice_eo_25893779.mp3
common_voice_eo_27842061.mp3
common_voice_eo_25052385.mp3
common_voice_eo_25807395.mp3
common_voice_eo_25807985.mp3
common_voice_eo_25808039.mp3
common_voice_eo_25808407.mp3
common_voice_eo_25809036.mp3
common_voice_eo_27487795.mp3
common_voice_eo_28460556.mp3
common_voice_eo_28884851.mp3
common_voice_eo_24819719.mp3
common_voice_eo_25153594.mp3
common_voice_eo_25234585.mp3
common_voice_eo_25245164.mp3
common_voice_eo_27538877.mp3
common_voice_eo_24862771.mp3
common_voice_eo_25070167.mp3
common_voice_eo_26381720.mp3
common_voice_eo_28110376.mp3
```
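Once such lists exist, the bad clips could be dropped before training with `datasets.Dataset.filter`. A sketch, assuming each Common Voice row carries a `path` field ending in the clip's file name (the two names here are an abbreviated stand-in for the full lists above):

```python
import os

# Abbreviated for illustration; the real set is the full list above.
bad_files = {"common_voice_eo_25365027.mp3", "common_voice_eo_25479229.mp3"}

def keep_example(example: dict) -> bool:
    """Return False for rows whose audio file is on the reject list."""
    return os.path.basename(example["path"]) not in bad_files

# With a loaded datasets.Dataset:
# train_dataset = train_dataset.filter(keep_example)
```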
### Alternative 3.1
For the files that have no audio or only distorted audio, maybe change their target transcription to be empty? Except for 'injabum'.
### And also
Since one can sign up at Common Voice to review Esperanto audio clips, I've done so in the hope of making a small contribution to quality.