wav2vec2-conformer-rel-pos-jv-openslr

This model is a fine-tuned version of facebook/wav2vec2-conformer-rel-pos-large on the OpenSLR 41 (Javanese) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2470
  • Wer: 0.1227
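
For readers unfamiliar with the metric, the sketch below shows how a word error rate such as the 0.1227 above is typically computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is a minimal illustration, not the evaluation code used for this model; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("aku arep mangan sega", "aku arep mangan"))
```

A corpus-level WER is usually obtained by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance scores.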

Model description

This model is a fine-tuned version of wav2vec2-conformer-rel-pos-large, adapted to Javanese using the OpenSLR 41 dataset. Fine-tuning on this language-specific data enables the model to recognize and transcribe spoken Javanese.

Intended uses & limitations

This model is intended for transcribing spoken Javanese from audio recordings. It achieves a word error rate (WER) of 12.27% on the evaluation set, which amounts to roughly one error per eight reference words, so transcripts should be reviewed before use. Accuracy may vary, particularly with challenging audio conditions or less common dialects. The model expects input audio sampled at 16 kHz; recordings at other sample rates must be resampled first, and low-quality audio may reduce accuracy.
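
A minimal inference sketch follows, assuming the repository ships a processor (tokenizer plus feature extractor) alongside the weights and that access to the repository has already been granted and authenticated. It uses the Wav2Vec2-Conformer CTC class from Transformers and relies on librosa to resample the input to 16 kHz; the `transcribe` helper and the file name in the usage comment are hypothetical.

```python
MODEL_ID = "johaness14/wav2vec2-conformer-rel-pos-jv-openslr"
TARGET_SR = 16_000  # sample rate the model expects


def transcribe(path: str) -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ConformerForCTC, Wav2Vec2Processor

    # Assumption: the repository contains processor files as well as weights.
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ConformerForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # librosa resamples to TARGET_SR and downmixes to mono while loading.
    speech, _ = librosa.load(path, sr=TARGET_SR, mono=True)

    inputs = processor(speech, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: take the most likely token at every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


# Usage (hypothetical file name):
# print(transcribe("sample_javanese.wav"))
```

Greedy decoding is the simplest choice here; no language model or beam search is assumed.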

Training and evaluation data

The model was trained on the OpenSLR 41 dataset, which was split into two parts (training and testing). Training ran on a single A100 GPU; the training duration was not recorded (reported as NaN hours).

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 75
  • mixed_precision_training: Native AMP
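
To make the scheduler settings above concrete, the sketch below reproduces the shape of a linear schedule with warmup: the learning rate rises linearly from 0 to 0.0001 over the first 1000 steps, then decays linearly to 0 at the end of training. The total step count is an assumption inferred from the results table (about 706 optimizer steps per epoch, times 75 epochs); it is not stated in the card, and the function name is hypothetical.

```python
BASE_LR = 1e-4         # learning_rate
WARMUP_STEPS = 1000    # lr_scheduler_warmup_steps
# Assumption: ~706 steps per epoch (2000 steps at epoch 2.8329) * 75 epochs.
TOTAL_STEPS = 706 * 75


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Decay phase: scale linearly from base_lr down to 0 at `total`.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


for s in (0, 500, 1000, 26_000, TOTAL_STEPS):
    print(s, linear_lr(s))
```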

Training results

| Training Loss | Epoch   | Step  | Validation Loss | WER    |
|---------------|---------|-------|-----------------|--------|
| 0.5826        | 2.8329  | 2000  | 0.4733          | 0.4445 |
| 0.3478        | 5.6657  | 4000  | 0.3538          | 0.3191 |
| 0.2532        | 8.4986  | 6000  | 0.3085          | 0.2646 |
| 0.2028        | 11.3314 | 8000  | 0.2799          | 0.2467 |
| 0.1628        | 14.1643 | 10000 | 0.2623          | 0.2095 |
| 0.1407        | 16.9972 | 12000 | 0.2510          | 0.2068 |
| 0.1154        | 19.8300 | 14000 | 0.2922          | 0.1937 |
| 0.1044        | 22.6629 | 16000 | 0.2660          | 0.1730 |
| 0.0929        | 25.4958 | 18000 | 0.2818          | 0.1868 |
| 0.0798        | 28.3286 | 20000 | 0.2573          | 0.1633 |
| 0.074         | 31.1615 | 22000 | 0.2398          | 0.1647 |
| 0.0678        | 33.9943 | 24000 | 0.2601          | 0.1606 |
| 0.0628        | 36.8272 | 26000 | 0.2627          | 0.1613 |
| 0.057         | 39.6601 | 28000 | 0.2393          | 0.1468 |
| 0.0547        | 42.4929 | 30000 | 0.2662          | 0.1585 |
| 0.0512        | 45.3258 | 32000 | 0.2544          | 0.1502 |
| 0.0446        | 48.1586 | 34000 | 0.2542          | 0.1502 |
| 0.045         | 50.9915 | 36000 | 0.2624          | 0.1516 |
| 0.0403        | 53.8244 | 38000 | 0.2487          | 0.1420 |
| 0.0378        | 56.6572 | 40000 | 0.2498          | 0.1330 |
| 0.0353        | 59.4901 | 42000 | 0.2495          | 0.1309 |
| 0.0337        | 62.3229 | 44000 | 0.2505          | 0.1316 |
| 0.029         | 65.1558 | 46000 | 0.2373          | 0.1247 |
| 0.0277        | 67.9887 | 48000 | 0.2543          | 0.1282 |
| 0.0283        | 70.8215 | 50000 | 0.2547          | 0.1234 |
| 0.0275        | 73.6544 | 52000 | 0.2470          | 0.1227 |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.2.1+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1