metadata

library_name: transformers
language:
  - ja
license: apache-2.0
base_model: rinna/japanese-hubert-base
tags:
  - automatic-speech-recognition
  - mozilla-foundation/common_voice_13_0
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: Hubert-common_voice-phoneme-onlyJSUT
    results: []

Hubert-common_voice-phoneme-onlyJSUT

This model is a fine-tuned version of rinna/japanese-hubert-base on the Japanese (ja) subset of the mozilla-foundation/common_voice_13_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.1563
Wer: 1.0
Cer: 0.1052
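
The gap between a WER of 1.0 and a CER of about 0.105 may look contradictory. One plausible explanation, which this card does not confirm, is that the reference transcripts contain no whitespace, so each utterance counts as a single "word" and any one-character error marks the whole utterance as wrong. The sketch below illustrates that effect with a plain Levenshtein distance; the helper names (`edit_distance`, `wer`, `cer`) and the example strings are hypothetical and are not taken from the training script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: edit distance over individual characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical unsegmented example: one wrong character out of seven.
reference = "こんにちは世界"
hypothesis = "こんにちわ世界"

print("WER:", wer(reference, hypothesis))  # the single "word" is wrong
print("CER:", cer(reference, hypothesis))  # only one character in seven is wrong
```

If this is indeed what happens here, CER is the more informative metric for this model, and WER would only become meaningful after segmenting both references and predictions into words.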

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 12500
num_epochs: 20.0
mixed_precision_training: Native AMP
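
As a rough sketch, the values above could be expressed as keyword arguments for `transformers.TrainingArguments`. The dictionary below is a reconstruction, not the original training script. It assumes a single training device (consistent with 16 × 2 = 32) and assumes that "Native AMP" corresponds to `fp16=True`. The `output_dir` mentioned in the closing comment is a hypothetical placeholder.

```python
# Hyperparameters from this card, keyed by TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 12500,
    "num_train_epochs": 20.0,
    "fp16": True,  # assumption: "Native AMP" was fp16 rather than bf16
}

# Assumption: one device, so the effective batch size is
# per-device batch size x gradient accumulation steps x number of devices.
num_devices = 1
total_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print("total_train_batch_size:", total_train_batch_size)

# With transformers installed, these could be passed on as, for example:
#   TrainingArguments(output_dir="./hubert-ja-ctc", **training_kwargs)
```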

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
No log	0.7092	100	11.3614	1.054	0.9861
No log	1.4184	200	5.9358	1.0	0.9851
No log	2.1277	300	5.3101	1.0	0.9851
No log	2.8369	400	4.8953	1.0	0.9851
6.9061	3.5461	500	4.4021	1.0	0.9851
6.9061	4.2553	600	3.9323	1.0	0.9851
6.9061	4.9645	700	3.4932	1.0	0.9851
6.9061	5.6738	800	3.2092	1.0	0.9850
6.9061	6.3830	900	3.0484	1.0	0.9851
3.4303	7.0922	1000	2.9961	1.0	0.9850
3.4303	7.8014	1100	2.8000	1.0	0.9850
3.4303	8.5106	1200	1.9061	1.0	0.5949
3.4303	9.2199	1300	0.8767	1.0	0.1547
3.4303	9.9291	1400	0.5386	1.0	0.1268
1.6163	10.6383	1500	0.3820	1.0	0.1190
1.6163	11.3475	1600	0.2983	1.0	0.1138
1.6163	12.0567	1700	0.2524	1.0	0.1117
1.6163	12.7660	1800	0.2260	1.0	0.1104
1.6163	13.4752	1900	0.2096	1.0	0.1110
0.332	14.1844	2000	0.1896	0.998	0.1092
0.332	14.8936	2100	0.1838	1.0	0.1095
0.332	15.6028	2200	0.1766	1.0	0.1081
0.332	16.3121	2300	0.1688	0.998	0.1071
0.332	17.0213	2400	0.1667	0.998	0.1069
0.2296	17.7305	2500	0.1643	1.0	0.1069
0.2296	18.4397	2600	0.1602	1.0	0.1071
0.2296	19.1489	2700	0.1654	1.0	0.1068
0.2296	19.8582	2800	0.1617	0.998	0.1060
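
The table implies roughly 141 optimizer steps per epoch (step 100 falls at epoch 0.7092), so about 2820 steps over 20 epochs. That is far fewer than the 12500 warmup steps configured above, which suggests the learning rate was still in linear warmup when training ended and never reached 0.0003. The sketch below re-implements a linear-warmup plus cosine-decay multiplier in plain Python to estimate the learning rate at the last evaluation step. It is meant to mirror the usual behavior of a cosine schedule with warmup, not to reproduce the exact scheduler code used for this run, and the total step count is inferred from the table rather than stated in the card.

```python
import math


def cosine_with_warmup_lr(step, base_lr, warmup_steps, total_steps, num_cycles=0.5):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr towards 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


BASE_LR = 0.0003
WARMUP_STEPS = 12500

# Inferred from the results table, not stated in the card.
STEPS_PER_EPOCH = round(100 / 0.7092)
TOTAL_STEPS = STEPS_PER_EPOCH * 20

lr_at_last_eval = cosine_with_warmup_lr(2800, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
print("steps per epoch:", STEPS_PER_EPOCH)
print("estimated total steps:", TOTAL_STEPS)
print("estimated LR at step 2800:", lr_at_last_eval)
print("fraction of peak LR reached:", lr_at_last_eval / BASE_LR)
```

Under these assumptions the learning rate at step 2800 is about 6.72e-05, roughly 22% of the configured peak, and the cosine decay phase is never entered.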

Framework versions

Transformers 4.47.0.dev0
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3