JBZhang2342/speecht5_tts

@@ -1,26 +1,23 @@
 ---
-language:
-- en
 license: mit
 base_model: microsoft/speecht5_tts
 tags:
-- en_accent,mozilla,t5,common_voice_1_0
 - generated_from_trainer
 datasets:
-- mozilla-foundation/common_voice_1_0
 model-index:
-- name: SpeechT5 TTS English Accented
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# SpeechT5 TTS English Accented
-This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on the Common Voice dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5103
 ## Model description
@@ -39,7 +36,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
@@ -48,18 +45,23 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
-- training_steps: 5000
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 0.5407        | 16.0  | 1000 | 0.4686          |
-| 0.4995        | 32.0  | 2000 | 0.4828          |
-| 0.4729        | 48.0  | 3000 | 0.4939          |
-| 0.4733        | 64.0  | 4000 | 0.5020          |
-| 0.4901        | 80.0  | 5000 | 0.5103          |
 ### Framework versions

 ---
 license: mit
 base_model: microsoft/speecht5_tts
 tags:
 - generated_from_trainer
 datasets:
+- common_voice_1_0
 model-index:
+- name: speecht5_tts
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# speecht5_tts
+This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on the common_voice_1_0 dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5189
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 2e-05
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
+- training_steps: 10000
 - mixed_precision_training: Native AMP
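
The scheduler settings above (`lr_scheduler_type: linear`, 500 warmup steps, 10000 training steps, peak learning rate 2e-05) can be illustrated with a small standalone sketch. It assumes the usual shape of a linear schedule with warmup: a linear ramp from 0 to the peak, then a linear decay to 0 at the last step. The function name `linear_warmup_lr` is hypothetical and is not part of the training code behind this commit.

```python
def linear_warmup_lr(step, peak_lr=2e-5, warmup_steps=500, total_steps=10000):
    """Learning rate at a given optimizer step.

    Assumed shape: linear ramp from 0 to peak_lr over warmup_steps,
    then linear decay from peak_lr to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))


# Print the learning rate at a few points along the run.
for step in (0, 250, 500, 5250, 10000):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the peak of 2e-05 is reached only at step 500, and the rate is back to half the peak by step 5250.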
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.5395        | 16.0  | 1000  | 0.4726          |
+| 0.4727        | 32.0  | 2000  | 0.4819          |
+| 0.4513        | 48.0  | 3000  | 0.4871          |
+| 0.4526        | 64.0  | 4000  | 0.5006          |
+| 0.4474        | 80.0  | 5000  | 0.5022          |
+| 0.4147        | 96.0  | 6000  | 0.5039          |
+| 0.423         | 112.0 | 7000  | 0.5154          |
+| 0.4271        | 128.0 | 8000  | 0.5217          |
+| 0.4232        | 144.0 | 9000  | 0.5198          |
+| 0.4044        | 160.0 | 10000 | 0.5189          |
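
To make the trend in the updated table easier to see, the sketch below copies its rows into a list and picks out the logged step with the lowest validation loss. The variable names are illustrative only; the numbers come straight from the table.

```python
# (step, training loss, validation loss) rows copied from the updated table.
rows = [
    (1000, 0.5395, 0.4726),
    (2000, 0.4727, 0.4819),
    (3000, 0.4513, 0.4871),
    (4000, 0.4526, 0.5006),
    (5000, 0.4474, 0.5022),
    (6000, 0.4147, 0.5039),
    (7000, 0.4230, 0.5154),
    (8000, 0.4271, 0.5217),
    (9000, 0.4232, 0.5198),
    (10000, 0.4044, 0.5189),
]

# Logged step with the lowest validation loss, and the final logged step.
best_step, _, best_val = min(rows, key=lambda r: r[2])
final_step, _, final_val = rows[-1]

print(f"best validation loss {best_val} at step {best_step}")
print(f"final validation loss {final_val} at step {final_step}")
```

The lowest validation loss (0.4726) is at the first logged step, 1000, while training loss keeps falling through step 10000. That pattern may indicate overfitting, so an earlier checkpoint could be worth comparing against the final one.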
 ### Framework versions

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1c25c55020f4fb327a494f3feaa31d8370f321c5f674d4907ce991b6a96750a0
 size 577789320

 version https://git-lfs.github.com/spec/v1
+oid sha256:b8082db23bbe95cf935e6d6f3b9471d1cb2f00ef2da65dfea8da35263124bfcb
 size 577789320
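
The `model.safetensors` entry in the repository is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. Here the size is unchanged (577789320) and only the oid differs. A minimal sketch of parsing such a pointer and checking a downloaded file against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch assumes the oid uses `sha256`, as it does in this diff.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents for the new version of model.safetensors in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b8082db23bbe95cf935e6d6f3b9471d1cb2f00ef2da65dfea8da35263124bfcb\n"
    "size 577789320\n"
)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then report whether the file corresponds to the new weights rather than the previous ones.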