alvanlii/whisper-small-cantonese

Automatic Speech Recognition

Generated from Trainer

Inference Endpoints

alvanli committed on May 15, 2024

Commit 8034c9c · 1 Parent(s): b14b2d8

cer 7.93

Files changed (4)

README.md +5 -4
config.json +1 -1
model.safetensors +1 -1
optimizer.pt +1 -1

README.md CHANGED

@@ -23,7 +23,7 @@ model-index:
     metrics:
     - name: Normalized CER
       type: cer
-      value: 8.94
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -41,17 +41,18 @@ For training,
 |--|--|
 |Common Voice 16.0 zh-HK Train|138|
 |Common Voice 16.0 yue Train|85|
 |Cantonese-ASR|72|
 |CantoMap|23|
 |[Pseudo-Labelled YouTube Data](https://huggingface.co/datasets/alvanlii/cantonese-youtube-pseudo-transcription)|438|
-|Total|756|
 For evaluation, Common Voice 16.0 yue Test set is used.
 ## Results
-- CER (lower is better): 0.1073
-  - down from 0.1581 in the previous version dated Jan 28, 2023
 - GPU Inference with Fast Attention (example below): 0.055s/sample
   - Note all GPU evaluations are done on RTX 3090 GPU
 - GPU Inference: 0.308s/sample

     metrics:
     - name: Normalized CER
       type: cer
+      value: 7.93
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 |--|--|
 |Common Voice 16.0 zh-HK Train|138|
 |Common Voice 16.0 yue Train|85|
+|Common Voice 17.0 yue Train|178|
 |Cantonese-ASR|72|
 |CantoMap|23|
 |[Pseudo-Labelled YouTube Data](https://huggingface.co/datasets/alvanlii/cantonese-youtube-pseudo-transcription)|438|
 For evaluation, Common Voice 16.0 yue Test set is used.
 ## Results
+- CER (lower is better): 0.0972
+  - down from 0.1073, 0.1581 in the previous versions
+- CER (punctuations removed): 0.0793
 - GPU Inference with Fast Attention (example below): 0.055s/sample
   - Note all GPU evaluations are done on RTX 3090 GPU
 - GPU Inference: 0.308s/sample
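
Note on the two CER figures in this README change: the diff does not show how the score is normalized, so the following is only a minimal sketch of how a character error rate with and without punctuation could be computed. The function names (`edit_distance`, `strip_punctuation`, `cer`) are hypothetical, and dropping every Unicode punctuation character is an assumption rather than the author's confirmed procedure.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def strip_punctuation(text: str) -> str:
    """Drop characters whose Unicode category starts with 'P' (assumed rule)."""
    return "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("P"))


def cer(ref: str, hyp: str, remove_punct: bool = False) -> float:
    """Character error rate for one pair: edits divided by reference length."""
    if remove_punct:
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)
```

A hypothesis that only misses a full-width comma is penalized by the plain score but not by the punctuation-free one, which is one way the two reported numbers can differ. A corpus-level score would normally sum edits and reference lengths over all samples instead of averaging per-pair values.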

config.json CHANGED

@@ -45,7 +45,7 @@
   "scale_embedding": false,
   "suppress_tokens": [],
   "torch_dtype": "float32",
-  "transformers_version": "4.38.0.dev0",
   "use_cache": false,
   "use_weighted_layer_sum": false,
   "vocab_size": 51865

   "scale_embedding": false,
   "suppress_tokens": [],
   "torch_dtype": "float32",
+  "transformers_version": "4.39.3",
   "use_cache": false,
   "use_weighted_layer_sum": false,
   "vocab_size": 51865

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4db701d0355f26da25cc888d1f18e026f4c437e125b3a53a713f5c4372764abc
 size 966995080

 version https://git-lfs.github.com/spec/v1
+oid sha256:fde842c011b43b2a7b988ad9b9f1a082731bc29c3d58b0cdffa3e71252ef7256
 size 966995080

optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8cae99f24f5d58ff74074719ecf5fe28cb34eee7314559ec80a678ca930dbe9b
 size 1925064044

 version https://git-lfs.github.com/spec/v1
+oid sha256:52d3d75f188b0e83816ddfe2c081aa5811fef9f12f90bcf95efbb6f22d4cb03a
 size 1925064044
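
Note on the LFS pointer changes: both files keep the same `size` and only the `oid` changes, so a downloaded copy can be checked against the new pointer by hashing it. The following is a rough sketch under stated assumptions: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is assumed to be the clean three-line form without diff markers.

```python
import hashlib

# New model.safetensors pointer from this commit, without diff prefixes.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fde842c011b43b2a7b988ad9b9f1a082731bc29c3d58b0cdffa3e71252ef7256
size 966995080
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer: dict, chunks) -> bool:
    """Check an iterable of byte chunks against the pointer's size and SHA-256."""
    hasher = hashlib.sha256()
    total = 0
    for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    return (pointer["algo"] == "sha256"
            and total == pointer["size"]
            and hasher.hexdigest() == pointer["digest"])
```

For a file on disk, the chunks can come from something like `iter(lambda: f.read(1 << 20), b"")` on a handle opened in binary mode, which avoids loading the roughly 1 GB checkpoint into memory at once.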