add results on dev audio with step 24000

Files changed (5)

README.md CHANGED

@@ -20,10 +20,10 @@ model-index:
     metrics:
        - name: Test WER
          type: wer
-         value: 21.65
+         value: to recompute with STEP 24000
        - name: Test CER
          type: cer
-         value: 6.52
+         value: to recompute with STEP 24000
   - task:
       name: Automatic Speech Recognition
       type: automatic-speech-recognition
@@ -34,10 +34,11 @@ model-index:
     metrics:
        - name: Test WER
          type: wer
-         value: 61.72
+         value: 35.29
        - name: Test CER
          type: cer
-         value: 16.43
+         value: 13.94
 ---
 ## Model description
@@ -83,8 +84,18 @@ The following hyperparameters were used during training:
 | 0.8488        | 4.59  | 16000 | inf             | 0.2187 |
 | 0.8359        | 4.87  | 17000 | inf             | 0.2172 |
-It achieves the best result on the validation set on Step 17000:
-- Wer: 0.2172
+Training continued with checkpoint from STEP 17000:
+| /             | 5.16  | 18000 | inf             | 0.2176 |
+| /             | 5.45  | 19000 | inf             | 0.2181 |
+| /             | 5.73  | 20000 | inf             | 0.2155 |
+| /             | 6.02  | 21000 | inf             | 0.2140 |
+| /             | 6.31  | 22000 | inf             | 0.2124 |
+| /             | 6.59  | 23000 | inf             | 0.2117 |
+| /             | 6.88  | 24000 | inf             | 0.2116 |
+It achieves the best result on the validation set on Step 24000:
+- Wer: 0.2116
 Got some issue with validation loss calculation.
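
For readers unfamiliar with the two metrics updated in the README, the following is a minimal sketch of how WER and CER can be computed from a reference and a predicted transcript. It is an illustration only, not necessarily the implementation `eval.py` relies on. The function names `edit_distance`, `wer`, and `cer` are hypothetical. Both functions return a fraction, like the table column above (e.g. 0.2116); the `model-index` values (e.g. 35.29) look like the same kind of quantity expressed as a percentage, which is an assumption here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(wer("le chat dort", "le chien dort"))
    # One substituted character out of four reference characters.
    print(cer("chat", "chot"))
```

This sketch scores a single utterance. Corpus-level scores are usually obtained by summing the edit distances over all utterances and dividing by the total reference length, rather than by averaging per-utterance rates.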

eval.py CHANGED

@@ -48,18 +48,15 @@ def log_results(result: Dataset, args: Dict[str, str]):
 def normalize_text(text: str) -> str:
     """DO ADAPT FOR YOUR USE CASE. this function normalizes the target text."""
-    chars_to_ignore_regex = '[^a-zàâäçéèêëîïôöùûüÿ\'’ ]'  # noqa: W605 IMPORTANT: this should correspond to the chars that were ignored during training
-    text = re.sub(chars_to_ignore_regex, "", text.lower()).replace('’', "'")
     # In addition, we can normalize the target text, e.g. removing new lines characters etc...
     # note that order is important here!
     token_sequences_to_ignore = ["\n\n", "\n", "   ", "  "]
     for t in token_sequences_to_ignore:
         text = " ".join(text.split(t))
+    chars_to_ignore_regex = '[^a-zàâäçéèêëîïôöùûüÿ\'’ ]'  # noqa: W605 IMPORTANT: this should correspond to the chars that were ignored during training
+    text = re.sub(chars_to_ignore_regex, "", text.lower()).replace('’', "'")
     return text
@@ -68,7 +65,7 @@ def main(args):
     dataset = load_dataset(args.dataset, args.config, split=args.split, use_auth_token=True)
     # for testing: only process the first two examples as a test
-    # dataset = dataset.select(range(10))
+#     dataset = dataset.select(range(2))
     # load processor
     feature_extractor = AutoFeatureExtractor.from_pretrained(args.model_id)
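
The change to `normalize_text` only reorders two steps, so its effect is easy to miss. The sketch below places both orderings side by side on small inputs. The character class is copied from the diff; `normalize_text_regex_first` is a hypothetical name used here for the ordering being removed, and the sample strings are made up for illustration.

```python
import re

# Character class copied from the diff: everything outside it is dropped.
CHARS_TO_IGNORE_REGEX = "[^a-zàâäçéèêëîïôöùûüÿ'’ ]"
TOKEN_SEQUENCES_TO_IGNORE = ["\n\n", "\n", "   ", "  "]


def _collapse_whitespace(text: str) -> str:
    # Replace each listed sequence with a single space, in order.
    for t in TOKEN_SEQUENCES_TO_IGNORE:
        text = " ".join(text.split(t))
    return text


def _strip_chars(text: str) -> str:
    # Lowercase, drop disallowed characters, unify the apostrophe.
    return re.sub(CHARS_TO_IGNORE_REGEX, "", text.lower()).replace("’", "'")


def normalize_text(text: str) -> str:
    """Ordering after this change: whitespace first, then the regex."""
    return _strip_chars(_collapse_whitespace(text))


def normalize_text_regex_first(text: str) -> str:
    """Ordering before this change (hypothetical name): regex first."""
    return _collapse_whitespace(_strip_chars(text))


if __name__ == "__main__":
    sample = "Bonjour,\nL’été!"
    print(repr(normalize_text(sample)))              # 'bonjour l'été'
    print(repr(normalize_text_regex_first(sample)))  # 'bonjourl'été'
```

With the regex applied first, the newline is deleted as a disallowed character before it can be turned into a space, so the words on either side are glued together; this may be the motivation for the reordering, though the PR does not say so. The new ordering has its own side effect: removing a character that stood between two spaces (for example the hyphen in `"a - b"`) leaves a double space that is no longer collapsed, which could matter for character-level metrics.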

log_speech-recognition-community-v2_dev_data_fr_validation_predictions.txt CHANGED

The diff for this file is too large to render. See raw diff

log_speech-recognition-community-v2_dev_data_fr_validation_targets.txt CHANGED

The diff for this file is too large to render. See raw diff

speech-recognition-community-v2_dev_data_fr_validation_eval_results.txt CHANGED