Scrya/whisper-medium-zh-TW-augmented

@@ -1,12 +1,10 @@
 ---
-language:
-- zh-TW
 license: apache-2.0
 tags:
 - whisper-event
 - generated_from_trainer
 datasets:
-- mozilla-foundation/common_voice_11_0
 model-index:
 - name: Whisper Medium TW - Augmented
   results: []
@@ -17,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
 # Whisper Medium TW - Augmented
-This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the mozilla-foundation/common_voice_11_0 dataset.
 It achieves the following results on the evaluation set:
 - eval_loss: 0.0951
 - eval_wer: 7.4865

 ---
 license: apache-2.0
 tags:
 - whisper-event
 - generated_from_trainer
 datasets:
+- common_voice_11_0
 model-index:
 - name: Whisper Medium TW - Augmented
   results: []
 # Whisper Medium TW - Augmented
+This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the common_voice_11_0 dataset.
 It achieves the following results on the evaluation set:
 - eval_loss: 0.0951
 - eval_wer: 7.4865

fine-tune-whisper-non-streaming-zh-TW.ipynb

@@ -1241,7 +1241,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
    "id": "c704f91e-241b-48c9-b8e0-f0da396a9663",
    "metadata": {
     "id": "c704f91e-241b-48c9-b8e0-f0da396a9663"
@@ -1249,8 +1249,8 @@
    "outputs": [],
    "source": [
     "kwargs = {\n",
-    "    \"dataset_tags\": \"mozilla-foundation/common_voice_11_0\",\n",
-    "    \"dataset\": \"mozilla-foundation/common_voice_11_0\",  # a 'pretty' name for the training dataset\n",
     "#     \"language\": \"zh\",\n",
     "    \"model_name\": \"Whisper Medium TW - Augmented\",  # a 'pretty' name for your model\n",
     "    \"finetuned_from\": \"openai/whisper-medium\",\n",
@@ -1271,7 +1271,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
    "id": "d7030622-caf7-4039-939b-6195cdaa2585",
    "metadata": {
     "id": "d7030622-caf7-4039-939b-6195cdaa2585"
@@ -1284,39 +1284,7 @@
       "Saving model checkpoint to ./\n",
       "Configuration saved in ./config.json\n",
       "Model weights saved in ./pytorch_model.bin\n",
-      "Feature extractor saved in ./preprocessor_config.json\n",
-      "Several commits (2) will be pushed upstream.\n",
-      "The progress bars may be unreliable.\n",
-      "remote: ----------------------------------------------------------\u001b[0;31m        \n",
-      "remote: Sorry, your push was rejected during YAML metadata verification:        \n",
-      "remote: - Error: \"language[0]\" must only contain lowercase characters        \n",
-      "remote: - Error: \"language[0]\" with value \"zh-TW\" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like \"code\", \"multilingual\". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.\u001b[0;32m        \n",
-      "remote: ----------------------------------------------------------        \n",
-      "remote: Please find the documentation at:        \n",
-      "remote: https://huggingface.co/docs/hub/model-cards#model-card-metadata\u001b[0;0m        \n",
-      "remote: ----------------------------------------------------------        \n",
-      "To https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented\n",
-      " ! [remote rejected] main -> main (pre-receive hook declined)\n",
-      "error: failed to push some refs to 'https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented'\n",
-      "\n"
-     ]
-    },
-    {
-     "ename": "OSError",
-     "evalue": "remote: ----------------------------------------------------------\u001b[0;31m        \nremote: Sorry, your push was rejected during YAML metadata verification:        \nremote: - Error: \"language[0]\" must only contain lowercase characters        \nremote: - Error: \"language[0]\" with value \"zh-TW\" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like \"code\", \"multilingual\". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.\u001b[0;32m        \nremote: ----------------------------------------------------------        \nremote: Please find the documentation at:        \nremote: https://huggingface.co/docs/hub/model-cards#model-card-metadata\u001b[0;0m        \nremote: ----------------------------------------------------------        \nTo https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented\n ! [remote rejected] main -> main (pre-receive hook declined)\nerror: failed to push some refs to 'https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented'\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mCalledProcessError\u001b[0m                        Traceback (most recent call last)",
-      "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/huggingface_hub/repository.py:1207\u001b[0m, in \u001b[0;36mRepository.git_push\u001b[0;34m(self, upstream, blocking, auto_lfs_prune)\u001b[0m\n\u001b[1;32m   1206\u001b[0m             \u001b[38;5;28;01mif\u001b[39;00m return_code:\n\u001b[0;32m-> 1207\u001b[0m                 \u001b[38;5;28;01mraise\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError(\n\u001b[1;32m   1208\u001b[0m                     return_code, process\u001b[38;5;241m.\u001b[39margs, output\u001b[38;5;241m=\u001b[39mstdout, stderr\u001b[38;5;241m=\u001b[39mstderr\n\u001b[1;32m   1209\u001b[0m                 )\n\u001b[1;32m   1211\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError \u001b[38;5;28;01mas\u001b[39;00m exc:\n",
-      "\u001b[0;31mCalledProcessError\u001b[0m: Command '['git', 'push', '--set-upstream', 'origin', 'main']' returned non-zero exit status 1.",
-      "\nDuring handling of the above exception, another exception occurred:\n",
-      "\u001b[0;31mOSError\u001b[0m                                   Traceback (most recent call last)",
-      "Cell \u001b[0;32mIn[28], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mtrainer\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpush_to_hub\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
-      "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/transformers/trainer.py:3492\u001b[0m, in \u001b[0;36mTrainer.push_to_hub\u001b[0;34m(self, commit_message, blocking, **kwargs)\u001b[0m\n\u001b[1;32m   3489\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mpush_in_progress\u001b[38;5;241m.\u001b[39m_process\u001b[38;5;241m.\u001b[39mkill()\n\u001b[1;32m   3490\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mpush_in_progress \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m-> 3492\u001b[0m git_head_commit_url \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrepo\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpush_to_hub\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m   3493\u001b[0m \u001b[43m    \u001b[49m\u001b[43mcommit_message\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcommit_message\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mblocking\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mblocking\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mauto_lfs_prune\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\n\u001b[1;32m   3494\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m   3495\u001b[0m \u001b[38;5;66;03m# push separately the model card to be independant from the rest of the model\u001b[39;00m\n\u001b[1;32m   3496\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39margs\u001b[38;5;241m.\u001b[39mshould_save:\n",
-      "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/huggingface_hub/repository.py:1432\u001b[0m, in \u001b[0;36mRepository.push_to_hub\u001b[0;34m(self, commit_message, blocking, clean_ok, auto_lfs_prune)\u001b[0m\n\u001b[1;32m   1430\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mgit_add(auto_lfs_track\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m   1431\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mgit_commit(commit_message)\n\u001b[0;32m-> 1432\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgit_push\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m   1433\u001b[0m \u001b[43m    \u001b[49m\u001b[43mupstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43mf\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43morigin \u001b[39;49m\u001b[38;5;132;43;01m{\u001b[39;49;00m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcurrent_branch\u001b[49m\u001b[38;5;132;43;01m}\u001b[39;49;00m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m   1434\u001b[0m \u001b[43m    \u001b[49m\u001b[43mblocking\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mblocking\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m   1435\u001b[0m \u001b[43m    \u001b[49m\u001b[43mauto_lfs_prune\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mauto_lfs_prune\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m   1436\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
-      "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/huggingface_hub/repository.py:1212\u001b[0m, in \u001b[0;36mRepository.git_push\u001b[0;34m(self, upstream, blocking, auto_lfs_prune)\u001b[0m\n\u001b[1;32m   1207\u001b[0m                 \u001b[38;5;28;01mraise\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError(\n\u001b[1;32m   1208\u001b[0m                     return_code, process\u001b[38;5;241m.\u001b[39margs, output\u001b[38;5;241m=\u001b[39mstdout, stderr\u001b[38;5;241m=\u001b[39mstderr\n\u001b[1;32m   1209\u001b[0m                 )\n\u001b[1;32m   1211\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[0;32m-> 1212\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mEnvironmentError\u001b[39;00m(exc\u001b[38;5;241m.\u001b[39mstderr)\n\u001b[1;32m   1214\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m blocking:\n\u001b[1;32m   1216\u001b[0m     \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mstatus_method\u001b[39m():\n",
-      "\u001b[0;31mOSError\u001b[0m: remote: ----------------------------------------------------------\u001b[0;31m        \nremote: Sorry, your push was rejected during YAML metadata verification:        \nremote: - Error: \"language[0]\" must only contain lowercase characters        \nremote: - Error: \"language[0]\" with value \"zh-TW\" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like \"code\", \"multilingual\". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.\u001b[0;32m        \nremote: ----------------------------------------------------------        \nremote: Please find the documentation at:        \nremote: https://huggingface.co/docs/hub/model-cards#model-card-metadata\u001b[0;0m        \nremote: ----------------------------------------------------------        \nTo https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented\n ! [remote rejected] main -> main (pre-receive hook declined)\nerror: failed to push some refs to 'https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented'\n"
      ]
     }
    ],

   },
   {
    "cell_type": "code",
+   "execution_count": 31,
    "id": "c704f91e-241b-48c9-b8e0-f0da396a9663",
    "metadata": {
     "id": "c704f91e-241b-48c9-b8e0-f0da396a9663"
    "outputs": [],
    "source": [
     "kwargs = {\n",
+    "#     \"dataset_tags\": \"mozilla-foundation/common_voice_11_0\",\n",
+    "#     \"dataset\": \"mozilla-foundation/common_voice_11_0\",  # a 'pretty' name for the training dataset\n",
     "#     \"language\": \"zh\",\n",
     "    \"model_name\": \"Whisper Medium TW - Augmented\",  # a 'pretty' name for your model\n",
     "    \"finetuned_from\": \"openai/whisper-medium\",\n",
   },
   {
    "cell_type": "code",
+   "execution_count": null,
    "id": "d7030622-caf7-4039-939b-6195cdaa2585",
    "metadata": {
     "id": "d7030622-caf7-4039-939b-6195cdaa2585"
       "Saving model checkpoint to ./\n",
       "Configuration saved in ./config.json\n",
       "Model weights saved in ./pytorch_model.bin\n",
+      "Feature extractor saved in ./preprocessor_config.json\n"
      ]
     }
    ],
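
The rejection message in the removed cell output states two rules for each `language` entry in the model card metadata: it must be lowercase, and it must be a two- or three-letter ISO 639 code or a special value such as "code" or "multilingual". A minimal local pre-check of those two rules might look like the sketch below. The function name `check_language_tag` and the `SPECIAL_VALUES` set are hypothetical; the check only tests the shape of the tag, not whether it is a registered ISO 639 code, and it is not the Hub's actual validator.

```python
import re
from typing import List

# Special values named in the Hub's rejection message (assumed non-exhaustive).
SPECIAL_VALUES = {"code", "multilingual"}


def check_language_tag(tag: str) -> List[str]:
    """Return the problems found in one `language` entry (empty list if none).

    Hypothetical helper: it mirrors the two rules quoted in the error output,
    but checks only the shape of the tag, not ISO 639 membership.
    """
    problems = []
    # Rule 1: the entry must only contain lowercase characters.
    if tag != tag.lower():
        problems.append("must only contain lowercase characters")
    # Rule 2: two or three letters, unless it is one of the special values.
    is_special = tag.lower() in SPECIAL_VALUES
    has_iso_shape = re.fullmatch(r"[A-Za-z]{2,3}", tag) is not None
    if not (is_special or has_iso_shape):
        problems.append("must be a two/three-letter ISO 639 code or a special value")
    return problems


if __name__ == "__main__":
    for candidate in ["zh-TW", "zh", "multilingual"]:
        print(candidate, check_language_tag(candidate))
```

Under these assumptions, "zh-TW" fails both rules, matching the two errors in the log, while "zh" passes. This commit takes a different route and drops the `language` block from the README front matter; the error text itself points to `language_bcp47` as the field for BCP-47 identifiers such as zh-TW.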