Scrya committed on
Commit
b68adb3
1 Parent(s): dbb333b

update model card README.md

README.md CHANGED
@@ -1,12 +1,10 @@
  ---
- language:
- - zh-TW
  license: apache-2.0
  tags:
  - whisper-event
  - generated_from_trainer
  datasets:
- - mozilla-foundation/common_voice_11_0
  model-index:
  - name: Whisper Medium TW - Augmented
  results: []
@@ -17,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  # Whisper Medium TW - Augmented
 
- This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the mozilla-foundation/common_voice_11_0 dataset.
  It achieves the following results on the evaluation set:
  - eval_loss: 0.0951
  - eval_wer: 7.4865
 
  ---
  license: apache-2.0
  tags:
  - whisper-event
  - generated_from_trainer
  datasets:
+ - common_voice_11_0
  model-index:
  - name: Whisper Medium TW - Augmented
  results: []
 
 
  # Whisper Medium TW - Augmented
 
+ This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the common_voice_11_0 dataset.
  It achieves the following results on the evaluation set:
  - eval_loss: 0.0951
  - eval_wer: 7.4865
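The card reports `eval_wer: 7.4865` alongside `eval_loss`. The evaluation code is not part of this diff; assuming the notebook follows the common convention of reporting word error rate as a percentage, the figure is the word-level edit distance between the model's transcript and the reference, divided by the number of reference words. The sketch below shows that computation. `word_error_rate` is a hypothetical helper, not the metric implementation the notebook uses. It splits on whitespace, and Chinese is usually written without spaces, so the notebook's actual tokenisation and text normalisation (not shown here) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: 100 * (S + D + I) / N.

    S, D and I are the substitutions, deletions and insertions in a minimum
    edit-distance alignment of whitespace-separated words, and N is the number
    of reference words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds edit distances from the previous reference prefix to every
    # hypothesis prefix hyp[:j]; row 0 is the empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this convention, a value of 7.4865 would mean roughly 7.5 edits per 100 reference tokens.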
fine-tune-whisper-non-streaming-zh-TW.ipynb CHANGED
@@ -1241,7 +1241,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 27,
  "id": "c704f91e-241b-48c9-b8e0-f0da396a9663",
  "metadata": {
  "id": "c704f91e-241b-48c9-b8e0-f0da396a9663"
@@ -1249,8 +1249,8 @@
  "outputs": [],
  "source": [
  "kwargs = {\n",
- " \"dataset_tags\": \"mozilla-foundation/common_voice_11_0\",\n",
- " \"dataset\": \"mozilla-foundation/common_voice_11_0\", # a 'pretty' name for the training dataset\n",
  "# \"language\": \"zh\",\n",
  " \"model_name\": \"Whisper Medium TW - Augmented\", # a 'pretty' name for your model\n",
  " \"finetuned_from\": \"openai/whisper-medium\",\n",
@@ -1271,7 +1271,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 28,
  "id": "d7030622-caf7-4039-939b-6195cdaa2585",
  "metadata": {
  "id": "d7030622-caf7-4039-939b-6195cdaa2585"
@@ -1284,39 +1284,7 @@
  "Saving model checkpoint to ./\n",
  "Configuration saved in ./config.json\n",
  "Model weights saved in ./pytorch_model.bin\n",
- "Feature extractor saved in ./preprocessor_config.json\n",
- "Several commits (2) will be pushed upstream.\n",
- "The progress bars may be unreliable.\n",
- "remote: ----------------------------------------------------------\u001b[0;31m \n",
- "remote: Sorry, your push was rejected during YAML metadata verification: \n",
- "remote: - Error: \"language[0]\" must only contain lowercase characters \n",
- "remote: - Error: \"language[0]\" with value \"zh-TW\" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like \"code\", \"multilingual\". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.\u001b[0;32m \n",
- "remote: ---------------------------------------------------------- \n",
- "remote: Please find the documentation at: \n",
- "remote: https://huggingface.co/docs/hub/model-cards#model-card-metadata\u001b[0;0m \n",
- "remote: ---------------------------------------------------------- \n",
- "To https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented\n",
- " ! [remote rejected] main -> main (pre-receive hook declined)\n",
- "error: failed to push some refs to 'https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented'\n",
- "\n"
- ]
- },
- {
- "ename": "OSError",
- "evalue": "remote: ----------------------------------------------------------\u001b[0;31m \nremote: Sorry, your push was rejected during YAML metadata verification: \nremote: - Error: \"language[0]\" must only contain lowercase characters \nremote: - Error: \"language[0]\" with value \"zh-TW\" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like \"code\", \"multilingual\". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.\u001b[0;32m \nremote: ---------------------------------------------------------- \nremote: Please find the documentation at: \nremote: https://huggingface.co/docs/hub/model-cards#model-card-metadata\u001b[0;0m \nremote: ---------------------------------------------------------- \nTo https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented\n ! [remote rejected] main -> main (pre-receive hook declined)\nerror: failed to push some refs to 'https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented'\n",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mCalledProcessError\u001b[0m Traceback (most recent call last)",
- "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/huggingface_hub/repository.py:1207\u001b[0m, in \u001b[0;36mRepository.git_push\u001b[0;34m(self, upstream, blocking, auto_lfs_prune)\u001b[0m\n\u001b[1;32m 1206\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m return_code:\n\u001b[0;32m-> 1207\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError(\n\u001b[1;32m 1208\u001b[0m return_code, process\u001b[38;5;241m.\u001b[39margs, output\u001b[38;5;241m=\u001b[39mstdout, stderr\u001b[38;5;241m=\u001b[39mstderr\n\u001b[1;32m 1209\u001b[0m )\n\u001b[1;32m 1211\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError \u001b[38;5;28;01mas\u001b[39;00m exc:\n",
- "\u001b[0;31mCalledProcessError\u001b[0m: Command '['git', 'push', '--set-upstream', 'origin', 'main']' returned non-zero exit status 1.",
- "\nDuring handling of the above exception, another exception occurred:\n",
- "\u001b[0;31mOSError\u001b[0m Traceback (most recent call last)",
- "Cell \u001b[0;32mIn[28], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mtrainer\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpush_to_hub\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
- "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/transformers/trainer.py:3492\u001b[0m, in \u001b[0;36mTrainer.push_to_hub\u001b[0;34m(self, commit_message, blocking, **kwargs)\u001b[0m\n\u001b[1;32m 3489\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mpush_in_progress\u001b[38;5;241m.\u001b[39m_process\u001b[38;5;241m.\u001b[39mkill()\n\u001b[1;32m 3490\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mpush_in_progress \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m-> 3492\u001b[0m git_head_commit_url \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrepo\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpush_to_hub\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 3493\u001b[0m \u001b[43m \u001b[49m\u001b[43mcommit_message\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcommit_message\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mblocking\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mblocking\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mauto_lfs_prune\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\n\u001b[1;32m 3494\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 3495\u001b[0m \u001b[38;5;66;03m# push separately the model card to be independant from the rest of the model\u001b[39;00m\n\u001b[1;32m 3496\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39margs\u001b[38;5;241m.\u001b[39mshould_save:\n",
- "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/huggingface_hub/repository.py:1432\u001b[0m, in \u001b[0;36mRepository.push_to_hub\u001b[0;34m(self, commit_message, blocking, clean_ok, auto_lfs_prune)\u001b[0m\n\u001b[1;32m 1430\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mgit_add(auto_lfs_track\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 1431\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mgit_commit(commit_message)\n\u001b[0;32m-> 1432\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mgit_push\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1433\u001b[0m \u001b[43m \u001b[49m\u001b[43mupstream\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43mf\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43morigin \u001b[39;49m\u001b[38;5;132;43;01m{\u001b[39;49;00m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcurrent_branch\u001b[49m\u001b[38;5;132;43;01m}\u001b[39;49;00m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1434\u001b[0m \u001b[43m \u001b[49m\u001b[43mblocking\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mblocking\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1435\u001b[0m \u001b[43m \u001b[49m\u001b[43mauto_lfs_prune\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mauto_lfs_prune\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1436\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
- "File \u001b[0;32m~/whisper/lib/python3.8/site-packages/huggingface_hub/repository.py:1212\u001b[0m, in \u001b[0;36mRepository.git_push\u001b[0;34m(self, upstream, blocking, auto_lfs_prune)\u001b[0m\n\u001b[1;32m 1207\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError(\n\u001b[1;32m 1208\u001b[0m return_code, process\u001b[38;5;241m.\u001b[39margs, output\u001b[38;5;241m=\u001b[39mstdout, stderr\u001b[38;5;241m=\u001b[39mstderr\n\u001b[1;32m 1209\u001b[0m )\n\u001b[1;32m 1211\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m subprocess\u001b[38;5;241m.\u001b[39mCalledProcessError \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[0;32m-> 1212\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mEnvironmentError\u001b[39;00m(exc\u001b[38;5;241m.\u001b[39mstderr)\n\u001b[1;32m 1214\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m blocking:\n\u001b[1;32m 1216\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mstatus_method\u001b[39m():\n",
- "\u001b[0;31mOSError\u001b[0m: remote: ----------------------------------------------------------\u001b[0;31m \nremote: Sorry, your push was rejected during YAML metadata verification: \nremote: - Error: \"language[0]\" must only contain lowercase characters \nremote: - Error: \"language[0]\" with value \"zh-TW\" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like \"code\", \"multilingual\". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.\u001b[0;32m \nremote: ---------------------------------------------------------- \nremote: Please find the documentation at: \nremote: https://huggingface.co/docs/hub/model-cards#model-card-metadata\u001b[0;0m \nremote: ---------------------------------------------------------- \nTo https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented\n ! [remote rejected] main -> main (pre-receive hook declined)\nerror: failed to push some refs to 'https://huggingface.co/Scrya/whisper-medium-zh-TW-augmented'\n"
  ]
  }
  ],
 
  },
  {
  "cell_type": "code",
+ "execution_count": 31,
  "id": "c704f91e-241b-48c9-b8e0-f0da396a9663",
  "metadata": {
  "id": "c704f91e-241b-48c9-b8e0-f0da396a9663"
 
  "outputs": [],
  "source": [
  "kwargs = {\n",
+ "# \"dataset_tags\": \"mozilla-foundation/common_voice_11_0\",\n",
+ "# \"dataset\": \"mozilla-foundation/common_voice_11_0\", # a 'pretty' name for the training dataset\n",
  "# \"language\": \"zh\",\n",
  " \"model_name\": \"Whisper Medium TW - Augmented\", # a 'pretty' name for your model\n",
  " \"finetuned_from\": \"openai/whisper-medium\",\n",
 
  },
  {
  "cell_type": "code",
+ "execution_count": null,
  "id": "d7030622-caf7-4039-939b-6195cdaa2585",
  "metadata": {
  "id": "d7030622-caf7-4039-939b-6195cdaa2585"
 
  "Saving model checkpoint to ./\n",
  "Configuration saved in ./config.json\n",
  "Model weights saved in ./pytorch_model.bin\n",
+ "Feature extractor saved in ./preprocessor_config.json\n"
  ]
  }
  ],
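The cell output removed in this commit records why the earlier `trainer.push_to_hub(**kwargs)` call was rejected. The Hub's YAML metadata check requires each `language` entry to be a lowercase ISO 639-1, 639-2 or 639-3 code (two or three letters) or a special value, and it points BCP-47 tags such as `zh-TW` to `language_bcp47`. The sketch below approximates that rule locally so a bad value can be caught before pushing. `language_value_ok` and this `kwargs` dict are hypothetical illustrations, not part of `huggingface_hub` or the notebook. The check covers only the shape the message describes, and the Hub's own validator remains authoritative.

```python
import re

# Shape rule taken from the Hub's rejection message in the removed cell output:
# "language" entries must be lowercase ISO 639-1/639-2/639-3 codes (two or three
# letters) or a special value. The message names "code" and "multilingual" as
# examples; other special values may exist. This checks only the shape of the
# string and does not look codes up in the ISO 639 tables.
ISO_639_SHAPE = re.compile(r"[a-z]{2,3}")
SPECIAL_VALUES = {"code", "multilingual"}


def language_value_ok(value: str) -> bool:
    """Return True if `value` has the shape the Hub message asks for."""
    return value in SPECIAL_VALUES or ISO_639_SHAPE.fullmatch(value) is not None


# Hypothetical kwargs for trainer.push_to_hub(**kwargs), modelled on the notebook
# cell but with a bare ISO 639-1 code instead of the BCP-47 tag "zh-TW".
kwargs = {
    "language": "zh",
    "model_name": "Whisper Medium TW - Augmented",
    "finetuned_from": "openai/whisper-medium",
}

for candidate in ("zh-TW", kwargs["language"], "multilingual"):
    print(f"{candidate!r}: {language_value_ok(candidate)}")
```

This commit takes a different route: the README.md change drops the `language` block entirely. Passing `"language": "zh"` (the value in the notebook's commented-out line) would be one alternative that satisfies this shape check, assuming the Hub accepts `zh` as the ISO 639-1 code for Chinese. The regional `zh-TW` tag would then belong in `language_bcp47`, as the error message suggests.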