{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "v5hvo8QWN-a9" }, "source": [ "# Installing Whisper\n", "\n", "The commands below will install the Python packages needed to use Whisper models and evaluate the transcription results." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "ZsJUxc0aRsAf" }, "outputs": [], "source": [ "! pip install git+https://github.com/openai/whisper.git\n", "! pip install jiwer" ] }, { "cell_type": "markdown", "metadata": { "id": "1IMEkgyagYto" }, "source": [ "# Loading the LibriSpeech dataset\n", "\n", "The following will load the test-clean split of the LibriSpeech corpus using torchaudio." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "3CqtR2Fi5-vP" }, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "\n", "try:\n", " import tensorflow # required in Colab to avoid protobuf compatibility issues\n", "except ImportError:\n", " pass\n", "\n", "import torch\n", "import pandas as pd\n", "import whisper\n", "import torchaudio\n", "\n", "from tqdm.notebook import tqdm\n", "\n", "\n", "DEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "GuCCB2KYOJCE" }, "outputs": [], "source": [ "class LibriSpeech(torch.utils.data.Dataset):\n", " \"\"\"\n", " A simple class to wrap LibriSpeech and trim/pad the audio to 30 seconds.\n", " It will drop the last few seconds of a very small portion of the utterances.\n", " \"\"\"\n", " def __init__(self, split=\"test-clean\", device=DEVICE):\n", " self.dataset = torchaudio.datasets.LIBRISPEECH(\n", " root=os.path.expanduser(\"~/.cache\"),\n", " url=split,\n", " download=True,\n", " )\n", " self.device = device\n", "\n", " def __len__(self):\n", " return len(self.dataset)\n", "\n", " def __getitem__(self, item):\n", " audio, sample_rate, text, _, _, _ = self.dataset[item]\n", " assert sample_rate == 16000\n", " audio = whisper.pad_or_trim(audio.flatten()).to(self.device)\n", " mel = whisper.log_mel_spectrogram(audio)\n", " \n", " return (mel, text)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "-YcRU5jqNqo2" }, "outputs": [], "source": [ "dataset = LibriSpeech(\"test-clean\")\n", "loader = torch.utils.data.DataLoader(dataset, batch_size=16)" ] }, { "cell_type": "markdown", "metadata": { "id": "0ljocCNuUAde" }, "source": [ "# Running inference on the dataset using a base Whisper model\n", "\n", "The following will take a few minutes to transcribe all utterances in the dataset." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_PokfNJtOYNu", "outputId": "2c53ec44-bc93-4107-b4fa-214e3f71fe8e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model is English-only and has 71,825,408 parameters.\n" ] } ], "source": [ "model = whisper.load_model(\"base.en\")\n", "print(\n", " f\"Model is {'multilingual' if model.is_multilingual else 'English-only'} \"\n", " f\"and has {sum(np.prod(p.shape) for p in model.parameters()):,} parameters.\"\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# predict without timestamps for short-form transcription\n", "options = whisper.DecodingOptions(language=\"en\", without_timestamps=True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 49, "referenced_widgets": [ "09a29a91f58d4462942505a3cc415801", "83391f98a240490987c397048fc1a0d4", "06b9aa5f49fa44ba8c93b647dc7db224", "da9c231ee67047fb89073c95326b72a5", "48da931ebe7f4fd299f8c98c7d2460ff", "7a901f447c1d477bb49f954e0feacedd", "39f5a6ae8ba74c8598f9c6d5b8ad2d65", "a0d10a42c753453283e5219c22239337", "09f4cb79ff86465aaf48b0de24869af9", "1b9cecf5b3584fba8258a81d4279a25b", "039b53f2702c4179af7e0548018d0588" ] }, "id": "7OWTn_KvNk59", "outputId": "a813a792-3c91-4144-f11f-054fd6778023" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9df048b46f764cf68cbe0045b8ff73a8", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/164 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "hypotheses = []\n", "references = []\n", "\n", "for mels, texts in tqdm(loader):\n", " results = model.decode(mels, options)\n", " hypotheses.extend([result.text for result in results])\n", " references.extend(texts)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "4nTyynELQ42j", "outputId": "1c72d25a-3e87-4c60-a8d1-1da9d2f73bd7" }, "outputs": [ { "data": { "text/html": [ "
\n", " | hypothesis | \n", "reference | \n", "
---|---|---|
0 | \n", "He hoped there would be stew for dinner, turni... | \n", "HE HOPED THERE WOULD BE STEW FOR DINNER TURNIP... | \n", "
1 | \n", "Stuffered into you, his belly counseled him. | \n", "STUFF IT INTO YOU HIS BELLY COUNSELLED HIM | \n", "
2 | \n", "After early nightfall the yellow lamps would l... | \n", "AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD L... | \n", "
3 | \n", "Hello Bertie, any good in your mind? | \n", "HELLO BERTIE ANY GOOD IN YOUR MIND | \n", "
4 | \n", "Number 10. Fresh Nelly is waiting on you. Good... | \n", "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD ... | \n", "
... | \n", "... | \n", "... | \n", "
2615 | \n", "Oh, to shoot my soul's full meaning into futur... | \n", "OH TO SHOOT MY SOUL'S FULL MEANING INTO FUTURE... | \n", "
2616 | \n", "Then I, long tried by natural ills, received t... | \n", "THEN I LONG TRIED BY NATURAL ILLS RECEIVED THE... | \n", "
2617 | \n", "I love thee freely as men strive for right. I ... | \n", "I LOVE THEE FREELY AS MEN STRIVE FOR RIGHT I L... | \n", "
2618 | \n", "I love thee with the passion put to use, in my... | \n", "I LOVE THEE WITH THE PASSION PUT TO USE IN MY ... | \n", "
2619 | \n", "I love thee with the love I seemed to lose wit... | \n", "I LOVE THEE WITH A LOVE I SEEMED TO LOSE WITH ... | \n", "
2620 rows × 2 columns
\n", "\n", " | hypothesis | \n", "reference | \n", "hypothesis_clean | \n", "reference_clean | \n", "
---|---|---|---|---|
0 | \n", "He hoped there would be stew for dinner, turni... | \n", "HE HOPED THERE WOULD BE STEW FOR DINNER TURNIP... | \n", "he hoped there would be stew for dinner turnip... | \n", "he hoped there would be stew for dinner turnip... | \n", "
1 | \n", "Stuffered into you, his belly counseled him. | \n", "STUFF IT INTO YOU HIS BELLY COUNSELLED HIM | \n", "stuffered into you his belly counseled him | \n", "stuff it into you his belly counseled him | \n", "
2 | \n", "After early nightfall the yellow lamps would l... | \n", "AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD L... | \n", "after early nightfall the yellow lamps would l... | \n", "after early nightfall the yellow lamps would l... | \n", "
3 | \n", "Hello Bertie, any good in your mind? | \n", "HELLO BERTIE ANY GOOD IN YOUR MIND | \n", "hello bertie any good in your mind | \n", "hello bertie any good in your mind | \n", "
4 | \n", "Number 10. Fresh Nelly is waiting on you. Good... | \n", "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD ... | \n", "number 10 fresh nelly is waiting on you good n... | \n", "number 10 fresh nelly is waiting on you good n... | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
2615 | \n", "Oh, to shoot my soul's full meaning into futur... | \n", "OH TO SHOOT MY SOUL'S FULL MEANING INTO FUTURE... | \n", "0 to shoot my soul is full meaning into future... | \n", "0 to shoot my soul is full meaning into future... | \n", "
2616 | \n", "Then I, long tried by natural ills, received t... | \n", "THEN I LONG TRIED BY NATURAL ILLS RECEIVED THE... | \n", "then i long tried by natural ills received the... | \n", "then i long tried by natural ills received the... | \n", "
2617 | \n", "I love thee freely as men strive for right. I ... | \n", "I LOVE THEE FREELY AS MEN STRIVE FOR RIGHT I L... | \n", "i love thee freely as men strive for right i l... | \n", "i love thee freely as men strive for right i l... | \n", "
2618 | \n", "I love thee with the passion put to use, in my... | \n", "I LOVE THEE WITH THE PASSION PUT TO USE IN MY ... | \n", "i love thee with the passion put to use in my ... | \n", "i love thee with the passion put to use in my ... | \n", "
2619 | \n", "I love thee with the love I seemed to lose wit... | \n", "I LOVE THEE WITH A LOVE I SEEMED TO LOSE WITH ... | \n", "i love thee with the love i seemed to lose wit... | \n", "i love thee with a love i seemed to lose with ... | \n", "
2620 rows × 4 columns
\n", "