{ "cells": [ { "cell_type": "markdown", "id": "75b58048-7d14-4fc6-8085-1fc08c81b4a6", "metadata": { "id": "75b58048-7d14-4fc6-8085-1fc08c81b4a6" }, "source": [ "# Fine-Tune Whisper With 🤗 Transformers and Streaming Mode" ] }, { "cell_type": "markdown", "id": "fbfa8ad5-4cdc-4512-9058-836cbbf65e1a", "metadata": { "id": "fbfa8ad5-4cdc-4512-9058-836cbbf65e1a" }, "source": [ "In this Colab, we present a step-by-step guide on fine-tuning Whisper with Hugging Face 🤗 Transformers on 400 hours of speech data! Using streaming mode, we'll show how you can train a speech recognition model on any dataset, irrespective of size. With streaming mode, storage requirements are no longer a consideration: you can train a model on whatever dataset you want, even if its download size exceeds your device's disk space. How can this be possible? It simply seems too good to be true! Well, rest assured it's not. Carry on reading to find out more." ] }, { "cell_type": "markdown", "id": "afe0d503-ae4e-4aa7-9af4-dbcba52db41e", "metadata": { "id": "afe0d503-ae4e-4aa7-9af4-dbcba52db41e" }, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "9ae91ed4-9c3e-4ade-938e-f4c2dcfbfdc0", "metadata": { "id": "9ae91ed4-9c3e-4ade-938e-f4c2dcfbfdc0" }, "source": [ "Speech recognition datasets are large. A typical speech dataset consists of approximately 100 hours of audio-transcription data, requiring upwards of 130GB of storage space for download and preparation. For most ASR researchers, this is already at the upper limit of what is feasible for disk space. So what happens when we want to train on a larger dataset? The full [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) dataset consists of 960 hours of audio data. Kensho's [SPGISpeech](https://huggingface.co/datasets/kensho/spgispeech) contains 5,000 hours of audio data. ML Commons [People's Speech](https://huggingface.co/datasets/MLCommons/peoples_speech) contains **30,000+** hours of audio data! 
Do we need to bite the bullet and buy additional storage? Or is there a way we can train on all of these datasets with no disk drive requirements?\n", "\n", "When training machine learning systems, we rarely use the entire dataset at once. We typically _batch_ our data into smaller subsets, and pass these incrementally through our training pipeline. This is because we train our system on an accelerator device, such as a GPU or TPU, which typically has a memory limit of around 16GB. We have to fit our model, optimiser and training data all on the same accelerator device, so we usually have to divide the dataset up into smaller batches and move them from the CPU to the GPU when required.\n", "\n", "Consequently, we don't require the entire dataset to be downloaded at once; we simply need the batch of data that we pass to our model at any one time. We can leverage this principle of partial dataset loading when preparing our dataset: rather than downloading the entire dataset at the start, we can load each piece of data as and when we need it. For each batch, we load the relevant data from a remote server and pass it through the training pipeline. For the next batch, we load the next items and again pass them through the training pipeline. At no point do we have to save data to our disk drive; we simply load it into memory and use it in our pipeline. In doing so, we only ever need as much memory as each individual batch requires.\n", "\n", "This is analogous to downloading a TV show versus streaming it 📺 When we download a TV show, we download the entire video offline and save it to our disk. Compare this to when we stream a TV show. Here, we don't save any part of the video to disk, but iterate over the video file and load each part in real time as required. It's this same principle that we can apply to our ML training pipeline! 
We want to iterate over the dataset and load each sample of data as required.\n", "\n", "While the principle of partial dataset loading sounds ideal, it also seems **pretty** difficult to do. Luckily for us, 🤗 Datasets allows us to do this with minimal code changes! We'll make use of the principle of [_streaming_](https://huggingface.co/docs/datasets/stream), depicted graphically in Figure 1. Streaming does exactly this: the data is loaded progressively as we iterate over the dataset, meaning it is only loaded as and when we need it. If you're familiar with 🤗 Transformers and Datasets, the content of this notebook will be very familiar, with some small extensions to support streaming mode." ] }, { "cell_type": "markdown", "id": "1c87f76e-47be-4a5d-bc52-7b1c2e9d4f5a", "metadata": { "id": "1c87f76e-47be-4a5d-bc52-7b1c2e9d4f5a" }, "source": [ "" ] }, { "cell_type": "markdown", "id": "d44b85a2-3465-4cd5-bcca-8ddb302ab71b", "metadata": { "id": "d44b85a2-3465-4cd5-bcca-8ddb302ab71b", "tags": [] }, "source": [ "## Prepare Environment" ] }, { "cell_type": "code", "execution_count": 1, "id": "a0e8a3b5-2c0b-4ee6-98cc-21a571266a5d", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "a0e8a3b5-2c0b-4ee6-98cc-21a571266a5d", "outputId": "09b1863a-eb05-4610-b763-2a7b69cd77bf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading package lists... Done\n", "Building dependency tree \n", "Reading state information... 
Done\n", "git-lfs is already the newest version (2.9.2-1).\n", "0 upgraded, 0 newly installed, 0 to remove and 155 not upgraded.\n", "Updated git hooks.\n", "Git LFS initialized.\n" ] } ], "source": [ "!sudo apt-get install git-lfs\n", "!sudo git lfs install\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "QJBETye7FkvV", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QJBETye7FkvV", "outputId": "e055cc0a-0a62-4a14-f360-2a64782a5a35" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Defaulting to user installation because normal site-packages is not writeable\n", "\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0mRequirement already satisfied: pip in ./.local/lib/python3.8/site-packages (22.3.1)\n", "\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid 
distribution -orch 
(/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0mDefaulting to user installation because normal site-packages is not writeable\n", "\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0mCollecting torch\n", " Using cached torch-1.13.0-cp38-cp38-manylinux1_x86_64.whl (890.2 MB)\n", "Collecting torchaudio\n", " Using cached torchaudio-0.13.0-cp38-cp38-manylinux1_x86_64.whl (4.2 MB)\n", "Collecting torchvision\n", " Using cached torchvision-0.14.0-cp38-cp38-manylinux1_x86_64.whl (24.3 MB)\n", "Collecting nvidia-cudnn-cu11==8.5.0.96\n", " Using cached nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)\n", "Collecting nvidia-cublas-cu11==11.10.3.66\n", " Using cached nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)\n", "Collecting typing-extensions\n", " Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)\n", "Collecting nvidia-cuda-runtime-cu11==11.7.99\n", " Using cached nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)\n", "Collecting nvidia-cuda-nvrtc-cu11==11.7.99\n", " Using cached nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)\n", "Collecting wheel\n", " Using cached wheel-0.38.4-py3-none-any.whl (36 kB)\n", "Collecting setuptools\n", " Using cached setuptools-65.6.3-py3-none-any.whl (1.2 MB)\n", "Collecting numpy\n", " Using cached numpy-1.24.0rc2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)\n", "Collecting pillow!=8.3.*,>=5.3.0\n", " Using cached Pillow-9.3.0-cp38-cp38-manylinux_2_28_x86_64.whl (3.3 MB)\n", "Collecting requests\n", " Using cached requests-2.28.1-py3-none-any.whl (62 kB)\n", "Collecting idna<4,>=2.5\n", " Using cached idna-3.4-py3-none-any.whl (61 kB)\n", "Collecting 
urllib3<1.27,>=1.21.1\n", " Using cached urllib3-1.26.13-py2.py3-none-any.whl (140 kB)\n", "Collecting charset-normalizer<3,>=2\n", " Using cached charset_normalizer-2.1.1-py3-none-any.whl (39 kB)\n", "Collecting certifi>=2017.4.17\n", " Using cached certifi-2022.12.7-py3-none-any.whl (155 kB)\n", "\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0mInstalling collected packages: wheel, urllib3, typing-extensions, setuptools, pillow, nvidia-cuda-nvrtc-cu11, numpy, idna, charset-normalizer, certifi, requests, nvidia-cuda-runtime-cu11, nvidia-cublas-cu11, nvidia-cudnn-cu11, torch, torchvision, torchaudio\n", " Attempting uninstall: wheel\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: wheel 0.38.4\n", " Uninstalling wheel-0.38.4:\n", " Successfully uninstalled wheel-0.38.4\n", " Attempting uninstall: urllib3\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: urllib3 1.26.13\n", " Uninstalling urllib3-1.26.13:\n", " Successfully uninstalled urllib3-1.26.13\n", " Attempting uninstall: typing-extensions\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: typing_extensions 4.4.0\n", " Uninstalling typing_extensions-4.4.0:\n", " Successfully uninstalled typing_extensions-4.4.0\n", " Attempting uninstall: setuptools\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: setuptools 65.6.3\n", " Uninstalling setuptools-65.6.3:\n", " Successfully uninstalled setuptools-65.6.3\n", " Attempting uninstall: 
pillow\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: Pillow 9.3.0\n", " Uninstalling Pillow-9.3.0:\n", " Successfully uninstalled Pillow-9.3.0\n", " Attempting uninstall: nvidia-cuda-nvrtc-cu11\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: nvidia-cuda-nvrtc-cu11 11.7.99\n", " Uninstalling nvidia-cuda-nvrtc-cu11-11.7.99:\n", " Successfully uninstalled nvidia-cuda-nvrtc-cu11-11.7.99\n", " Attempting uninstall: numpy\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: numpy 1.23.5\n", " Uninstalling numpy-1.23.5:\n", " Successfully uninstalled numpy-1.23.5\n", " Attempting uninstall: idna\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: idna 3.4\n", " Uninstalling idna-3.4:\n", " Successfully uninstalled idna-3.4\n", " Attempting uninstall: charset-normalizer\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: charset-normalizer 2.1.1\n", " Uninstalling charset-normalizer-2.1.1:\n", " Successfully uninstalled charset-normalizer-2.1.1\n", " Attempting uninstall: certifi\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: certifi 2022.12.7\n", " Uninstalling certifi-2022.12.7:\n", " Successfully uninstalled certifi-2022.12.7\n", " Attempting uninstall: requests\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch 
(/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: requests 2.28.1\n", " Uninstalling requests-2.28.1:\n", " Successfully uninstalled requests-2.28.1\n", " Attempting uninstall: nvidia-cuda-runtime-cu11\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: nvidia-cuda-runtime-cu11 11.7.99\n", " Uninstalling nvidia-cuda-runtime-cu11-11.7.99:\n", " Successfully uninstalled nvidia-cuda-runtime-cu11-11.7.99\n", " Attempting uninstall: nvidia-cublas-cu11\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: nvidia-cublas-cu11 11.10.3.66\n", " Uninstalling nvidia-cublas-cu11-11.10.3.66:\n", " Successfully uninstalled nvidia-cublas-cu11-11.10.3.66\n", " Attempting uninstall: nvidia-cudnn-cu11\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: nvidia-cudnn-cu11 8.5.0.96\n", " Uninstalling nvidia-cudnn-cu11-8.5.0.96:\n", " Successfully uninstalled nvidia-cudnn-cu11-8.5.0.96\n", " Attempting uninstall: torch\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: torch 1.13.0\n", " Uninstalling torch-1.13.0:\n", " Successfully uninstalled torch-1.13.0\n", " Attempting uninstall: torchvision\n", "\u001b[33m WARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: torchvision 0.14.0\n", " Uninstalling torchvision-0.14.0:\n", " Successfully uninstalled torchvision-0.14.0\n", " Attempting uninstall: torchaudio\n", "\u001b[33m WARNING: Ignoring 
invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m Found existing installation: torchaudio 0.13.0\n", " Uninstalling torchaudio-0.13.0:\n", " Successfully uninstalled torchaudio-0.13.0\n", "\u001b[33mWARNING: Ignoring invalid distribution -orch 
(/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", "launchpadlib 1.10.13 requires testresources, which is not installed.\n", "pandas-profiling 3.4.0 requires numpy<1.24,>=1.16.0, but you have numpy 1.24.0rc2 which is incompatible.\n", "numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.0rc2 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0mSuccessfully installed certifi-2022.12.7 charset-normalizer-2.1.1 idna-3.4 numpy-1.24.0rc2 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 pillow-9.3.0 requests-2.28.1 setuptools-65.6.3 torch-1.13.0 torchaudio-0.13.0 torchvision-0.14.0 typing-extensions-4.4.0 urllib3-1.26.13 wheel-0.38.4\n", "\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m" ] } ], "source": [ "!pip3 install --upgrade pip\n", "!pip3 install --upgrade 
'numpy>=1.18'\n", "!pip3 install --upgrade 'packaging>=20.9'\n", "!pip3 install --upgrade 'typing-extensions>=3.7.4.3'\n", "\n", "!pip3 install --pre torch torchaudio torchvision --force-reinstall\n", "\n", "#!pip3 install bitsandbytes\n", "\n", "\n", "#!pip3 install --pre torch torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu116\n", "#!pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu116\n", "#!pip3 install numpy --pre torch[dynamo] torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu116\n", "\n", "#!pip3 install numpy --pre torch[dynamo] torchaudio --upgrade --extra-index-url https://download.pytorch.org/whl/nightly/cu117\n", "\n" ] }, { "cell_type": "markdown", "id": "a47bbac5-b44b-41ac-a948-1b57cec2b6f1", "metadata": { "id": "a47bbac5-b44b-41ac-a948-1b57cec2b6f1" }, "source": [ "First of all, let's try to secure a decent GPU for our Colab! Unfortunately, it's becoming much harder to get access to a good GPU with the free version of Google Colab. However, with Google Colab Pro / Pro+ you should have no trouble being allocated a V100 or P100 GPU.\n", "\n", "To get a GPU, click _Runtime_ -> _Change runtime type_, then change _Hardware accelerator_ from _None_ to _GPU_." 
] }, { "cell_type": "markdown", "id": "47686bd5-cbb1-4352-81cf-0fcf7bbd45c3", "metadata": { "id": "47686bd5-cbb1-4352-81cf-0fcf7bbd45c3" }, "source": [ "We can verify that we've been assigned a GPU and view its specifications:" ] }, { "cell_type": "code", "execution_count": 2, "id": "d74b38c5-a1fb-4214-b4f4-b5bf0869f169", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d74b38c5-a1fb-4214-b4f4-b5bf0869f169", "outputId": "18ca6853-0836-4cba-f06a-02fe1cecd715", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mon Dec 12 18:05:43 2022 \n", "+-----------------------------------------------------------------------------+\n", "| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |\n", "|-------------------------------+----------------------+----------------------+\n", "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", "| | | MIG M. |\n", "|===============================+======================+======================|\n", "| 0 NVIDIA A100-SXM... 
On | 00000000:06:00.0 Off | 0 |\n", "| N/A 74C P0 350W / 400W | 36965MiB / 40960MiB | 100% Default |\n", "| | | Disabled |\n", "+-------------------------------+----------------------+----------------------+\n", " \n", "+-----------------------------------------------------------------------------+\n", "| Processes: |\n", "| GPU GI CI PID Type Process name GPU Memory |\n", "| ID ID Usage |\n", "|=============================================================================|\n", "| 0 N/A N/A 3651714 C python 36963MiB |\n", "+-----------------------------------------------------------------------------+\n" ] } ], "source": [ "gpu_info = !nvidia-smi\n", "gpu_info = '\\n'.join(gpu_info)\n", "if gpu_info.find('failed') >= 0:\n", " print('Not connected to a GPU')\n", "else:\n", " print(gpu_info)" ] }, { "cell_type": "markdown", "id": "be67f92a-2f3b-4941-a1c0-5ed2de6e0a6a", "metadata": { "id": "be67f92a-2f3b-4941-a1c0-5ed2de6e0a6a", "tags": [] }, "source": [ "Next, we need to update the Unix package `ffmpeg` to version 4:" ] }, { "cell_type": "code", "execution_count": 4, "id": "15493a84-8b7c-4b35-9aeb-2b0a57a4e937", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "15493a84-8b7c-4b35-9aeb-2b0a57a4e937", "outputId": "9463f72d-a888-4980-abc4-2b6a2ece61b2", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Get:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease [1484 B]\n", "Hit:2 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease\n", "Hit:3 http://security.ubuntu.com/ubuntu focal-security InRelease \n", "Hit:4 https://download.docker.com/linux/ubuntu focal InRelease \n", "Hit:5 https://packages.cloud.google.com/apt cloud-sdk InRelease \n", "Ign:6 http://ppa.launchpad.net/jonathonf/ffmpeg-4/ubuntu focal InRelease \n", "Hit:7 https://packages.microsoft.com/repos/azure-cli focal InRelease \n", "Hit:8 http://archive.ubuntu.com/ubuntu focal InRelease \n", 
"Hit:9 http://archive.lambdalabs.com/ubuntu focal InRelease \n", "Hit:10 https://pkg.cloudflare.com/cloudflared focal InRelease \n", "Err:11 http://ppa.launchpad.net/jonathonf/ffmpeg-4/ubuntu focal Release\n", " 404 Not Found [IP: 185.125.190.52 80]\n", "Hit:12 http://archive.ubuntu.com/ubuntu focal-updates InRelease\n", "Hit:13 http://archive.ubuntu.com/ubuntu focal-backports InRelease\n", "Hit:14 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu focal InRelease\n", "Reading package lists... Done \n", "E: The repository 'http://ppa.launchpad.net/jonathonf/ffmpeg-4/ubuntu focal Release' does not have a Release file.\n", "N: Updating from such a repository can't be done securely, and is therefore disabled by default.\n", "N: See apt-secure(8) manpage for repository creation and user configuration details.\n", "Hit:1 https://download.docker.com/linux/ubuntu focal InRelease\n", "Get:2 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease [1484 B]\n", "Hit:3 http://security.ubuntu.com/ubuntu focal-security InRelease \u001b[0m\n", "Hit:4 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease\n", "Hit:5 https://packages.cloud.google.com/apt cloud-sdk InRelease \u001b[0m\n", "Ign:6 http://ppa.launchpad.net/jonathonf/ffmpeg-4/ubuntu focal InRelease \u001b[0m\u001b[33m\u001b[33m\n", "Hit:7 http://archive.ubuntu.com/ubuntu focal InRelease \u001b[0m\n", "Hit:8 https://packages.microsoft.com/repos/azure-cli focal InRelease \u001b[0m\u001b[33m\n", "Hit:9 http://archive.lambdalabs.com/ubuntu focal InRelease \u001b[0m\n", "Hit:10 https://pkg.cloudflare.com/cloudflared focal InRelease \u001b[0m\u001b[33m\n", "Hit:11 http://archive.ubuntu.com/ubuntu focal-updates InRelease \u001b[0m\n", "Hit:12 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu focal InRelease \n", "Err:13 http://ppa.launchpad.net/jonathonf/ffmpeg-4/ubuntu focal Release\n", " 404 Not Found [IP: 185.125.190.52 80]\n", "Hit:14 
http://archive.ubuntu.com/ubuntu focal-backports InRelease\n", "Reading package lists... Done\u001b[33m\u001b[33m\u001b[33m\u001b[33m\n", "\u001b[1;31mE: \u001b[0mThe repository 'http://ppa.launchpad.net/jonathonf/ffmpeg-4/ubuntu focal Release' does not have a Release file.\u001b[0m\n", "\u001b[33mN: \u001b[0mUpdating from such a repository can't be done securely, and is therefore disabled by default.\u001b[0m\n", "\u001b[33mN: \u001b[0mSee apt-secure(8) manpage for repository creation and user configuration details.\u001b[0m\n", "Reading package lists... Done\n", "Building dependency tree \n", "Reading state information... Done\n", "ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).\n", "0 upgraded, 0 newly installed, 0 to remove and 154 not upgraded.\n" ] } ], "source": [ "!sudo add-apt-repository -y ppa:jonathonf/ffmpeg-4\n", "!sudo apt update\n", "!sudo apt install -y ffmpeg" ] }, { "cell_type": "markdown", "id": "ab471347-a547-4d14-9d11-f151dc9547a7", "metadata": { "id": "ab471347-a547-4d14-9d11-f151dc9547a7" }, "source": [ "We'll employ several popular Python packages to fine-tune the Whisper model.\n", "We'll use `datasets` to download and prepare our training data and \n", "`transformers` to load and train our Whisper model. We'll also require\n", "the `soundfile` package to pre-process audio files, and `evaluate` and `jiwer` to\n", "assess the performance of our model. Finally, we'll\n", "use `gradio` to build a flashy demo of our fine-tuned model." 
] }, { "cell_type": "code", "execution_count": 5, "id": "4e106846-3620-46aa-989d-5e35e27c8057", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4e106846-3620-46aa-989d-5e35e27c8057", "outputId": "6bcef5d6-c7de-45de-abd4-ab5883dfaab4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Defaulting to user installation because normal site-packages is not writeable\n", "\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Ignoring invalid distribution -orch (/home/ubuntu/.local/lib/python3.8/site-packages)\u001b[0m\u001b[33m\n", "\u001b[0mCollecting git+https://github.com/huggingface/datasets\n", " Cloning https://github.com/huggingface/datasets to /tmp/pip-req-build-2byfy50h\n", " Running command git clone --filter=blob:none --quiet https://github.com/huggingface/datasets /tmp/pip-req-build-2byfy50h\n", " Resolved https://github.com/huggingface/datasets to commit 5266c81430628edc175013692f02f5f2747ff29e\n", " Installing build dependencies ... \u001b[?25ldone\n", "\u001b[?25h Getting requirements to build wheel ... \u001b[?25ldone\n", "\u001b[?25h Preparing metadata (pyproject.toml) ... 
done\n", "Collecting git+https://github.com/huggingface/transformers\n", " Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-rez3g9mu\n", " Resolved https://github.com/huggingface/transformers to commit 799cea64ac1029d66e9e58f18bc6f47892270723\n", "Requirement already satisfied: librosa in /usr/local/lib/python3.8/dist-packages (0.9.2)\n", "Collecting numpy>=1.17.0\n", " Using cached numpy-1.23.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)\n", "Installing collected packages: numpy\n", " Attempting uninstall: numpy\n", " Found existing installation: numpy 1.24.0rc2\n", " Uninstalling numpy-1.24.0rc2:\n", " Successfully uninstalled numpy-1.24.0rc2\n", "Successfully installed numpy-1.23.5\n", "Requirement already satisfied: jiwer in /usr/local/lib/python3.8/dist-packages (2.5.1)\n", "Requirement already satisfied: gradio in ./.local/lib/python3.8/site-packages (3.12.0)\n", "Requirement already
satisfied: more-itertools in /usr/local/lib/python3.8/dist-packages (9.0.0)\n" ] } ], "source": [ "!pip install git+https://github.com/huggingface/datasets\n", "!pip install git+https://github.com/huggingface/transformers\n", "!pip3 install 'numexpr>=2.7.3'\n", "!pip install librosa\n", "!pip install 'evaluate>=0.3.0'\n", "!pip install jiwer\n", "!pip install gradio\n", "!pip install more-itertools" ] }, { "cell_type": "markdown", "id": "5b185650-af09-48c6-a67b-0e4368b74b3b", "metadata": { "id": "5b185650-af09-48c6-a67b-0e4368b74b3b", "tags": [] }, "source": [ "Linking the notebook to the Hugging Face Hub is straightforward: it simply requires entering your Hub authentication token when prompted.
Find your Hub authentication token [here](https://huggingface.co/settings/tokens):" ] }, { "cell_type": "code", "execution_count": 1, "id": "dff27c76-575c-432b-8916-b1b810efef4a", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 331, "referenced_widgets": [ "7f16af38d92e4cac84284de5e3756ce6", "87a7ee7ff6d44cd7881ea185796628a6", "e1bb1e29bac248f793290223324a4c4b", "3ae98fe05f88443d821d5df7488a123b", "21188bee327c4aedb689117ac9587842", "3b4a918dcadb4b18903907fcab930dfe", "50b3d6dda7504241961ab0bf9c9c033a", "b57d12eff9424744bd7e79cc039a22e5", "38135b51abf54c749ccb3db099f10b1d", "5782da4956ce4aeeafff328e0b821936", "481fdd626960471e94d31f00e20344ac", "40d85caeaa614d918e5f199e1c5136ef", "88dc4572ad5c495cb0489a5bc6467ee2", "a011f026bff54c2cbfcf32d839f69a38", "c8f6afbce8ca417d979c579760fe7311", "ed3ad08826e24e03ba6d611550249160", "f736a64d34b94efabfdd36c297177aed" ] }, "id": "dff27c76-575c-432b-8916-b1b810efef4a", "outputId": "5beb152e-9d4f-4581-8063-c5752890c4fa" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6a2bfa275b6f4cdba66b6abdab11e859", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='