{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "1LrJXl6HY3xO"
},
"source": [
"# **Boosting Wav2Vec2 with n-grams in 🤗 Transformers**\n",
"\n",
"**Wav2Vec2** is a popular pre-trained model for speech recognition. Released in [September 2020](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) by Meta AI Research, the novel architecture catalyzed progress in self-supervised pretraining for speech recognition, *e.g.* [*G. Ng et al.*, 2021](https://arxiv.org/pdf/2104.03416.pdf), [*Chen et al*, 2021](https://arxiv.org/abs/2110.13900), [*Hsu et al.*, 2021](https://arxiv.org/abs/2106.07447) and [*Babu et al.*, 2021](https://arxiv.org/abs/2111.09296). On the Hugging Face Hub, Wav2Vec2's most popular pre-trained checkpoint currently amounts to over [**250,000** monthly downloads](https://huggingface.co/facebook/wav2vec2-base-960h).\n",
"\n",
"Using Connectionist Temporal Classification (CTC), pre-trained Wav2Vec2-like checkpoints are extremely easy to fine-tune on downstream speech recognition tasks.\n",
"In a nutshell, fine-tuning pre-trained Wav2Vec2 checkpoints works as follows: \n",
"\n",
"A single randomly initialized linear layer is stacked on top of the pre-trained checkpoint and trained to classify raw audio input to a sequence of letters. It does so by:\n",
"\n",
"1. extracting audio representations from the raw audio (using CNN layers),\n",
"2. processing the sequence of audio representations with a stack of transformer layers, and,\n",
"3. classifying the processed audio representations into a sequence of output letters.\n",
"\n",
"Previously audio classification models required an additional language model (LM) and a dictionary to transform the sequence of classified audio frames to a coherent transcription.\n",
"Wav2Vec2's architecture is based on transformer layers, thus giving each processed audio representation context \n",
"from all other audio representations. In addition, \n",
"Wav2Vec2 leverages the [CTC algorithm](https://distill.pub/2017/ctc/) for fine-tuning, which solves the problem of alignment between a varying \"input audio length\"-to-\"output text length\" ratio.\n",
"\n",
"Having contextualized audio classifications and no alignment problems, Wav2Vec2 does not require \n",
"an external language model or dictionary to yield acceptable audio transcriptions.\n",
"\n",
"As can be seen in Appendix C of the [official paper](https://arxiv.org/abs/2006.11477), Wav2Vec2 gives impressive downstream performances on [LibriSpeech]((https://huggingface.co/datasets/librispeech_asr)) without using a language model at all. However, from the appendix, it also becomes clear that using Wav2Vec2 in combination with a language model can yield a significant improvement, especially when the model was trained on only 10 minutes of transcribed audio.\n",
"\n",
"Until recently, the 🤗 Transformers library did not offer a simple user interface to decode audio files with a fine-tuned Wav2Vec2 **and** a language model. This has thankfully changed. 🤗 Transformers now offers an easy-to-use integration with *Kensho Technologies'* [pyctcdecode library](https://github.com/kensho-technologies/pyctcdecode). This blog post is a step-by-step **technical** guide to explain how one can create an **n-gram** language model and combine it with an existing fine-tuned Wav2Vec2 checkpoint using 🤗 Datasets and 🤗 Transformers.\n",
"\n",
"We start by:\n",
"\n",
"1. How does decoding audio with an LM differ from decoding audio without an LM?\n",
"2. How to get suitable data for a language model?\n",
"3. How to build an *n-gram* with KenLM?\n",
"4. How to combine the *n-gram* with a fine-tuned Wav2Vec2 checkpoint?\n",
"\n",
"For a deep dive into how Wav2Vec2 functions - which is not necessary for this blog post - the reader is advised to consult the following material:\n",
"\n",
"- [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477)\n",
"- [Fine-Tune Wav2Vec2 for English ASR with 🤗 Transformers](https://huggingface.co/blog/fine-tune-wav2vec2-english)\n",
"- [An Illustrated Tour of Wav2vec 2.0](https://jonathanbgn.com/2021/09/30/illustrated-wav2vec-2.html)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nu5oeVSvprSp"
},
"source": [
"## **1. Decoding audio data with Wav2Vec2 and a language model**\n",
"\n",
"As shown in 🤗 Transformers [exemple docs of Wav2Vec2](https://huggingface.co/docs/transformers/master/en/model_doc/wav2vec2#transformers.Wav2Vec2ForCTC), audio can be transcribed as follows.\n",
"\n",
"First, we install `datasets` and `transformers`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "OWGc_zfyq5_T",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "4cc791f5-6a7c-4c21-c880-bd47df479744"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Collecting datasets\n",
" Downloading datasets-1.18.3-py3-none-any.whl (311 kB)\n",
"\u001b[K |████████████████████████████████| 311 kB 6.6 MB/s \n",
"\u001b[?25hCollecting transformers\n",
" Downloading transformers-4.16.2-py3-none-any.whl (3.5 MB)\n",
"\u001b[K |████████████████████████████████| 3.5 MB 45.3 MB/s \n",
"\u001b[?25hRequirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from datasets) (21.3)\n",
"Collecting aiohttp\n",
" Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)\n",
"\u001b[K |████████████████████████████████| 1.1 MB 55.6 MB/s \n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from datasets) (1.19.5)\n",
"Requirement already satisfied: multiprocess in /usr/local/lib/python3.7/dist-packages (from datasets) (0.70.12.2)\n",
"Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from datasets) (0.3.4)\n",
"Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.7/dist-packages (from datasets) (4.62.3)\n",
"Requirement already satisfied: pyarrow!=4.0.0,>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from datasets) (6.0.1)\n",
"Collecting xxhash\n",
" Downloading xxhash-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl (243 kB)\n",
"\u001b[K |████████████████████████████████| 243 kB 53.5 MB/s \n",
"\u001b[?25hCollecting fsspec[http]>=2021.05.0\n",
" Downloading fsspec-2022.1.0-py3-none-any.whl (133 kB)\n",
"\u001b[K |████████████████████████████████| 133 kB 57.5 MB/s \n",
"\u001b[?25hRequirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from datasets) (4.10.1)\n",
"Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from datasets) (2.23.0)\n",
"Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from datasets) (1.3.5)\n",
"Collecting huggingface-hub<1.0.0,>=0.1.0\n",
" Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)\n",
"\u001b[K |████████████████████████████████| 67 kB 5.3 MB/s \n",
"\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.4.2)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.10.0.2)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.13)\n",
"Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->datasets) (3.0.7)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2021.10.8)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (1.24.3)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2.10)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (3.0.4)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2019.12.20)\n",
"Collecting sacremoses\n",
" Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)\n",
"\u001b[K |████████████████████████████████| 895 kB 58.4 MB/s \n",
"\u001b[?25hCollecting pyyaml\n",
" Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)\n",
"\u001b[K |████████████████████████████████| 596 kB 50.0 MB/s \n",
"\u001b[?25hCollecting tokenizers!=0.11.3,>=0.10.1\n",
" Downloading tokenizers-0.11.4-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.8 MB)\n",
"\u001b[K |████████████████████████████████| 6.8 MB 51.9 MB/s \n",
"\u001b[?25hCollecting aiosignal>=1.1.2\n",
" Downloading aiosignal-1.2.0-py3-none-any.whl (8.2 kB)\n",
"Collecting yarl<2.0,>=1.0\n",
" Downloading yarl-1.7.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (271 kB)\n",
"\u001b[K |████████████████████████████████| 271 kB 45.3 MB/s \n",
"\u001b[?25hCollecting frozenlist>=1.1.1\n",
" Downloading frozenlist-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (144 kB)\n",
"\u001b[K |████████████████████████████████| 144 kB 77.4 MB/s \n",
"\u001b[?25hCollecting async-timeout<5.0,>=4.0.0a3\n",
" Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n",
"Collecting multidict<7.0,>=4.5\n",
" Downloading multidict-6.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (94 kB)\n",
"\u001b[K |████████████████████████████████| 94 kB 3.9 MB/s \n",
"\u001b[?25hCollecting asynctest==0.13.0\n",
" Downloading asynctest-0.13.0-py3-none-any.whl (26 kB)\n",
"Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (2.0.11)\n",
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->datasets) (21.4.0)\n",
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->datasets) (3.7.0)\n",
"Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2018.9)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2.8.2)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.15.0)\n",
"Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.1.0)\n",
"Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (7.1.2)\n",
"Installing collected packages: multidict, frozenlist, yarl, asynctest, async-timeout, aiosignal, pyyaml, fsspec, aiohttp, xxhash, tokenizers, sacremoses, huggingface-hub, transformers, datasets\n",
" Attempting uninstall: pyyaml\n",
" Found existing installation: PyYAML 3.13\n",
" Uninstalling PyYAML-3.13:\n",
" Successfully uninstalled PyYAML-3.13\n",
"Successfully installed aiohttp-3.8.1 aiosignal-1.2.0 async-timeout-4.0.2 asynctest-0.13.0 datasets-1.18.3 frozenlist-1.3.0 fsspec-2022.1.0 huggingface-hub-0.4.0 multidict-6.0.2 pyyaml-6.0 sacremoses-0.0.47 tokenizers-0.11.4 transformers-4.16.2 xxhash-2.0.2 yarl-1.7.2\n"
]
}
],
"source": [
"!pip install datasets transformers"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZzcM5yC5rICZ"
},
"source": [
"Let's load a small excerpt of the [Librispeech dataset](https://huggingface.co/datasets/librispeech_asr) to demonstrate Wav2Vec2's speech transcription capabilities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dAerOhydrNFR",
"outputId": "4ba5ed61-f6ef-40ae-828d-7e4f5c0f4f87"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Reusing dataset librispeech_asr (/root/.cache/huggingface/datasets/hf-internal-testing___librispeech_asr/clean/2.1.0/f2c70a4d03ab4410954901bde48c54b85ca1b7f9bf7d616e7e2a72b5ee6ddbfc)\n"
]
},
{
"data": {
"text/plain": [
"Dataset({\n",
" features: ['file', 'audio', 'text', 'speaker_id', 'chapter_id', 'id'],\n",
" num_rows: 73\n",
"})"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"hf-internal-testing/librispeech_asr_demo\", \"clean\", split=\"validation\")\n",
"dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-AkdbAlyrszm"
},
"source": [
"We can pick one of the 73 audio samples and listen to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 93
},
"id": "pSomT7k_r1QX",
"outputId": "8475f442-56e6-4c51-bd11-edf5d212ec48"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"he tells us that at this festive season of the year with christmas and roast beef looming before us similes drawn from eating and its results occur most readily to the mind\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import IPython.display as ipd\n",
"\n",
"audio_sample = dataset[2]\n",
"print(audio_sample[\"text\"].lower())\n",
"ipd.Audio(data=audio_sample[\"audio\"][\"array\"], autoplay=True, rate=audio_sample[\"audio\"][\"sampling_rate\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZbBGOqQusdO6"
},
"source": [
"Having chosen a data sample, we now load the fine-tuned model and processor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5FUPvu0crmsY"
},
"outputs": [],
"source": [
"from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC\n",
"\n",
"processor = Wav2Vec2Processor.from_pretrained(\"facebook/wav2vec2-base-100h\")\n",
"model = Wav2Vec2ForCTC.from_pretrained(\"facebook/wav2vec2-base-100h\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7L8Th_yTslta"
},
"source": [
"Next, we process the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6NoMX8qfssuw"
},
"outputs": [],
"source": [
"inputs = processor(audio_sample[\"audio\"][\"array\"], sampling_rate=16_000, return_tensors=\"pt\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xyM2MqwEs1p7"
},
"source": [
"forward it to the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4KWDRrG0s27p"
},
"outputs": [],
"source": [
"import torch\n",
"\n",
"with torch.no_grad():\n",
" logits = model(**inputs).logits"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pEHhf_1os4rZ"
},
"source": [
"and decode it"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "aJREUIdqs5ak",
"outputId": "40de5ef4-5afd-4518-e3bc-c81cb976b46e"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'he tells us that at this festive season of the year with christmaus and rose beef looming before us simalyis drawn from eating and its results occur most readily to the mind'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predicted_ids = torch.argmax(logits, dim=-1)\n",
"transcription = processor.batch_decode(predicted_ids)\n",
"\n",
"transcription[0].lower()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ifBW-tM0yWhS"
},
"source": [
"Comparing the transcription to the target transcription above, we can see that some words *sound* correct, but are not *spelled* correctly, *e.g.*:\n",
"\n",
"- *christmaus* vs. *christmas*\n",
"- *rose* vs. *roast*\n",
"- *simalyis* vs. *similes*\n",
"\n",
"Let's see whether combining Wav2Vec2 with an ***n-gram*** lnguage model can help here."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JC1FrBDnzTJ5"
},
"source": [
"First, we need to install `pyctcdecode` and `kenlm`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "TvDJ7CYpzSJQ",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "cb8e254d-7e9e-4549-ad29-55eded55c172"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Collecting https://github.com/kpu/kenlm/archive/master.zip\n",
" Downloading https://github.com/kpu/kenlm/archive/master.zip\n",
"\u001b[K / 541 kB 485 kB/s\n",
"\u001b[?25hCollecting pyctcdecode\n",
" Downloading pyctcdecode-0.3.0-py2.py3-none-any.whl (43 kB)\n",
"\u001b[K |████████████████████████████████| 43 kB 1.9 MB/s \n",
"\u001b[?25hRequirement already satisfied: numpy<2.0.0,>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from pyctcdecode) (1.19.5)\n",
"Collecting pygtrie<3.0,>=2.1\n",
" Downloading pygtrie-2.4.2.tar.gz (35 kB)\n",
"Collecting hypothesis<7,>=6.14\n",
" Downloading hypothesis-6.36.1-py3-none-any.whl (376 kB)\n",
"\u001b[K |████████████████████████████████| 376 kB 14.9 MB/s \n",
"\u001b[?25hRequirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /usr/local/lib/python3.7/dist-packages (from hypothesis<7,>=6.14->pyctcdecode) (2.4.0)\n",
"Requirement already satisfied: attrs>=19.2.0 in /usr/local/lib/python3.7/dist-packages (from hypothesis<7,>=6.14->pyctcdecode) (21.4.0)\n",
"Building wheels for collected packages: kenlm, pygtrie\n",
" Building wheel for kenlm (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for kenlm: filename=kenlm-0.0.0-cp37-cp37m-linux_x86_64.whl size=2338537 sha256=00590bf10c1f6baef4b43760064a3eed792028fec2bc61b98877f769a956eade\n",
" Stored in directory: /tmp/pip-ephem-wheel-cache-b0pd4bbm/wheels/3d/aa/02/7b4a2eab5d7a2a9391bd9680dbad6270808a147bc3b7047e4e\n",
" Building wheel for pygtrie (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for pygtrie: filename=pygtrie-2.4.2-py3-none-any.whl size=19063 sha256=9609d5f563aebd3d79976bde7e6f56af5bf8b5b005fc806979692da6b3b88cea\n",
" Stored in directory: /root/.cache/pip/wheels/d3/f8/ba/1d828b1603ea422686eb694253a43cb3a5901ea4696c1e0603\n",
"Successfully built kenlm pygtrie\n",
"Installing collected packages: pygtrie, hypothesis, pyctcdecode, kenlm\n",
"Successfully installed hypothesis-6.36.1 kenlm-0.0.0 pyctcdecode-0.3.0 pygtrie-2.4.2\n"
]
}
],
"source": [
"!pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ak_X8SHqzjOY"
},
"source": [
"For demonstration purposes, we have prepared a new model repository [patrickvonplaten/wav2vec2-base-100h-with-lm](https://huggingface.co/patrickvonplaten/wav2vec2-base-100h-with-lm) which contains the same Wav2Vec2 checkpoint but has an additional **4-gram** language model for English.\n",
"\n",
"Instead of using `Wav2Vec2Processor`, this time we use `Wav2Vec2ProcessorWithLM` to load the **4-gram** model in addition to the feature extractor and tokenizer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UydQ00uI0OEG"
},
"outputs": [],
"source": [
"from transformers import Wav2Vec2ProcessorWithLM\n",
"\n",
"processor = Wav2Vec2ProcessorWithLM.from_pretrained(\"patrickvonplaten/wav2vec2-base-100h-with-lm\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cYZrzLQ02U4y"
},
"source": [
"In constrast to decoding the audio without language model, the processor now directly receives the model's output `logits` instead of the `argmax(logits)` (called `predicted_ids`) above. The reason is that when decoding with a language model, at each time step, the processor takes the probabilities of all possible output characters into account. Let's take a look at the dimension of the `logits` output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "WJvldi0C5OaW",
"outputId": "ef6210a9-9652-4fc0-dc61-007dc85de5f5"
},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([1, 624, 32])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"logits.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ySU0i3oZ5Vdm"
},
"source": [
"We can see that the `logits` correspond to a sequence of 624 vectors each having 32 entries. Each of the 32 entries thereby stands for the logit probability of one of the 32 possible output characters of the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "rRz0_vm95i6E",
"outputId": "0b076acf-c839-4a66-8059-e7261e9f0024"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"\"' A B C D E F G H I J K L M N O P Q R S T U V W X Y Z |\""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\" \".join(sorted(processor.tokenizer.get_vocab()))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u60UcdWL5ub-"
},
"source": [
"Intuitively, one can understand the decoding process of `Wav2Vec2ProcessorWithLM` as applying beam search through a matrix of size 624 $\\times$ 32 probabilities while leveraging the probabilities of the next letters as given by the *n-gram* language model.\n",
"\n",
"OK, let's run the decoding step again. `pyctcdecode` language model decoder does not automatically convert `torch` tensors to `numpy` so we'll have to convert them ourselves before."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "DFRgWyuUAZI4",
"outputId": "45b35f41-bc95-4297-e8a7-30d7c3980ac9"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'he tells us that at this festive season of the year with christmas and rose beef looming before us similes drawn from eating and its results occur most readily to the mind'"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"transcription = processor.batch_decode(logits.numpy()).text\n",
"transcription[0].lower()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E4CiVIftDEMd"
},
"source": [
"Cool! Recalling the words `facebook/wav2vec2-base-100h` without a language model transcribed incorrectly previously, *e.g.*,\n",
"\n",
"> - *christmaus* vs. *christmas*\n",
"- *rose* vs. *roast*\n",
"- *simalyis* vs. *similes*\n",
"\n",
"we can take another look at the transcription of `facebook/wav2vec2-base-100h` **with** a 4-gram language model. 2 out of 3 errors are corrected; *christmas* and *similes* have been correctly transcribed.\n",
"\n",
"Interestingly, the incorrect transcription of *rose* persists. However, this should not surprise us very much. Decoding audio without a language model is much more prone to yield spelling mistakes, such as *christmaus* or *similes* (those words don't exist in the English language as far as I know). This is because the speech recognition system almost solely bases its prediction on the acoustic input it was given and not really on the language modeling context of previous and successive predicted letters ${}^1$. \n",
"If on the other hand, we add a language model, we can be fairly sure that the speech recognition system will heavily reduce spelling errors since a well-trained *n-gram* model will surely not predict a word that has spelling errors. But the word *rose* is a valid English word and therefore the 4-gram will predict this word with a probability that is not insignificant. \n",
"\n",
"The language model on its own most likely does favor the correct word *roast* since the word sequence *roast beef* is much more common in English than *rose beef*. Because the final transcription is derived from a weighted combination of `facebook/wav2vec2-base-100h` output probabilities and those of the *n-gram* language model, it is quite common to see incorrectly transcribed words such as *rose*.\n",
"\n",
"For more information on how you can tweak different parameters when decoding with `Wav2Vec2ProcessorWithLM`, please take a look at the official documentation [here](https://huggingface.co/docs/transformers/master/en/model_doc/wav2vec2#transformers.Wav2Vec2ProcessorWithLM.batch_decode).\n",
"\n",
"---\n",
"${}^1$ Some research shows that a model such as `facebook/wav2vec2-base-100h` - when sufficiently large and trained on enough data - can learn language modeling dependencies between intermediate audio representations similar to a language model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9FKW9cqzKMMo"
},
"source": [
"Great, now that you have seen the advantages adding an *n-gram* language model can bring, let's dive into how to create an *n-gram* and `Wav2Vec2ProcessorWithLM` from scratch."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QEz599PfmPeG"
},
"source": [
"## **2. Getting data for your language model**\n",
"\n",
"A language model that is useful for a speech recognition system should support the acoustic model, *e.g.* Wav2Vec2, in predicting the next word (or token, letter) and therefore model the following distribution:\n",
"\n",
"$\\mathbf{P}(w_n | \\mathbf{w}_0^{t-1})$ with $w_n$ being the next word and $\\mathbf{w}_0^{t-1}$ being the sequence of all previous words since the beginning of the utterance. Simply said, the language model should be good at predicting the next word given all previously transcribed words regardless of the audio input given to the speech recognition system.\n",
"\n",
"As always a language model is only as good as the data it is trained on. In the case of speech recognition, we should therefore ask ourselves for what kind of data, the speech recognition will be used for: *conversations*, *audiobooks*, *movies*, *speeches*, *, etc*, ...?\n",
"\n",
"The language model should be good at modeling language that corresponds to the \n",
"target transcriptions of the speech recognition system. \n",
"For demonstration purposes, we assume here that we have fine-tuned a pre-trained [`facebook/wav2vec2-xls-r-300m`](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on [Common Voice 7](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) in Swedish. The fine-tuned checkpoint can \n",
"be found [here](https://huggingface.co/hf-test/xls-r-300m-sv).\n",
"Common Voice 7 is a relatively crowd-sourced read-out audio dataset and we will evaluate the model on its test data.\n",
"\n",
"Let's now look for suitable text data on the Hugging Face Hub. We search all datasets for those [that contain Swedish data](https://huggingface.co/datasets?languages=languages:sv&sort=downloads). \n",
"Browsing a bit through the datasets, we are looking for a dataset that is similar to Common Voice's read-out audio data. The obvious choices of [oscar](https://huggingface.co/datasets/oscar) and [mc4](https://huggingface.co/datasets/mc4) might not be the most suitable here because they:\n",
"\n",
"- are generated from crawling the web, which might not be very clean and correspond well to spoken language\n",
"- require a lot of pre-processing\n",
"- are very large which is not ideal for demonstration purposes here 😉\n",
"\n",
"A dataset that seems sensible here and which is relatively clean and easy to pre-process is [europarl_bilingual](https://huggingface.co/datasets/europarl_bilingual) as it's a dataset that is based on discussions and talks of the European parliament. It should therefore be relatively clean and correspond well to read-out audio data. The dataset is originally designed for machine translation and can therefore only be accessed in translation pairs. We will only extract the text of the target language, Swedish (`sv`), from the *English-to-Swedish* translations."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "chZw03lUVAnr"
},
"outputs": [],
"source": [
"target_lang=\"en\" # change to your target lang"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cOLruPJsVS98"
},
"source": [
"Let's download the data."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 255,
"referenced_widgets": [
"c5eb721fe1b841168a49b5bc22435791",
"b9861ef1c3534e90b472be9ed8862f17",
"77db336868d84805937445c51ea72df4",
"fb4d2faef4da4dc0ae1203f9376053af",
"edbdbc0a82854bde92e5abf6c8f87534",
"eb4ed31888734fe39dd96f9642abd38b",
"d050daf64b5944fdad48b963ad482f7c",
"88e5417fad3d442585c799cfd9d2f8f4",
"5a0353a8a1be47a1a42f1e05eb72eb68",
"40a6127b87464cddba87f6c97c308594",
"c1931c4022d84d28815d460f72ef908b",
"880b6bbcaad54eb8ba377377b1858ffd",
"d37fb9ad76e64370a426f306919bedd0",
"850edc7115a34dde9ddf6e36b5f1e8dd",
"3f7f23c3d40a4a629c26115bad8b19c6",
"ef6e382ddb8f475981b9f2ea8312bf46",
"c8b82acda88245b686d57b496466a3e2",
"ed723bd499654388b4717624baa2a288",
"dfcd53bb9a6f4cf3b6cdc84f40bbcd51",
"3752bd8127d140829a62045098151b37",
"d73548881b594fd88f074b947b07c408",
"1963287764bb48fa86c7437eb2dad75b",
"95d736dd67b849ebae4433bff689cd06",
"8584eaf09efb49c995f5a2b1c25fe089",
"a377ec0ebad0485987b93ae5212ab19f",
"00b92ed648e94e8b975f4c66cc329c19",
"5f132d8f475345f7acef6a4996a026cf",
"90ec52eebee44eeea4cdead5bf665b67",
"16a8075896df4ca181a70973a59d1fd7",
"4e562a18f1a0471ea0ebd4007a2d674e",
"e3dafae56ffb4a5cbc84d80bc7f30c87",
"9681133982cc42048815e2c30e1b4436",
"be4fd6ba2c024224a10571a3a2c44637",
"73e64b85117b4bc69d013fad313da9c6",
"678d73e72aa941b2b1f3d1e85ed0d952",
"5bb04fe5c73b4bc09f1aa05952244413",
"d4adf3cee88e4482b20f9f0d3b87a36a",
"0f21362b91214406a29a82fcffc34bf7",
"f692fa3e8bad4f359baaff49a70d5cd5",
"3717640fa6e740c6a73e061fd9cd59e8",
"6433d64ddce540f3acacb9990c6f8b20",
"1b4a50a4aeea429795ff12ccf7e25ecc",
"d8d482db758d4052afe8019ba59d816a",
"a0ec12647b784b2d94a298db506e1ca4"
]
},
"id": "IrAzjWc3Ok2l",
"outputId": "591b75dd-b38f-4129-a748-21fefa0cf7b6"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c5eb721fe1b841168a49b5bc22435791",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/2.08k [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "880b6bbcaad54eb8ba377377b1858ffd",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/7.41k [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Using custom data configuration ar-en-lang1=ar,lang2=en\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Downloading and preparing dataset news_commentary/ar-en (download: 23.57 MiB, generated: 76.92 MiB, post-processed: Unknown size, total: 100.49 MiB) to /root/.cache/huggingface/datasets/news_commentary/ar-en-lang1=ar,lang2=en/0.0.0/cfab724ce975dc2da51cdae45302389860badc88b74db8570d561ced6004f8b4...\n"
]
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "95d736dd67b849ebae4433bff689cd06",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/24.7M [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "73e64b85117b4bc69d013fad313da9c6",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"0 examples [00:00, ? examples/s]"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Dataset news_commentary downloaded and prepared to /root/.cache/huggingface/datasets/news_commentary/ar-en-lang1=ar,lang2=en/0.0.0/cfab724ce975dc2da51cdae45302389860badc88b74db8570d561ced6004f8b4. Subsequent calls will reuse this data.\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Dataset({\n",
" features: ['id', 'translation'],\n",
" num_rows: 83187\n",
"})"
]
},
"metadata": {},
"execution_count": 4
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"news_commentary\", lang1=\"ar\", lang2=target_lang, split=\"train\")\n",
"dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9eNZuRBxqpo4"
},
"source": [
"We see that the data is quite large - it has over a million translations. Since it's only text data, it should be relatively easy to process though.\n",
"\n",
"Next, let's look at how the data was preprocessed when training the fine-tuned *XLS-R* checkpoint in Swedish. Looking at the [`run.sh` file](https://huggingface.co/hf-test/xls-r-300m-sv/blob/main/run.sh), we can see that the following characters were removed from the official transcriptions:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "xu_ijSi7X4C4"
},
"outputs": [],
"source": [
"chars_to_ignore_regex = '[,?.!\\-\\;\\:\\\"“%‘”�—’…–]' # change to the ignored characters of your fine-tuned model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pqSMaIG-rLKG"
},
"source": [
"Let's do the same here so that the alphabet of our language model matches the one of the fine-tuned acoustic checkpoints.\n",
"\n",
"We can write a single map function to extract the Swedish text and process it right away."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "WFIU8e8AR3tW"
},
"outputs": [],
"source": [
"import re\n",
"alpha_numerical = \"[0-9A-Za-z]\"\n",
"def extract_text(batch):\n",
" text = batch[\"translation\"]['ar']\n",
" batch[\"text\"] = re.sub(chars_to_ignore_regex, \"\", text.lower())\n",
" batch[\"text\"] = re.sub(alpha_numerical, \"\",batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[—]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[_]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[«]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[»]\", '', batch[\"text\"])\n",
" # batch[\"sentence\"] = re.sub(\"['ِ]\", '', batch[\"sentence\"])\n",
" batch[\"text\"] = re.sub(\"[،]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[؛]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[؟]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ۖ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[ـ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[☭]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[…]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ۗ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[،]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"[؛]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ۘ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ۚ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ۛ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ً]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ٌ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ٍ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ّ]\", '', batch[\"text\"])\n",
" batch[\"text\"] = re.sub(\"['ٰ]\", '', batch[\"text\"])\n",
" return batch"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pJ1HrbGBrdUT"
},
"source": [
"Let's apply the `.map()` function. This should take roughly 5 minutes."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 118,
"referenced_widgets": [
"9c52b398782b40cb99a35d2bf5f079b8",
"3137ac50cbae426e9031cc79d9c3443b",
"86a4292b30ec47c0a875ada683c00886",
"ac78dc635d2448dea7fdba88d7df067f",
"d4595639599e4233856655c8fe255deb",
"72b8ff227663406f88069ed51847447d",
"a8604194873b4defadb70736fe913d6e",
"3dcfd3664dab4bc39dda69085533423d",
"04b1c8bb81894178ba1e3e6f63f30a2a",
"6997e0504b9d4cd6b2f16f41ce144d3c",
"2d23fb97540040bda093ff13dccaa713"
]
},
"id": "fniFT4aARiBf",
"outputId": "3a2411cf-ce8a-4be9-d3d5-c2673e859b5a"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9c52b398782b40cb99a35d2bf5f079b8",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"0ex [00:00, ?ex/s]"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Dataset({\n",
" features: ['text'],\n",
" num_rows: 83187\n",
"})"
]
},
"metadata": {},
"execution_count": 7
}
],
"source": [
"dataset = dataset.map(extract_text, remove_columns=dataset.column_names)\n",
"dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a-S979j9rh43"
},
"source": [
"Great. Our dataset is already finished. Let's upload it to the Hub so that we can inspect and reuse it better.\n",
"\n",
"You can log in by executing the following cell."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 387,
"referenced_widgets": [
"5d00bf261e08411ab9c1a6127723f344",
"dc353060f608487fb9a9bf6e71d4c1c1",
"e08f20628d6e4553a05a30ede0cd4f5c",
"fc9eb8784a9a4636a79e6f538e599727",
"be93f4ff43ee4ba9b7930044e74c56dd",
"8c627fc877fa42e2aa2eeeafdc86a078",
"4e82782bab1b4871a2f499ac33a3b4cf",
"8533cf21f06b4fd289b60fe6cc9add37",
"962ef3466f0d4c779bf7f492956f8b77",
"f663c5883c7243578f59e274fe6e0000",
"e51ef080fb3d4189ab87e9c3a8030830",
"645415cf5a7f4c0482fb7944aa41da62",
"259afd5137d847a586716be29c5e26c6",
"4066d4a6efec4986b08b27ac23bf0248",
"cba65a7677624eaca65ab461e633dc0e",
"cf248ff18ac9473890936393caddf805",
"0fe85e546f7a435d922f318c70c72922"
]
},
"id": "JHTeonOGXiGq",
"outputId": "279caffa-a449-4a56-e025-ba6905e623c7"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Login successful\n",
"Your token has been saved to /root/.huggingface/token\n",
"\u001b[1m\u001b[31mAuthenticated through git-credential store but this isn't the helper defined on your machine.\n",
"You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default\n",
"\n",
"git config --global credential.helper store\u001b[0m\n"
]
}
],
"source": [
"from huggingface_hub import notebook_login\n",
"\n",
"notebook_login()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_93BHiJyrtKa"
},
"source": [
"Next, we call 🤗 Hugging Face's [`push_to_hub`](https://huggingface.co/docs/datasets/package_reference/main_classes.html?highlight=push#datasets.Dataset.push_to_hub) method to upload the dataset to the repo `\"swedish_corpora_parliament_processed\"`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 66,
"referenced_widgets": [
"4fcd74b216e04fa08933a89c88f94b2f",
"5eaa47526b3d4e0d8c4b50823e3c2e0a",
"d224e3c405ed44ec91ed6bf5f2fff975",
"24e2c359cc9648f49ea0425ea64db84f",
"c970626699134c519ec6b5defc4cbc95",
"f742ad0bbe694f719774d708a98c2a58",
"b9de9edf003545b4bfc0664ca69f06e5",
"a521cbd3f7044e1fa1b28a9326f445a6",
"6166316067034e549b353634d3b9607b",
"51b1843f81e642c5ae0cb8c95240b1c4",
"9e34107cd8864135ab07a414f94bd163"
]
},
"id": "Lv9qvqXob-HS",
"outputId": "51cc0e0d-668f-4bf9-cf81-2aa60c50a372"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"The repository already exists: the `private` keyword argument will be ignored.\n"
]
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4fcd74b216e04fa08933a89c88f94b2f",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Pushing dataset shards to the dataset hub: 0%| | 0/1 [00:00, ?it/s]"
]
},
"metadata": {}
}
],
"source": [
"dataset.push_to_hub(f\"ar_corpora_parliament_processed\", split=\"train\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A_lrje4kr6u6"
},
"source": [
"That was easy! The dataset viewer is automatically enabled when uploading a new dataset, which is very convenient. You can now directly inspect the dataset online. \n",
"\n",
"Feel free to look through our preprocessed dataset directly on [`hf-test/sv_corpora_parliament_processed`](https://huggingface.co/datasets/hf-test/sv_corpora_parliament_processed). Even if we are not a native speaker in Swedish, we can see that the data is well processed and seems clean.\n",
"\n",
"Next, let's use the data to build a language model."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OHQXHWZIFN6_"
},
"source": [
"## **3. Build an *n-gram* with KenLM**\n",
"\n",
"While large language models based on the [Transformer architecture](https://jalammar.github.io/illustrated-transformer/) have become the standard in NLP, it is still very common to use an ***n-gram*** LM to boost speech recognition systems - as shown in Section 1.\n",
"\n",
"Looking again at Table 9 of Appendix C of the [official Wav2Vec2 paper](https://arxiv.org/abs/2006.11477), it can be noticed that using a *Transformer*-based LM for decoding clearly yields better results than using an *n-gram* model, but the difference between *n-gram* and *Transformer*-based LM is much less significant than the difference between *n-gram* and no LM. \n",
"\n",
"*E.g.*, for the large Wav2Vec2 checkpoint that was fine-tuned on 10min only, an *n-gram* reduces the word error rate (WER) compared to no LM by *ca.* 80% while a *Transformer*-based LM *only* reduces the WER by another 23% compared to the *n-gram*. This relative WER reduction becomes less, the more data the acoustic model has been trained on. *E.g.*, for the large checkpoint a *Transformer*-based LM reduces the WER by merely 8% compared to an *n-gram* LM whereas the *n-gram* still yields a 21% WER reduction compared to no language model.\n",
"\n",
"The reason why an *n-gram* is preferred over a *Transformer*-based LM is that *n-grams* come at a significantly smaller computational cost. For an *n-gram*, retrieving the probability of a word given previous words is almost only as computationally expensive as querying a look-up table or tree-like data storage - *i.e.* it's very fast compared to modern *Transformer*-based language models that would require a full forward pass to retrieve the next word probabilities.\n",
"\n",
"For more information on how *n-grams* function and why they are (still) so useful for speech recognition, the reader is advised to take a look at [this excellent summary](https://web.stanford.edu/~jurafsky/slp3/3.pdf) from Stanford."
]
},
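{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feeling for how cheap such look-ups are: once an *n-gram* file exists, querying it from Python with the `kenlm` package installed earlier is a single call. Here is a small sketch, where `5gram.arpa` is a placeholder path for an *n-gram* file like the one we build below:\n",
"\n",
"```python\n",
"import kenlm\n",
"\n",
"model = kenlm.Model(\"5gram.arpa\")  # placeholder path to an n-gram file\n",
"\n",
"# log10 probability of a sentence, including begin/end-of-sentence tokens\n",
"print(model.score(\"roast beef\", bos=True, eos=True))\n",
"print(model.score(\"rose beef\", bos=True, eos=True))  # an English LM should score this lower\n",
"```"
]
},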
{
"cell_type": "markdown",
"metadata": {
"id": "B4pX7mEXOH_7"
},
"source": [
"Great, let's see step-by-step how to build an *n-gram*. We will use the popular [KenLM library](https://github.com/kpu/kenlm) to do so. Let's start by installing the Ubuntu library prerequisites:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lseJ_uYmNoJM",
"outputId": "168eb896-f722-4d56-a26f-6b7c13923987"
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ·····································\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/bin/sh: line 1: sudo: command not found\n"
]
},
{
"data": {
"text/plain": [
"38"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"password = getpass.getpass()\n",
"command = \"sudo -S apt-get update\" # can be any command but don't forget -S as it enables input from stdin\n",
"os.popen(command, 'w').write(password+'\\n') # newline char is important otherwise prompt will wait for you to manually perform newline"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "FKMMWfVQp_gP",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "065e80c8-5f87-46aa-846b-9fb3a9b108be"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Reading package lists... Done\n",
"Building dependency tree \n",
"Reading state information... Done\n",
"build-essential is already the newest version (12.4ubuntu1).\n",
"libboost-program-options-dev is already the newest version (1.65.1.0ubuntu1).\n",
"libboost-program-options-dev set to manually installed.\n",
"libboost-system-dev is already the newest version (1.65.1.0ubuntu1).\n",
"libboost-system-dev set to manually installed.\n",
"libboost-thread-dev is already the newest version (1.65.1.0ubuntu1).\n",
"libboost-thread-dev set to manually installed.\n",
"liblzma-dev is already the newest version (5.2.2-1.3).\n",
"liblzma-dev set to manually installed.\n",
"zlib1g-dev is already the newest version (1:1.2.11.dfsg-0ubuntu2).\n",
"zlib1g-dev set to manually installed.\n",
"libboost-test-dev is already the newest version (1.65.1.0ubuntu1).\n",
"libboost-test-dev set to manually installed.\n",
"cmake is already the newest version (3.10.2-1ubuntu2.18.04.2).\n",
"libbz2-dev is already the newest version (1.0.6-8.1ubuntu0.2).\n",
"libbz2-dev set to manually installed.\n",
"The following packages were automatically installed and are no longer required:\n",
" cuda-command-line-tools-10-0 cuda-command-line-tools-10-1\n",
" cuda-command-line-tools-11-0 cuda-compiler-10-0 cuda-compiler-10-1\n",
" cuda-compiler-11-0 cuda-cuobjdump-10-0 cuda-cuobjdump-10-1\n",
" cuda-cuobjdump-11-0 cuda-cupti-10-0 cuda-cupti-10-1 cuda-cupti-11-0\n",
" cuda-cupti-dev-11-0 cuda-documentation-10-0 cuda-documentation-10-1\n",
" cuda-documentation-11-0 cuda-documentation-11-1 cuda-gdb-10-0 cuda-gdb-10-1\n",
" cuda-gdb-11-0 cuda-gpu-library-advisor-10-0 cuda-gpu-library-advisor-10-1\n",
" cuda-libraries-10-0 cuda-libraries-10-1 cuda-libraries-11-0\n",
" cuda-memcheck-10-0 cuda-memcheck-10-1 cuda-memcheck-11-0 cuda-nsight-10-0\n",
" cuda-nsight-10-1 cuda-nsight-11-0 cuda-nsight-11-1 cuda-nsight-compute-10-0\n",
" cuda-nsight-compute-10-1 cuda-nsight-compute-11-0 cuda-nsight-compute-11-1\n",
" cuda-nsight-systems-10-1 cuda-nsight-systems-11-0 cuda-nsight-systems-11-1\n",
" cuda-nvcc-10-0 cuda-nvcc-10-1 cuda-nvcc-11-0 cuda-nvdisasm-10-0\n",
" cuda-nvdisasm-10-1 cuda-nvdisasm-11-0 cuda-nvml-dev-10-0 cuda-nvml-dev-10-1\n",
" cuda-nvml-dev-11-0 cuda-nvprof-10-0 cuda-nvprof-10-1 cuda-nvprof-11-0\n",
" cuda-nvprune-10-0 cuda-nvprune-10-1 cuda-nvprune-11-0 cuda-nvtx-10-0\n",
" cuda-nvtx-10-1 cuda-nvtx-11-0 cuda-nvvp-10-0 cuda-nvvp-10-1 cuda-nvvp-11-0\n",
" cuda-nvvp-11-1 cuda-samples-10-0 cuda-samples-10-1 cuda-samples-11-0\n",
" cuda-samples-11-1 cuda-sanitizer-11-0 cuda-sanitizer-api-10-1\n",
" cuda-toolkit-10-0 cuda-toolkit-10-1 cuda-toolkit-11-0 cuda-toolkit-11-1\n",
" cuda-tools-10-0 cuda-tools-10-1 cuda-tools-11-0 cuda-tools-11-1\n",
" cuda-visual-tools-10-0 cuda-visual-tools-10-1 cuda-visual-tools-11-0\n",
" cuda-visual-tools-11-1 default-jre dkms freeglut3 freeglut3-dev\n",
" keyboard-configuration libargon2-0 libcap2 libcryptsetup12\n",
" libdevmapper1.02.1 libfontenc1 libidn11 libip4tc0 libjansson4\n",
" libnvidia-cfg1-510 libnvidia-common-460 libnvidia-common-510\n",
" libnvidia-extra-510 libnvidia-fbc1-510 libnvidia-gl-510 libpam-systemd\n",
" libpolkit-agent-1-0 libpolkit-backend-1-0 libpolkit-gobject-1-0 libxfont2\n",
" libxi-dev libxkbfile1 libxmu-dev libxmu-headers libxnvctrl0 libxtst6\n",
" nsight-compute-2020.2.1 nsight-compute-2022.1.0 nsight-systems-2020.3.2\n",
" nsight-systems-2020.3.4 nsight-systems-2021.5.2 nvidia-dkms-510\n",
" nvidia-kernel-common-510 nvidia-kernel-source-510 nvidia-modprobe\n",
" nvidia-settings openjdk-11-jre policykit-1 policykit-1-gnome python3-xkit\n",
" screen-resolution-extra systemd systemd-sysv udev x11-xkb-utils\n",
" xserver-common xserver-xorg-core-hwe-18.04 xserver-xorg-video-nvidia-510\n",
"Use 'sudo apt autoremove' to remove them.\n",
"Suggested packages:\n",
" libeigen3-doc libmrpt-dev\n",
"The following NEW packages will be installed:\n",
" libeigen3-dev\n",
"0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.\n",
"Need to get 810 kB of archives.\n",
"After this operation, 7,128 kB of additional disk space will be used.\n",
"Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libeigen3-dev all 3.3.4-4 [810 kB]\n",
"Fetched 810 kB in 1s (661 kB/s)\n",
"debconf: unable to initialize frontend: Dialog\n",
"debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)\n",
"debconf: falling back to frontend: Readline\n",
"debconf: unable to initialize frontend: Readline\n",
"debconf: (This frontend requires a controlling tty.)\n",
"debconf: falling back to frontend: Teletype\n",
"dpkg-preconfigure: unable to re-open stdin: \n",
"Selecting previously unselected package libeigen3-dev.\n",
"(Reading database ... 155113 files and directories currently installed.)\n",
"Preparing to unpack .../libeigen3-dev_3.3.4-4_all.deb ...\n",
"Unpacking libeigen3-dev (3.3.4-4) ...\n",
"Setting up libeigen3-dev (3.3.4-4) ...\n"
]
}
],
"source": [
"!sudo apt-get install build-essential cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JzHiJPg6OqvA"
},
"source": [
"before downloading and unpacking the KenLM repo."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "J8mm4ExzqIaZ",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3977f0d0-f360-40cf-b59e-230abd70db44"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2022-02-05 04:53:46-- https://kheafield.com/code/kenlm.tar.gz\n",
"Resolving kheafield.com (kheafield.com)... 35.196.63.85\n",
"Connecting to kheafield.com (kheafield.com)|35.196.63.85|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 491090 (480K) [application/x-gzip]\n",
"Saving to: ‘STDOUT’\n",
"\n",
"- 100%[===================>] 479.58K 1.67MB/s in 0.3s \n",
"\n",
"2022-02-05 04:53:47 (1.67 MB/s) - written to stdout [491090/491090]\n",
"\n"
]
}
],
"source": [
"!wget -O - https://kheafield.com/code/kenlm.tar.gz | tar xz"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TKpjSxiDPKK-"
},
"source": [
"KenLM is written in C++, so we'll make use of `cmake` to build the binaries."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "MS4mqMyZqVAI",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8fc4a4f2-8b08-4526-fabf-bbd882038c29"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"-- The C compiler identification is GNU 7.5.0\n",
"-- The CXX compiler identification is GNU 7.5.0\n",
"-- Check for working C compiler: /usr/bin/cc\n",
"-- Check for working C compiler: /usr/bin/cc -- works\n",
"-- Detecting C compiler ABI info\n",
"-- Detecting C compiler ABI info - done\n",
"-- Detecting C compile features\n",
"-- Detecting C compile features - done\n",
"-- Check for working CXX compiler: /usr/bin/c++\n",
"-- Check for working CXX compiler: /usr/bin/c++ -- works\n",
"-- Detecting CXX compiler ABI info\n",
"-- Detecting CXX compiler ABI info - done\n",
"-- Detecting CXX compile features\n",
"-- Detecting CXX compile features - done\n",
"-- Looking for pthread.h\n",
"-- Looking for pthread.h - found\n",
"-- Looking for pthread_create\n",
"-- Looking for pthread_create - not found\n",
"-- Looking for pthread_create in pthreads\n",
"-- Looking for pthread_create in pthreads - not found\n",
"-- Looking for pthread_create in pthread\n",
"-- Looking for pthread_create in pthread - found\n",
"-- Found Threads: TRUE \n",
"-- Boost version: 1.65.1\n",
"-- Found the following Boost libraries:\n",
"-- program_options\n",
"-- system\n",
"-- thread\n",
"-- unit_test_framework\n",
"-- chrono\n",
"-- date_time\n",
"-- atomic\n",
"-- Check if compiler accepts -pthread\n",
"-- Check if compiler accepts -pthread - yes\n",
"-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version \"1.2.11\") \n",
"-- Found BZip2: /usr/lib/x86_64-linux-gnu/libbz2.so (found version \"1.0.6\") \n",
"-- Looking for BZ2_bzCompressInit\n",
"-- Looking for BZ2_bzCompressInit - found\n",
"-- Looking for lzma_auto_decoder in /usr/lib/x86_64-linux-gnu/liblzma.so\n",
"-- Looking for lzma_auto_decoder in /usr/lib/x86_64-linux-gnu/liblzma.so - found\n",
"-- Looking for lzma_easy_encoder in /usr/lib/x86_64-linux-gnu/liblzma.so\n",
"-- Looking for lzma_easy_encoder in /usr/lib/x86_64-linux-gnu/liblzma.so - found\n",
"-- Looking for lzma_lzma_preset in /usr/lib/x86_64-linux-gnu/liblzma.so\n",
"-- Looking for lzma_lzma_preset in /usr/lib/x86_64-linux-gnu/liblzma.so - found\n",
"-- Found LibLZMA: /usr/include (found version \"5.2.2\") \n",
"-- Found OpenMP_C: -fopenmp (found version \"4.5\") \n",
"-- Found OpenMP_CXX: -fopenmp (found version \"4.5\") \n",
"-- Found OpenMP: TRUE (found version \"4.5\") \n",
"-- Configuring done\n",
"-- Generating done\n",
"-- Build files have been written to: /content/kenlm/build\n",
"\u001b[35m\u001b[1mScanning dependencies of target kenlm_util\u001b[0m\n",
"[ 1%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum-dtoa.cc.o\u001b[0m\n",
"[ 2%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum.cc.o\u001b[0m\n",
"[ 3%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/cached-powers.cc.o\u001b[0m\n",
"[ 4%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/diy-fp.cc.o\u001b[0m\n",
"[ 5%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/double-conversion.cc.o\u001b[0m\n",
"[ 6%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fast-dtoa.cc.o\u001b[0m\n",
"[ 7%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fixed-dtoa.cc.o\u001b[0m\n",
"[ 8%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/strtod.cc.o\u001b[0m\n",
"[ 9%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/stream/chain.cc.o\u001b[0m\n",
"[ 10%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/stream/count_records.cc.o\u001b[0m\n",
"[ 11%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/stream/io.cc.o\u001b[0m\n",
"[ 12%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/stream/line_input.cc.o\u001b[0m\n",
"[ 13%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/stream/multi_progress.cc.o\u001b[0m\n",
"[ 14%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/stream/rewindable_stream.cc.o\u001b[0m\n",
"[ 15%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/bit_packing.cc.o\u001b[0m\n",
"[ 16%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/ersatz_progress.cc.o\u001b[0m\n",
"[ 17%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/exception.cc.o\u001b[0m\n",
"[ 18%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/file.cc.o\u001b[0m\n",
"[ 19%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/file_piece.cc.o\u001b[0m\n",
"[ 20%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/float_to_string.cc.o\u001b[0m\n",
"[ 21%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/integer_to_string.cc.o\u001b[0m\n",
"[ 22%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/mmap.cc.o\u001b[0m\n",
"[ 23%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/murmur_hash.cc.o\u001b[0m\n",
"[ 25%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/parallel_read.cc.o\u001b[0m\n",
"[ 26%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/pool.cc.o\u001b[0m\n",
"[ 27%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/read_compressed.cc.o\u001b[0m\n",
"[ 28%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/scoped.cc.o\u001b[0m\n",
"[ 29%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/spaces.cc.o\u001b[0m\n",
"[ 31%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/usage.cc.o\u001b[0m\n",
"[ 31%] \u001b[32mBuilding CXX object util/CMakeFiles/kenlm_util.dir/string_piece.cc.o\u001b[0m\n",
"[ 32%] \u001b[32m\u001b[1mLinking CXX static library ../lib/libkenlm_util.a\u001b[0m\n",
"[ 32%] Built target kenlm_util\n",
"\u001b[35m\u001b[1mScanning dependencies of target probing_hash_table_benchmark\u001b[0m\n",
"\u001b[35m\u001b[1mScanning dependencies of target kenlm\u001b[0m\n",
"[ 33%] \u001b[32mBuilding CXX object util/CMakeFiles/probing_hash_table_benchmark.dir/probing_hash_table_benchmark_main.cc.o\u001b[0m\n",
"[ 34%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/bhiksha.cc.o\u001b[0m\n",
"[ 35%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/binary_format.cc.o\u001b[0m\n",
"[ 36%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/config.cc.o\u001b[0m\n",
"[ 37%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/lm_exception.cc.o\u001b[0m\n",
"[ 38%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/model.cc.o\u001b[0m\n",
"[ 39%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/quantize.cc.o\u001b[0m\n",
"[ 40%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/read_arpa.cc.o\u001b[0m\n",
"[ 41%] \u001b[32m\u001b[1mLinking CXX executable ../bin/probing_hash_table_benchmark\u001b[0m\n",
"[ 41%] Built target probing_hash_table_benchmark\n",
"\u001b[35m\u001b[1mScanning dependencies of target kenlm_filter\u001b[0m\n",
"[ 42%] \u001b[32mBuilding CXX object lm/filter/CMakeFiles/kenlm_filter.dir/arpa_io.cc.o\u001b[0m\n",
"[ 43%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/search_hashed.cc.o\u001b[0m\n",
"[ 44%] \u001b[32mBuilding CXX object lm/filter/CMakeFiles/kenlm_filter.dir/phrase.cc.o\u001b[0m\n",
"[ 45%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/search_trie.cc.o\u001b[0m\n",
"[ 46%] \u001b[32mBuilding CXX object lm/filter/CMakeFiles/kenlm_filter.dir/vocab.cc.o\u001b[0m\n",
"[ 47%] \u001b[32m\u001b[1mLinking CXX static library ../../lib/libkenlm_filter.a\u001b[0m\n",
"[ 47%] Built target kenlm_filter\n",
"[ 48%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/sizes.cc.o\u001b[0m\n",
"[ 50%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/trie.cc.o\u001b[0m\n",
"[ 51%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/trie_sort.cc.o\u001b[0m\n",
"[ 52%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/value_build.cc.o\u001b[0m\n",
"[ 53%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/virtual_interface.cc.o\u001b[0m\n",
"[ 54%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/vocab.cc.o\u001b[0m\n",
"[ 55%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/common/model_buffer.cc.o\u001b[0m\n",
"[ 56%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/common/print.cc.o\u001b[0m\n",
"[ 57%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/common/renumber.cc.o\u001b[0m\n",
"[ 58%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm.dir/common/size_option.cc.o\u001b[0m\n",
"[ 59%] \u001b[32m\u001b[1mLinking CXX static library ../lib/libkenlm.a\u001b[0m\n",
"[ 59%] Built target kenlm\n",
"\u001b[35m\u001b[1mScanning dependencies of target fragment\u001b[0m\n",
"\u001b[35m\u001b[1mScanning dependencies of target build_binary\u001b[0m\n",
"[ 60%] \u001b[32mBuilding CXX object lm/CMakeFiles/fragment.dir/fragment_main.cc.o\u001b[0m\n",
"[ 61%] \u001b[32mBuilding CXX object lm/CMakeFiles/build_binary.dir/build_binary_main.cc.o\u001b[0m\n",
"[ 62%] \u001b[32m\u001b[1mLinking CXX executable ../bin/fragment\u001b[0m\n",
"[ 62%] Built target fragment\n",
"\u001b[35m\u001b[1mScanning dependencies of target kenlm_benchmark\u001b[0m\n",
"[ 63%] \u001b[32m\u001b[1mLinking CXX executable ../bin/build_binary\u001b[0m\n",
"[ 64%] \u001b[32mBuilding CXX object lm/CMakeFiles/kenlm_benchmark.dir/kenlm_benchmark_main.cc.o\u001b[0m\n",
"[ 64%] Built target build_binary\n",
"\u001b[35m\u001b[1mScanning dependencies of target query\u001b[0m\n",
"[ 65%] \u001b[32mBuilding CXX object lm/CMakeFiles/query.dir/query_main.cc.o\u001b[0m\n",
"[ 66%] \u001b[32m\u001b[1mLinking CXX executable ../bin/query\u001b[0m\n",
"[ 66%] Built target query\n",
"\u001b[35m\u001b[1mScanning dependencies of target kenlm_builder\u001b[0m\n",
"[ 67%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/kenlm_builder.dir/adjust_counts.cc.o\u001b[0m\n",
"[ 68%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/kenlm_builder.dir/corpus_count.cc.o\u001b[0m\n",
"[ 69%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/kenlm_builder.dir/initial_probabilities.cc.o\u001b[0m\n",
"[ 70%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/kenlm_builder.dir/interpolate.cc.o\u001b[0m\n",
"[ 71%] \u001b[32m\u001b[1mLinking CXX executable ../bin/kenlm_benchmark\u001b[0m\n",
"[ 71%] Built target kenlm_benchmark\n",
"\u001b[35m\u001b[1mScanning dependencies of target phrase_table_vocab\u001b[0m\n",
"[ 72%] \u001b[32mBuilding CXX object lm/filter/CMakeFiles/phrase_table_vocab.dir/phrase_table_vocab_main.cc.o\u001b[0m\n",
"[ 73%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/kenlm_builder.dir/output.cc.o\u001b[0m\n",
"[ 75%] \u001b[32m\u001b[1mLinking CXX executable ../../bin/phrase_table_vocab\u001b[0m\n",
"[ 75%] Built target phrase_table_vocab\n",
"\u001b[35m\u001b[1mScanning dependencies of target filter\u001b[0m\n",
"[ 76%] \u001b[32mBuilding CXX object lm/filter/CMakeFiles/filter.dir/filter_main.cc.o\u001b[0m\n",
"[ 77%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/kenlm_builder.dir/pipeline.cc.o\u001b[0m\n",
"[ 78%] \u001b[32m\u001b[1mLinking CXX static library ../../lib/libkenlm_builder.a\u001b[0m\n",
"[ 78%] Built target kenlm_builder\n",
"\u001b[35m\u001b[1mScanning dependencies of target kenlm_interpolate\u001b[0m\n",
"[ 79%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/backoff_reunification.cc.o\u001b[0m\n",
"[ 80%] \u001b[32m\u001b[1mLinking CXX executable ../../bin/filter\u001b[0m\n",
"[ 81%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/bounded_sequence_encoding.cc.o\u001b[0m\n",
"[ 81%] Built target filter\n",
"\u001b[35m\u001b[1mScanning dependencies of target count_ngrams\u001b[0m\n",
"[ 82%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/count_ngrams.dir/count_ngrams_main.cc.o\u001b[0m\n",
"[ 83%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/merge_probabilities.cc.o\u001b[0m\n",
"[ 84%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/merge_vocab.cc.o\u001b[0m\n",
"[ 85%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/normalize.cc.o\u001b[0m\n",
"[ 86%] \u001b[32m\u001b[1mLinking CXX executable ../../bin/count_ngrams\u001b[0m\n",
"[ 87%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/pipeline.cc.o\u001b[0m\n",
"[ 87%] Built target count_ngrams\n",
"\u001b[35m\u001b[1mScanning dependencies of target lmplz\u001b[0m\n",
"[ 88%] \u001b[32mBuilding CXX object lm/builder/CMakeFiles/lmplz.dir/lmplz_main.cc.o\u001b[0m\n",
"[ 89%] \u001b[32m\u001b[1mLinking CXX executable ../../bin/lmplz\u001b[0m\n",
"[ 89%] Built target lmplz\n",
"[ 90%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/split_worker.cc.o\u001b[0m\n",
"[ 91%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/tune_derivatives.cc.o\u001b[0m\n",
"[ 92%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/tune_instances.cc.o\u001b[0m\n",
"[ 93%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/tune_weights.cc.o\u001b[0m\n",
"[ 94%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/kenlm_interpolate.dir/universal_vocab.cc.o\u001b[0m\n",
"[ 95%] \u001b[32m\u001b[1mLinking CXX static library ../../lib/libkenlm_interpolate.a\u001b[0m\n",
"[ 95%] Built target kenlm_interpolate\n",
"\u001b[35m\u001b[1mScanning dependencies of target streaming_example\u001b[0m\n",
"\u001b[35m\u001b[1mScanning dependencies of target interpolate\u001b[0m\n",
"[ 96%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/streaming_example.dir/streaming_example_main.cc.o\u001b[0m\n",
"[ 97%] \u001b[32mBuilding CXX object lm/interpolate/CMakeFiles/interpolate.dir/interpolate_main.cc.o\u001b[0m\n",
"[ 98%] \u001b[32m\u001b[1mLinking CXX executable ../../bin/interpolate\u001b[0m\n",
"[ 98%] Built target interpolate\n",
"[100%] \u001b[32m\u001b[1mLinking CXX executable ../../bin/streaming_example\u001b[0m\n",
"[100%] Built target streaming_example\n",
"build_binary fragment\t lmplz\t\t\t query\n",
"count_ngrams interpolate phrase_table_vocab\t streaming_example\n",
"filter\t kenlm_benchmark probing_hash_table_benchmark\n"
]
}
],
"source": [
"!mkdir kenlm/build && cd kenlm/build && cmake .. && make -j2\n",
"!ls kenlm/build/bin"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "N9D7JvVuPTOz"
},
"source": [
"Great, as we can see, the executable functions have successfully been built under `kenlm/build/bin/`.\n",
"\n",
"KenLM by default computes an *n-gram* with [Kneser-Ney smooting](https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing). All text data used to create the *n-gram* is expected to be stored in a text file.\n",
"We download our dataset and save it as a `.txt` file."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"id": "VIgErMqApENm",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 217,
"referenced_widgets": [
"c600c14d43bc4ce087292ab7ecba7b82",
"b56947879e764c59a50dd8029f7ba024",
"4390697c0a3c4b8e99981c3b6c481730",
"5a582babdfcc4069b751ed8873c75d57",
"3055fb1d66bc44c2af2151833de0c08f",
"6ae456c84fd047388640754082802eca",
"84a82690b68f4bf284f4f6808db41efb",
"eda966270a8042f68f9e16db670f13fe",
"fa0aeb1c2ce149c8bfded42526893694",
"d8e87a5cbcbe421b9f1723280db3b821",
"3da1354f62f94f8eb38f6f64b089b600",
"4b332599027046f881e5166a43a0ea06",
"bd24df614b3746909a7b4ede68615ff5",
"da9014bd6d9f452a9af2e3d7019bcc62",
"7a8bc9fd2b91477f95943f2658bcb6be",
"6078c7718ef041a4964a6a6638c8ebc4",
"bc2c75c27ed84a8394ec2e88b642f39d",
"576a3745924e49148323cf37fdf23501",
"406bd04592b4428db058aed30b1ea657",
"f6d9a4de2f4b43baaddd1860f39603d5",
"d9d8f94214874beab17893c14f34029d",
"340ffa930c5948c0a906e83900835078",
"3acc38512cf247d1b642c35c7be84afc",
"881a35658b1442eda6fcf46249363a44",
"e91a935aedfd4dd9a21b70494ff6f542",
"d1b8438266994972a4be7bd20bcb7fc3",
"42e00e6e6e4b4b679c74b1818024cb43",
"cb4012a29b144b0cb252c1976f5cd275",
"45215ac3e54a4cb08e83fa03b4d49976",
"8e83e3877fb04eb6a158adceb983a85c",
"4b2d00e46a7a4fe896703a52c010aaa6",
"cbcc10764b604098951e6371fbd6f480",
"cb90869a6bc248008e2bc31e068d68d0",
"a8c9482eab0f4738b4f45c045a0a1074",
"cf8518414b484434a82b5933f5709cad",
"1a217cc1d9bf49bdaac6e63392acbc48",
"803ecbcbf8bd4292a0a0eb84466da050",
"b8983f9825a74d9287ba38da0fa2fd49",
"9828498cd0ae4df2886e8b7ec89a897b",
"56a885f1a3ee41558db5fc264d3c1b25",
"97050804252e4319be2ac806ccd5106b",
"ee8e480112e443c3b5646a36d0eb8d6b",
"7118b4831e714b63a0ca36458eb29426",
"ebaad2c1a3d24f318b9906d74ac20f04"
]
},
"outputId": "99038cf6-fe03-41e7-ace6-1b28e909ef21"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c600c14d43bc4ce087292ab7ecba7b82",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/1.67k [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Using custom data configuration kingabzpro--ar_corpora_parliament_processed-7ee75506d2f6dcc4\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Downloading and preparing dataset news_commentary/ar-en (download: 24.36 MiB, generated: 50.92 MiB, post-processed: Unknown size, total: 75.28 MiB) to /root/.cache/huggingface/datasets/parquet/kingabzpro--ar_corpora_parliament_processed-7ee75506d2f6dcc4/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901...\n"
]
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4b332599027046f881e5166a43a0ea06",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
" 0%| | 0/1 [00:00, ?it/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3acc38512cf247d1b642c35c7be84afc",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/25.5M [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a8c9482eab0f4738b4f45c045a0a1074",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
" 0%| | 0/1 [00:00, ?it/s]"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/parquet/kingabzpro--ar_corpora_parliament_processed-7ee75506d2f6dcc4/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901. Subsequent calls will reuse this data.\n"
]
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"username = \"kingabzpro\" # change to your username\n",
"\n",
"dataset = load_dataset(f\"{username}/ar_corpora_parliament_processed\", split=\"train\")\n",
"\n",
"with open(\"text.txt\", \"w\") as file:\n",
" file.write(\" \".join(dataset[\"text\"]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5SAQ0NszQyFC"
},
"source": [
"Now, we just have to run KenLM's `lmplz` command to build our *n-gram*, called `\"5gram.arpa\"`. As it's relatively common in speech recognition, we build a *5-gram* by passing the `-o 5` parameter.\n",
"For more information on the different *n-gram* LM that can be built \n",
"with KenLM, one can take a look at the [official website of KenLM](https://kheafield.com/code/kenlm/).\n",
"\n",
"Executing the command below might take a minute or so."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_MdDNBlZrPOm",
"outputId": "a53b505f-1a16-4ba1-dbc0-82da856a06c6"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"=== 1/5 Counting and sorting n-grams ===\n",
"Reading /content/text.txt\n",
"----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100\n",
"tcmalloc: large alloc 1918697472 bytes == 0x55e8ed702000 @ 0x7f5cedcad1e7 0x55e8ec2557a2 0x55e8ec1f051e 0x55e8ec1cf2eb 0x55e8ec1bb066 0x7f5cebe46bf7 0x55e8ec1bcbaa\n",
"tcmalloc: large alloc 8953896960 bytes == 0x55e95fcd2000 @ 0x7f5cedcad1e7 0x55e8ec2557a2 0x55e8ec2447ca 0x55e8ec245208 0x55e8ec1cf308 0x55e8ec1bb066 0x7f5cebe46bf7 0x55e8ec1bcbaa\n",
"****************************************************************************************************\n",
"Unigram tokens 4953590 types 154929\n",
"=== 2/5 Calculating and sorting adjusted counts ===\n",
"Chain sizes: 1:1859148 2:1063013888 3:1993151104 4:3189041664 5:4650685952\n",
"tcmalloc: large alloc 4650688512 bytes == 0x55e8ed702000 @ 0x7f5cedcad1e7 0x55e8ec2557a2 0x55e8ec2447ca 0x55e8ec245208 0x55e8ec1cf8d7 0x55e8ec1bb066 0x7f5cebe46bf7 0x55e8ec1bcbaa\n",
"tcmalloc: large alloc 1993154560 bytes == 0x55ea421d4000 @ 0x7f5cedcad1e7 0x55e8ec2557a2 0x55e8ec2447ca 0x55e8ec245208 0x55e8ec1cfcdd 0x55e8ec1bb066 0x7f5cebe46bf7 0x55e8ec1bcbaa\n",
"tcmalloc: large alloc 3189047296 bytes == 0x55eb76098000 @ 0x7f5cedcad1e7 0x55e8ec2557a2 0x55e8ec2447ca 0x55e8ec245208 0x55e8ec1cfcdd 0x55e8ec1bb066 0x7f5cebe46bf7 0x55e8ec1bcbaa\n",
"Statistics:\n",
"1 154928 D1=0.626912 D2=1.01472 D3+=1.4089\n",
"2 1951043 D1=0.806457 D2=1.10655 D3+=1.36765\n",
"3 3808342 D1=0.910118 D2=1.23485 D3+=1.40391\n",
"4 4594632 D1=0.964725 D2=1.34642 D3+=1.45218\n",
"5 4848900 D1=0.98377 D2=1.45145 D3+=1.49256\n",
"Memory estimate for binary LM:\n",
"type MB\n",
"probing 324 assuming -p 1.5\n",
"probing 383 assuming -r models -p 1.5\n",
"trie 160 without quantization\n",
"trie 88 assuming -q 8 -b 8 quantization \n",
"trie 140 assuming -a 22 array pointer compression\n",
"trie 69 assuming -a 22 -q 8 -b 8 array pointer compression and quantization\n",
"=== 3/5 Calculating and sorting initial probabilities ===\n",
"Chain sizes: 1:1859136 2:31216688 3:76166840 4:110271168 5:135769200\n",
"----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100\n",
"####################################################################################################\n",
"=== 4/5 Calculating and writing order-interpolated probabilities ===\n",
"Chain sizes: 1:1859136 2:31216688 3:76166840 4:110271168 5:135769200\n",
"----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100\n",
"####################################################################################################\n",
"=== 5/5 Writing ARPA model ===\n",
"----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100\n",
"****************************************************************************************************\n",
"Name:lmplz\tVmPeak:14182240 kB\tVmRSS:2689208 kB\tRSSMax:2689364 kB\tuser:16.707\tsys:5.20417\tCPU:21.9112\treal:28.821\n"
]
}
],
"source": [
"\n",
"!kenlm/build/bin/lmplz -o 5 <\"text.txt\" > \"5gram.arpa\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1_58ktqcTBYi"
},
"source": [
"Great, we have built a *5-gram* LM! Let's inspect the first couple of lines."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TRnV8Miusl--",
"outputId": "81d9f405-8183-4011-bd1d-4eaf7811cdfd"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\\data\\\n",
"ngram 1=154928\n",
"ngram 2=1951043\n",
"ngram 3=3808342\n",
"ngram 4=4594632\n",
"ngram 5=4848900\n",
"\n",
"\\1-grams:\n",
"-6.2970614\t\t0\n",
"0\t\t-0.09341879\n",
"-4.2426977\tالذهب\t-0.2197705\n",
"-5.265963\tبعشرة\t-0.1773004\n",
"-4.3284874\tآلاف\t-0.29791862\n",
"-4.0686555\tدولار\t-0.33736664\n",
"-4.603849\tسان\t-0.4746134\n",
"-5.4108515\tفرانسيسكو\t-0.11966315\n",
"-2.7832785\tلم\t-0.62132126\n",
"-4.8193336\tيكن\t-0.12888719\n",
"-1.9863459\tمن\t-0.8163693\n",
"-4.8497424\tالسهل\t-0.14798915\n"
]
}
],
"source": [
"!head -20 5gram.arpa"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l3jfwr2RTKPn"
},
"source": [
"There is a small problem that 🤗 Transformers will not be happy about later on.\n",
"The *5-gram* correctly includes a \"Unknown\" or ``, as well as a *begin-of-sentence*, `` token, but no *end-of-sentence*, `` token.\n",
"This sadly has to be corrected currently after the build.\n",
"\n",
"We can simply add the *end-of-sentence* token by adding the line `0 -0.11831701` below the *begin-of-sentence* token and increasing the `ngram 1` count by 1. Because the file has roughly 100 million lines, this command will take *ca.* 2 minutes."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "_7u7dVPkvyRZ"
},
"outputs": [],
"source": [
"with open(\"5gram.arpa\", \"r\") as read_file, open(\"5gram_correct.arpa\", \"w\") as write_file:\n",
" has_added_eos = False\n",
" for line in read_file:\n",
" if not has_added_eos and \"ngram 1=\" in line:\n",
" count=line.strip().split(\"=\")[-1]\n",
" write_file.write(line.replace(f\"{count}\", f\"{int(count)+1}\"))\n",
" elif not has_added_eos and \"\" in line:\n",
" write_file.write(line)\n",
" write_file.write(line.replace(\"\", \"\"))\n",
" has_added_eos = True\n",
" else:\n",
" write_file.write(line)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u9Y8uC3VW5vc"
},
"source": [
"Let's now inspect the corrected *5-gram*."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "YF1RSm-Pxst5",
"outputId": "cc06a4bd-d8cf-4ebc-9381-a1751c56b9df"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\\data\\\n",
"ngram 1=154929\n",
"ngram 2=1951043\n",
"ngram 3=3808342\n",
"ngram 4=4594632\n",
"ngram 5=4848900\n",
"\n",
"\\1-grams:\n",
"-6.2970614\t\t0\n",
"0\t\t-0.09341879\n",
"0\t\t-0.09341879\n",
"-4.2426977\tالذهب\t-0.2197705\n",
"-5.265963\tبعشرة\t-0.1773004\n",
"-4.3284874\tآلاف\t-0.29791862\n",
"-4.0686555\tدولار\t-0.33736664\n",
"-4.603849\tسان\t-0.4746134\n",
"-5.4108515\tفرانسيسكو\t-0.11966315\n",
"-2.7832785\tلم\t-0.62132126\n",
"-4.8193336\tيكن\t-0.12888719\n",
"-1.9863459\tمن\t-0.8163693\n"
]
}
],
"source": [
"!head -20 5gram_correct.arpa"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "m7NfKtyjXCiE"
},
"source": [
"Great, this looks better! We're done at this point and all that is left to do is to correctly integrate the `\"ngram\"` with [`pyctcdecode`](https://github.com/kensho-technologies/pyctcdecode) and 🤗 Transformers."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kPImHXzDG8aH"
},
"source": [
"## **4. Combine an *n-gram* with Wav2Vec2**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kfZ7qYvMXfSV"
},
"source": [
"In a final step, we want to wrap the *5-gram* into a `Wav2Vec2ProcessorWithLM` object to make the *5-gram* boosted decoding as seamless as shown in Section 1.\n",
"We start by downloading the currently \"LM-less\" processor of [`xls-r-300m-sv`](https://huggingface.co/hf-test/xls-r-300m-sv)."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"id": "paV71gdAtkDC",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 177,
"referenced_widgets": [
"698d0b728b4746e7b4a1bec8d9b475ba",
"a241dfe3267841b0bfe2ee227c6c1671",
"c80c54b2337e48c999e392b0c0daab65",
"9b59c5e85ea546bd936a53ff1ece936e",
"c6e0dc5b363246549548fd3764da0d28",
"41e30f39a9f34d65922779c7c2a8b98e",
"f7469fe429e74ab6971e41b17e2cb988",
"1ed0e06794dd4d74846f362e3c2c30a2",
"7c73c747977044d2a07363d4e8f4a1c5",
"adf6b751f48c4d6f9fa1cf3f218e4226",
"eb97689e4f4a46a9bd1691b56a2473d1",
"dbdd78ff6785419bb71db5ede51f339a",
"813507347dc949f5b639fefb58f3f462",
"c11cebba65ff40ae83681d434ebe47ea",
"4c2acf65c8df4e309e56763e263943c5",
"2bb63d36fe704a728665dff3d042647f",
"4058293c10c34ae2b0ff3446abac80a4",
"6df99d787b464e948be476dc9a68ada8",
"3e2fc0a577634ef2b691e685aa560b3c",
"d8e024f11dc9449ba30471f035633a20",
"d61e6c1731b540b6a03d794f8f6b2917",
"4dfa9c9cafc3404ead4f759164cc7a88",
"0f65e709ee974823a88f7819d28a6da9",
"2f3b99d85a6a495eb9861423a3a83ba0",
"220276fb03104701a48d025d1da9220e",
"209de0d4c6a94c9cb448f5d82011b941",
"e45502db00094658b19bd14876220095",
"f4334884958c4c179e8ed6d5adaea268",
"378d560bd8b341abbac878b85efe3b76",
"fdb329d825234d198d608c99eb1cfbf2",
"17f98118face47319366faf10404ac8e",
"eb7d7f396db041e183f204960ad6347d",
"cb1b5250e8ae4706a9d8dd0eb7abdc17",
"892e9754ffc24e71b20fd93751afbd0f",
"8edc330e2ea74e22804842e6b7b5d6c4",
"70ccf93519c34dfe91036ae605d2033d",
"7e26850063c546e181e11d96300d7cb4",
"33463ea837694d9c9b5a8fb8b5446678",
"1318e59ca3754f74931e8226e55534ad",
"e4a39d6afc82422b980f2ebfe601d4bc",
"573106ec254c424d92aa8f46f9b21059",
"0a8890bfad3f4fdca81a1c88e376af65",
"22d47e45145145c4b93af8815c72748c",
"55c04c986abd489fb443a0bf03c3c293",
"27251e4284a3416c94ae0ca49dc8dcb8",
"4a807eafb6af4015864e8e5daa2883af",
"be05ffe206b943b0adc5747b4b0bf5f7",
"4f5b99cea30146caad68f453156b2ecb",
"7b52991c22614e76a8dc34332b390eda",
"7e42a15a02784846979282c51886366b",
"497dba4eae904328999f653d66c2385f",
"6a61b52c2d504dcf9a4a4b5dce3d027c",
"14d54849b0e7466f926ae44c339ea6d5",
"572a608e21ef405f98730b1bc7658368",
"5bf72ebf0c364863b1cfc13aa6c14235"
]
},
"outputId": "6d1b5a03-8f2f-4c99-f12f-99fa6b5eaa6a"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "698d0b728b4746e7b4a1bec8d9b475ba",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/214 [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dbdd78ff6785419bb71db5ede51f339a",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/181 [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0f65e709ee974823a88f7819d28a6da9",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/2.02k [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "892e9754ffc24e71b20fd93751afbd0f",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/520 [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "27251e4284a3416c94ae0ca49dc8dcb8",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Downloading: 0%| | 0.00/85.0 [00:00, ?B/s]"
]
},
"metadata": {}
}
],
"source": [
"from transformers import AutoProcessor\n",
"\n",
"processor = AutoProcessor.from_pretrained(\"kingabzpro/wav2vec2-large-xlsr-300-arabic\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sT0pDUmdYOx6"
},
"source": [
"Next, we extract the vocabulary of its tokenizer as it represents the `\"labels\"` of `pyctcdecode`'s `BeamSearchDecoder` class."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"id": "ZKwKxMoitoGS"
},
"outputs": [],
"source": [
"vocab_dict = processor.tokenizer.get_vocab()\n",
"sorted_vocab_dict = {k.lower(): v for k, v in sorted(vocab_dict.items(), key=lambda item: item[1])}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o-jUBzFXYyCZ"
},
"source": [
"The `\"labels\"` and the previously built `5gram_correct.arpa` file is all that's needed to build the decoder. "
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zTLzCLB2tQP7",
"outputId": "89d4a0f2-a19d-4355-dcf2-6795d82e7500"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"Found entries of length > 1 in alphabet. This is unusual unless style is BPE, but the alphabet was not recognized as BPE type. Is this correct?\n",
"Unigrams and labels don't seem to agree.\n"
]
}
],
"source": [
"from pyctcdecode import build_ctcdecoder\n",
"\n",
"decoder = build_ctcdecoder(\n",
" labels=list(sorted_vocab_dict.keys()),\n",
" kenlm_model_path=\"5gram_correct.arpa\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IzJYQsSdZQ4-"
},
"source": [
"We can safely ignore the warning and all that is left to do now is to wrap the just created `decoder`, together with the processor's `tokenizer` and `feature_extractor` into a `Wav2Vec2ProcessorWithLM` class."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"id": "VBVf50EzZgAQ"
},
"outputs": [],
"source": [
"from transformers import Wav2Vec2ProcessorWithLM\n",
"\n",
"processor_with_lm = Wav2Vec2ProcessorWithLM(\n",
" feature_extractor=processor.feature_extractor,\n",
" tokenizer=processor.tokenizer,\n",
" decoder=decoder\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "enhS2VRNZ79c"
},
"source": [
"We want to directly upload the LM-boosted processor into \n",
"the model folder of [`xls-r-300m-sv`](https://huggingface.co/hf-test/xls-r-300m-sv) to have all relevant files in one place.\n",
"\n",
"Let's clone the repo, add the new decoder files and upload them afterward.\n",
"First, we need to install `git-lfs`."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"id": "BZZm3ECc5TMP",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3cb8115b-f9d0-455c-dfee-03d624530c6a"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Reading package lists... Done\n",
"Building dependency tree \n",
"Reading state information... Done\n",
"The following packages were automatically installed and are no longer required:\n",
" cuda-command-line-tools-10-0 cuda-command-line-tools-10-1\n",
" cuda-command-line-tools-11-0 cuda-compiler-10-0 cuda-compiler-10-1\n",
" cuda-compiler-11-0 cuda-cuobjdump-10-0 cuda-cuobjdump-10-1\n",
" cuda-cuobjdump-11-0 cuda-cupti-10-0 cuda-cupti-10-1 cuda-cupti-11-0\n",
" cuda-cupti-dev-11-0 cuda-documentation-10-0 cuda-documentation-10-1\n",
" cuda-documentation-11-0 cuda-documentation-11-1 cuda-gdb-10-0 cuda-gdb-10-1\n",
" cuda-gdb-11-0 cuda-gpu-library-advisor-10-0 cuda-gpu-library-advisor-10-1\n",
" cuda-libraries-10-0 cuda-libraries-10-1 cuda-libraries-11-0\n",
" cuda-memcheck-10-0 cuda-memcheck-10-1 cuda-memcheck-11-0 cuda-nsight-10-0\n",
" cuda-nsight-10-1 cuda-nsight-11-0 cuda-nsight-11-1 cuda-nsight-compute-10-0\n",
" cuda-nsight-compute-10-1 cuda-nsight-compute-11-0 cuda-nsight-compute-11-1\n",
" cuda-nsight-systems-10-1 cuda-nsight-systems-11-0 cuda-nsight-systems-11-1\n",
" cuda-nvcc-10-0 cuda-nvcc-10-1 cuda-nvcc-11-0 cuda-nvdisasm-10-0\n",
" cuda-nvdisasm-10-1 cuda-nvdisasm-11-0 cuda-nvml-dev-10-0 cuda-nvml-dev-10-1\n",
" cuda-nvml-dev-11-0 cuda-nvprof-10-0 cuda-nvprof-10-1 cuda-nvprof-11-0\n",
" cuda-nvprune-10-0 cuda-nvprune-10-1 cuda-nvprune-11-0 cuda-nvtx-10-0\n",
" cuda-nvtx-10-1 cuda-nvtx-11-0 cuda-nvvp-10-0 cuda-nvvp-10-1 cuda-nvvp-11-0\n",
" cuda-nvvp-11-1 cuda-samples-10-0 cuda-samples-10-1 cuda-samples-11-0\n",
" cuda-samples-11-1 cuda-sanitizer-11-0 cuda-sanitizer-api-10-1\n",
" cuda-toolkit-10-0 cuda-toolkit-10-1 cuda-toolkit-11-0 cuda-toolkit-11-1\n",
" cuda-tools-10-0 cuda-tools-10-1 cuda-tools-11-0 cuda-tools-11-1\n",
" cuda-visual-tools-10-0 cuda-visual-tools-10-1 cuda-visual-tools-11-0\n",
" cuda-visual-tools-11-1 default-jre dkms freeglut3 freeglut3-dev\n",
" keyboard-configuration libargon2-0 libcap2 libcryptsetup12\n",
" libdevmapper1.02.1 libfontenc1 libidn11 libip4tc0 libjansson4\n",
" libnvidia-cfg1-510 libnvidia-common-460 libnvidia-common-510\n",
" libnvidia-extra-510 libnvidia-fbc1-510 libnvidia-gl-510 libpam-systemd\n",
" libpolkit-agent-1-0 libpolkit-backend-1-0 libpolkit-gobject-1-0 libxfont2\n",
" libxi-dev libxkbfile1 libxmu-dev libxmu-headers libxnvctrl0 libxtst6\n",
" nsight-compute-2020.2.1 nsight-compute-2022.1.0 nsight-systems-2020.3.2\n",
" nsight-systems-2020.3.4 nsight-systems-2021.5.2 nvidia-dkms-510\n",
" nvidia-kernel-common-510 nvidia-kernel-source-510 nvidia-modprobe\n",
" nvidia-settings openjdk-11-jre policykit-1 policykit-1-gnome python3-xkit\n",
" screen-resolution-extra systemd systemd-sysv udev x11-xkb-utils\n",
" xserver-common xserver-xorg-core-hwe-18.04 xserver-xorg-video-nvidia-510\n",
"Use 'sudo apt autoremove' to remove them.\n",
"The following NEW packages will be installed:\n",
" git-lfs tree\n",
"0 upgraded, 2 newly installed, 0 to remove and 39 not upgraded.\n",
"Need to get 2,169 kB of archives.\n",
"After this operation, 7,767 kB of additional disk space will be used.\n",
"Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]\n",
"Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]\n",
"Fetched 2,169 kB in 1s (1,526 kB/s)\n",
"debconf: unable to initialize frontend: Dialog\n",
"debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 2.)\n",
"debconf: falling back to frontend: Readline\n",
"debconf: unable to initialize frontend: Readline\n",
"debconf: (This frontend requires a controlling tty.)\n",
"debconf: falling back to frontend: Teletype\n",
"dpkg-preconfigure: unable to re-open stdin: \n",
"Selecting previously unselected package git-lfs.\n",
"(Reading database ... 155672 files and directories currently installed.)\n",
"Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...\n",
"Unpacking git-lfs (2.3.4-1) ...\n",
"Selecting previously unselected package tree.\n",
"Preparing to unpack .../tree_1.7.0-5_amd64.deb ...\n",
"Unpacking tree (1.7.0-5) ...\n",
"Setting up tree (1.7.0-5) ...\n",
"Setting up git-lfs (2.3.4-1) ...\n",
"Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n"
]
}
],
"source": [
"!sudo apt-get install git-lfs tree"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bvo2rUbfa367"
},
"source": [
"Cloning and uploading of modeling files can be done conveniently with the `huggingface_hub`'s `Repository` class. \n",
"\n",
"More information on how to use the `huggingface_hub` to upload any files, please take a look at the [official docs](https://huggingface.co/docs/hub/how-to-upstream)."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 162,
"referenced_widgets": [
"49f7e6f289254977a2aafa2f1ee142db",
"8ee054a3be1a4c03a529940b1643f0d1",
"a0b28f4e2ad544f0b69dce9518c81b50",
"2a424bddd9174d0b80c3903dbe775bfc",
"7ae3d834443145d495f1656371ae9d6a",
"397128fbc4af4e9a90e3c63b706d5ed0",
"c8f2d3eaeeb3423dab3cdbb38e0d798e",
"ce944a90a4224ceb9961dc57c3d657f6",
"0b06f2f2710d43eea0426d1737c2a64b",
"fdeac6256f4b43468c00a4b251df955f",
"1e0d61dc183b4f07adbfdfeb6fb202f6",
"8eca6b521e024c80b98e1fc55683ce1b",
"1fcb7caae2b241d58338e5e527cf614b",
"45530b8c2e434dba8ef7c0b606097ddb",
"e1b77a2002554785b0ce7773feea287f",
"def607cd03ff47978cb0e4113e0fbb5d",
"fe77f2e7f6bc471b8ca41a655cda3f19",
"68746a06857e481b96d093b2839f8d5b",
"5c64944d7f67438eba35a49ba21cd3a2",
"bb60a920af7a429695df2971d9259ec6",
"526dec5c46e64ffbb2a5de18d89f7f0d",
"cdc9a0d590614f98afac8eacaa6d3160",
"21afae65d1604439a5942e6eddc25cda",
"79a64935d39d41aebea8a9ca07de4311",
"ab38d6da79ef41c49ee49264ca3fb0a1",
"a96ce339661e4e9eba96fea631035256",
"c49c137fc01d44f38ac770609ff9f71c",
"1b32fdbb446f4476ad1f81a49f6196c6",
"1517ca3338324e2baf15f6d4313b5bbe",
"17dbd042241d4532b34d1b81e331363d",
"23836dc9596848cbb3bff90ccf12778c",
"3ce1b50621bb4d0dbdfc3e9105df998b",
"2e0bf48c91454784a25304feb36dc026",
"3472b8bd64a040e388701c060eef23b7",
"027e2a80d66f48029d8b2b62d711c1c8",
"1784b7fff3a24535b9ebd9bfdefec3d5",
"80bdf3a46f844e59a4fc1a8d757928d8",
"ff3f3b40641e456caaa66eb280b61db1",
"e8fa47058b44474cbdca965fb8b1c522",
"f01093cd46814afb9e7448b2c21e7d67",
"d5e3a11b22a54e84a0be282fb55f70f6",
"02eefb0e19c74cf896c8718e255d4a65",
"9448492ea7b844d5ac3fcc4131defd06",
"d9a20f62b4604975802f84046fd79b3c"
]
},
"id": "fIfcunhF4YM6",
"outputId": "6f691851-9d17-4b11-b0f0-c8419259cd99"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"Cloning https://huggingface.co/kingabzpro/wav2vec2-large-xlsr-300-arabic into local empty directory.\n"
]
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "49f7e6f289254977a2aafa2f1ee142db",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Download file pytorch_model.bin: 0%| | 1.58k/1.18G [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8eca6b521e024c80b98e1fc55683ce1b",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Download file training_args.bin: 100%|##########| 2.98k/2.98k [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "21afae65d1604439a5942e6eddc25cda",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Clean file training_args.bin: 34%|###3 | 1.00k/2.98k [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3472b8bd64a040e388701c060eef23b7",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Clean file pytorch_model.bin: 0%| | 1.00k/1.18G [00:00, ?B/s]"
]
},
"metadata": {}
}
],
"source": [
"from huggingface_hub import Repository\n",
"\n",
"repo = Repository(local_dir=\"wav2vec2-large-xlsr-300-arabic\", clone_from=\"kingabzpro/wav2vec2-large-xlsr-300-arabic\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OaD20PvHbakc"
},
"source": [
"Having cloned `xls-r-300m-sv`, let's save the new processor with LM into it."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"id": "UZ1sWfPH2oce"
},
"outputs": [],
"source": [
"processor_with_lm.save_pretrained(\"wav2vec2-large-xlsr-300-arabic\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TnkJ0iXvcITB"
},
"source": [
"Let's inspect the local repository. The `tree` command conveniently can also show the size of the different files."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ClyENOYFcC_C",
"outputId": "88f00d40-35dd-473b-8685-f8c769a7bb19"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"wav2vec2-large-xlsr-300-arabic/\n",
"├── [ 533] alphabet.json\n",
"├── [2.0K] config.json\n",
"├── [4.0K] language_model\n",
"│ ├── [898M] 5gram_correct.arpa\n",
"│ ├── [ 78] attrs.json\n",
"│ └── [2.0M] unigrams.txt\n",
"├── [ 262] preprocessor_config.json\n",
"├── [1.2G] pytorch_model.bin\n",
"├── [2.4K] README.md\n",
"├── [ 85] special_tokens_map.json\n",
"├── [ 510] tokenizer_config.json\n",
"├── [3.0K] training_args.bin\n",
"└── [ 520] vocab.json\n",
"\n",
"1 directory, 12 files\n"
]
}
],
"source": [
"!tree -h wav2vec2-large-xlsr-300-arabic/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kIRC6UcOcRMP"
},
"source": [
"As can be seen the *5-gram* LM is quite large - it amounts to more than 4 GB.\n",
"To reduce the size of the *n-gram* and make loading faster, `kenLM` allows converting `.arpa` files to binary ones using the `build_binary` executable.\n",
"\n",
"Let's make use of it here."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "X9qg4FPt2zi8",
"outputId": "644d1486-08a9-4827-92f5-7d7778d3b0ab"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Reading wav2vec2-large-xlsr-300-arabic/language_model/5gram_correct.arpa\n",
"----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100\n",
"****************************************************************************************************\n",
"SUCCESS\n"
]
}
],
"source": [
"!kenlm/build/bin/build_binary wav2vec2-large-xlsr-300-arabic/language_model/5gram_correct.arpa wav2vec2-large-xlsr-300-arabic/language_model/5gram.bin"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fY2M8k2KdCNM"
},
"source": [
"Great, it worked! Let's remove the `.arpa` file and check the size of the binary *5-gram* LM."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Zn4J-4OZdMPc",
"outputId": "e7248d37-7115-4592-a62e-507a3a51315a"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"wav2vec2-large-xlsr-300-arabic/\n",
"├── [ 533] alphabet.json\n",
"├── [2.0K] config.json\n",
"├── [4.0K] language_model\n",
"│ ├── [326M] 5gram.bin\n",
"│ ├── [ 78] attrs.json\n",
"│ └── [2.0M] unigrams.txt\n",
"├── [ 262] preprocessor_config.json\n",
"├── [1.2G] pytorch_model.bin\n",
"├── [2.4K] README.md\n",
"├── [ 85] special_tokens_map.json\n",
"├── [ 510] tokenizer_config.json\n",
"├── [3.0K] training_args.bin\n",
"└── [ 520] vocab.json\n",
"\n",
"1 directory, 12 files\n"
]
}
],
"source": [
"!rm wav2vec2-large-xlsr-300-arabic/language_model/5gram_correct.arpa && tree -h wav2vec2-large-xlsr-300-arabic/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4g112tF_dgIc"
},
"source": [
"Nice, we reduced the *n-gram* by more than half to less than 2GB now. In the final step, let's upload all files."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 119,
"referenced_widgets": [
"13dc79c921f649bc916b3a3f94bce935",
"384a1164636949bd841586d4adf5af25",
"e0775f61b5da4eb6b7fc377182cca761",
"5a144157cd1a4f71abbea1c89daeb077",
"de2e317bc61f48e28baaaea2dd601b0d",
"5fd3ecc4714140aab0aa62bce6c316c4",
"445521858fcb43479b8ae9b4bf2d6cb0",
"a3784489412b4bd4861c0dbcfdfa3699",
"2585c7e158c144f88e6045e0f083f10d",
"444b2533d20c47cb9f24ce509cdd1f97",
"a314270979984477a88ae66db93aa6c0"
]
},
"id": "WEV1sx6ee3aT",
"outputId": "2831dbfb-4734-407a-f579-5e0886fc1028"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "13dc79c921f649bc916b3a3f94bce935",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"Upload file language_model/5gram.bin: 0%| | 3.37k/326M [00:00, ?B/s]"
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"To https://huggingface.co/kingabzpro/wav2vec2-large-xlsr-300-arabic\n",
" 9006a7e..597f880 main -> main\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'https://huggingface.co/kingabzpro/wav2vec2-large-xlsr-300-arabic/commit/597f8804f2936c91059b2019383b87969d9016ca'"
]
},
"metadata": {},
"execution_count": 30
}
],
"source": [
"repo.push_to_hub(commit_message=\"Upload lm-boosted decoder\")"
]
},
{
"cell_type": "code",
"source": [
"!python eval.py --model_id kingabzpro/wav2vec2-large-xlsr-300-arabic --dataset mozilla-foundation/common_voice_7_0 --config ar --split test\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "x_rVwrJQVBN5",
"outputId": "09179a83-557b-4e5c-ba3c-b0281f147c2e"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Reusing dataset common_voice (/root/.cache/huggingface/datasets/mozilla-foundation___common_voice/ar/7.0.0/fe20cac47c166e25b1f096ab661832e3da7cf298ed4a91dcaa1343ad972d175b)\n",
"tcmalloc: large alloc 1253105664 bytes == 0x559f61bd4000 @ 0x7ff4aeb47615 0x559e1062b3bc 0x559e1070c18a 0x559e1062e1cd 0x559e1062e120 0x559e106a1f33 0x559e1062f9da 0x559e106a22c0 0x559e1062f9da 0x559e1069deae 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1062f9da 0x559e1069deae 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1069d02f 0x559e1063036c 0x559e10630571 0x559e1069f633 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1062f9da 0x559e1069deae 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1069d02f\n",
"tcmalloc: large alloc 1566384128 bytes == 0x559fbc8b2000 @ 0x7ff4aeb47615 0x559e1062b3bc 0x559e1070c18a 0x559e1062e1cd 0x559e1062e120 0x559e106a1f33 0x559e1062f9da 0x559e106a22c0 0x559e1062f9da 0x559e1069deae 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1062f9da 0x559e1069deae 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1069d02f 0x559e1063036c 0x559e10630571 0x559e1069f633 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1062f9da 0x559e1069deae 0x559e1069d02f 0x559e1062faba 0x559e1069deae 0x559e1069d02f\n",
"3147ex [2:25:28, 2.51s/ex]"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gUPAx3_MdyQv"
},
"source": [
"That's it. Now you should be able to use the *5gram* for LM-boosted decoding as shown in Section 1.\n",
"\n",
"As can be seen on [`xls-r-300m-sv`'s model card](https://huggingface.co/hf-test/xls-r-300m-sv#inference-with-lm) our *5gram* LM-boosted decoder yields a WER of 18.85% on Common Voice's 7 test set which is a relative performance of *ca.* 30% 🔥."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Boosting_Wav2Vec2_with_n_grams_in_🤗_Transformers (1).ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"c5eb721fe1b841168a49b5bc22435791": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HBoxView",
"_dom_classes": [],
"_model_name": "HBoxModel",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.5.0",
"box_style": "",
"layout": "IPY_MODEL_b9861ef1c3534e90b472be9ed8862f17",
"_model_module": "@jupyter-widgets/controls",
"children": [
"IPY_MODEL_77db336868d84805937445c51ea72df4",
"IPY_MODEL_fb4d2faef4da4dc0ae1203f9376053af",
"IPY_MODEL_edbdbc0a82854bde92e5abf6c8f87534"
]
}
},
"b9861ef1c3534e90b472be9ed8862f17": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"77db336868d84805937445c51ea72df4": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_eb4ed31888734fe39dd96f9642abd38b",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": "Downloading: ",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_d050daf64b5944fdad48b963ad482f7c"
}
},
"fb4d2faef4da4dc0ae1203f9376053af": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_88e5417fad3d442585c799cfd9d2f8f4",
"_dom_classes": [],
"description": "",
"_model_name": "FloatProgressModel",
"bar_style": "success",
"max": 2084,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 2084,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_5a0353a8a1be47a1a42f1e05eb72eb68"
}
},
"edbdbc0a82854bde92e5abf6c8f87534": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_40a6127b87464cddba87f6c97c308594",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": " 5.36k/? [00:00<00:00, 12.9kB/s]",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_c1931c4022d84d28815d460f72ef908b"
}
},
"eb4ed31888734fe39dd96f9642abd38b": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"d050daf64b5944fdad48b963ad482f7c": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"88e5417fad3d442585c799cfd9d2f8f4": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"5a0353a8a1be47a1a42f1e05eb72eb68": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"40a6127b87464cddba87f6c97c308594": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"c1931c4022d84d28815d460f72ef908b": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"880b6bbcaad54eb8ba377377b1858ffd": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HBoxView",
"_dom_classes": [],
"_model_name": "HBoxModel",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.5.0",
"box_style": "",
"layout": "IPY_MODEL_d37fb9ad76e64370a426f306919bedd0",
"_model_module": "@jupyter-widgets/controls",
"children": [
"IPY_MODEL_850edc7115a34dde9ddf6e36b5f1e8dd",
"IPY_MODEL_3f7f23c3d40a4a629c26115bad8b19c6",
"IPY_MODEL_ef6e382ddb8f475981b9f2ea8312bf46"
]
}
},
"d37fb9ad76e64370a426f306919bedd0": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"850edc7115a34dde9ddf6e36b5f1e8dd": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_c8b82acda88245b686d57b496466a3e2",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": "Downloading: ",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_ed723bd499654388b4717624baa2a288"
}
},
"3f7f23c3d40a4a629c26115bad8b19c6": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_dfcd53bb9a6f4cf3b6cdc84f40bbcd51",
"_dom_classes": [],
"description": "",
"_model_name": "FloatProgressModel",
"bar_style": "success",
"max": 7406,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 7406,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_3752bd8127d140829a62045098151b37"
}
},
"ef6e382ddb8f475981b9f2ea8312bf46": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_d73548881b594fd88f074b947b07c408",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": " 116k/? [00:00<00:00, 10.6kB/s]",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_1963287764bb48fa86c7437eb2dad75b"
}
},
"c8b82acda88245b686d57b496466a3e2": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"ed723bd499654388b4717624baa2a288": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"dfcd53bb9a6f4cf3b6cdc84f40bbcd51": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"3752bd8127d140829a62045098151b37": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"d73548881b594fd88f074b947b07c408": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"1963287764bb48fa86c7437eb2dad75b": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"95d736dd67b849ebae4433bff689cd06": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HBoxView",
"_dom_classes": [],
"_model_name": "HBoxModel",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.5.0",
"box_style": "",
"layout": "IPY_MODEL_8584eaf09efb49c995f5a2b1c25fe089",
"_model_module": "@jupyter-widgets/controls",
"children": [
"IPY_MODEL_a377ec0ebad0485987b93ae5212ab19f",
"IPY_MODEL_00b92ed648e94e8b975f4c66cc329c19",
"IPY_MODEL_5f132d8f475345f7acef6a4996a026cf"
]
}
},
"8584eaf09efb49c995f5a2b1c25fe089": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"a377ec0ebad0485987b93ae5212ab19f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_90ec52eebee44eeea4cdead5bf665b67",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": "Downloading: 100%",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_16a8075896df4ca181a70973a59d1fd7"
}
},
"00b92ed648e94e8b975f4c66cc329c19": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_4e562a18f1a0471ea0ebd4007a2d674e",
"_dom_classes": [],
"description": "",
"_model_name": "FloatProgressModel",
"bar_style": "success",
"max": 24714354,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 24714354,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_e3dafae56ffb4a5cbc84d80bc7f30c87"
}
},
"5f132d8f475345f7acef6a4996a026cf": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_9681133982cc42048815e2c30e1b4436",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": " 24.7M/24.7M [00:10<00:00, 9.57MB/s]",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_be4fd6ba2c024224a10571a3a2c44637"
}
},
"90ec52eebee44eeea4cdead5bf665b67": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"16a8075896df4ca181a70973a59d1fd7": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"4e562a18f1a0471ea0ebd4007a2d674e": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"e3dafae56ffb4a5cbc84d80bc7f30c87": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"9681133982cc42048815e2c30e1b4436": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"be4fd6ba2c024224a10571a3a2c44637": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"73e64b85117b4bc69d013fad313da9c6": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HBoxView",
"_dom_classes": [],
"_model_name": "HBoxModel",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.5.0",
"box_style": "",
"layout": "IPY_MODEL_678d73e72aa941b2b1f3d1e85ed0d952",
"_model_module": "@jupyter-widgets/controls",
"children": [
"IPY_MODEL_5bb04fe5c73b4bc09f1aa05952244413",
"IPY_MODEL_d4adf3cee88e4482b20f9f0d3b87a36a",
"IPY_MODEL_0f21362b91214406a29a82fcffc34bf7"
]
}
},
"678d73e72aa941b2b1f3d1e85ed0d952": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"5bb04fe5c73b4bc09f1aa05952244413": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_f692fa3e8bad4f359baaff49a70d5cd5",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": "",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_3717640fa6e740c6a73e061fd9cd59e8"
}
},
"d4adf3cee88e4482b20f9f0d3b87a36a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_6433d64ddce540f3acacb9990c6f8b20",
"_dom_classes": [],
"description": "",
"_model_name": "FloatProgressModel",
"bar_style": "info",
"max": 1,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 1,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_1b4a50a4aeea429795ff12ccf7e25ecc"
}
},
"0f21362b91214406a29a82fcffc34bf7": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_d8d482db758d4052afe8019ba59d816a",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": " 82450/0 [00:04<00:00, 19622.66 examples/s]",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_a0ec12647b784b2d94a298db506e1ca4"
}
},
"f692fa3e8bad4f359baaff49a70d5cd5": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"3717640fa6e740c6a73e061fd9cd59e8": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"6433d64ddce540f3acacb9990c6f8b20": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"1b4a50a4aeea429795ff12ccf7e25ecc": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": "20px",
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"d8d482db758d4052afe8019ba59d816a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"a0ec12647b784b2d94a298db506e1ca4": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"9c52b398782b40cb99a35d2bf5f079b8": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HBoxView",
"_dom_classes": [],
"_model_name": "HBoxModel",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.5.0",
"box_style": "",
"layout": "IPY_MODEL_3137ac50cbae426e9031cc79d9c3443b",
"_model_module": "@jupyter-widgets/controls",
"children": [
"IPY_MODEL_86a4292b30ec47c0a875ada683c00886",
"IPY_MODEL_ac78dc635d2448dea7fdba88d7df067f",
"IPY_MODEL_d4595639599e4233856655c8fe255deb"
]
}
},
"3137ac50cbae426e9031cc79d9c3443b": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"86a4292b30ec47c0a875ada683c00886": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_72b8ff227663406f88069ed51847447d",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": "",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_a8604194873b4defadb70736fe913d6e"
}
},
"ac78dc635d2448dea7fdba88d7df067f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_3dcfd3664dab4bc39dda69085533423d",
"_dom_classes": [],
"description": "",
"_model_name": "FloatProgressModel",
"bar_style": "success",
"max": 1,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 1,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_04b1c8bb81894178ba1e3e6f63f30a2a"
}
},
"d4595639599e4233856655c8fe255deb": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_6997e0504b9d4cd6b2f16f41ce144d3c",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": " 83187/? [00:25<00:00, 4631.43ex/s]",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_2d23fb97540040bda093ff13dccaa713"
}
},
"72b8ff227663406f88069ed51847447d": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"a8604194873b4defadb70736fe913d6e": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"3dcfd3664dab4bc39dda69085533423d": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"04b1c8bb81894178ba1e3e6f63f30a2a": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": "20px",
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"6997e0504b9d4cd6b2f16f41ce144d3c": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"2d23fb97540040bda093ff13dccaa713": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"5d00bf261e08411ab9c1a6127723f344": {
"model_module": "@jupyter-widgets/controls",
"model_name": "VBoxModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "VBoxView",
"_dom_classes": [],
"_model_name": "VBoxModel",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.5.0",
"box_style": "",
"layout": "IPY_MODEL_dc353060f608487fb9a9bf6e71d4c1c1",
"_model_module": "@jupyter-widgets/controls",
"children": [
"IPY_MODEL_e08f20628d6e4553a05a30ede0cd4f5c",
"IPY_MODEL_fc9eb8784a9a4636a79e6f538e599727",
"IPY_MODEL_be93f4ff43ee4ba9b7930044e74c56dd",
"IPY_MODEL_8c627fc877fa42e2aa2eeeafdc86a078",
"IPY_MODEL_4e82782bab1b4871a2f499ac33a3b4cf"
]
}
},
"dc353060f608487fb9a9bf6e71d4c1c1": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": "column",
"width": "50%",
"min_width": null,
"border": null,
"align_items": "center",
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": "flex",
"left": null
}
},
"e08f20628d6e4553a05a30ede0cd4f5c": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_8533cf21f06b4fd289b60fe6cc9add37",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": "
\n\n \nCopy a token from your Hugging Face tokens page and paste it below.\n \nImmediately click login after copying your token or it might be stored in plain text in this notebook file.\n