{ "cells": [ { "cell_type": "markdown", "id": "942fa22a-c776-4a44-bde9-75b7cb4202ba", "metadata": {}, "source": [ "## Outline\n", "\n", "1. We collect a dataset of (user_question, answer_context, dialogue_history -> answer) examples.\n", "2. We duplicate a small portion of the dataset and remove answer_context from the copies.\n", "3. We augment 'answer_context' with non-answer contexts picked by a reasonably well-performing QA system: variable ordering, consistent number of answers.\n", "4. We train the model for exact-match generation.\n", "   - We also evaluate the exact-match ratio.\n", "   - We evaluate separately with full-context questions." ] }, { "cell_type": "markdown", "id": "766c4c50-6e72-41b2-b6d7-1e4c3c309a68", "metadata": {}, "source": [ "### 1. Positive contexts collection" ] }, { "cell_type": "code", "execution_count": 1, "id": "33d57a85-c079-4cf1-b9ad-3b00ce916720", "metadata": {}, "outputs": [], "source": [ "import datasets" ] }, { "cell_type": "code", "execution_count": 2, "id": "0434a258-27ca-4cec-bb85-60673fea2b16", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using custom data configuration default-8d557d41fc795903\n", "Found cached dataset json (/home/xstefan3/.cache/huggingface/datasets/json/default-8d557d41fc795903/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "366db7856ce341a6854a08c244aa5db1", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "canard_train = datasets.load_dataset(\"json\", data_files=\"datasets/CANARD_Release/train.json\")[\"train\"]" ] }, { "cell_type": "code", "execution_count": 3, "id": "73cf434e-8d36-4680-a80d-a9304ef801f2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dataset({\n", " features: ['History', 'QuAC_dialog_id', 'Question', 'Question_no', 'Rewrite'],\n", " num_rows: 31526\n", "})" ] }, 
"execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "canard_train" ] }, { "cell_type": "code", "execution_count": 4, "id": "02eba563-5810-4b0a-b130-920d163a54ac", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'History': ['Johnny Unitas', '1964 MVP season'],\n", " 'QuAC_dialog_id': 'C_2ba58216460d43aa986fc0e897537239_0',\n", " 'Question': 'what team did unitas play for',\n", " 'Question_no': 1,\n", " 'Rewrite': 'what team did Johnny Unitas play for?'}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "canard_train[0]" ] }, { "cell_type": "code", "execution_count": 5, "id": "b73c5e59-2430-4b00-aa8b-0f926729ada1", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Found cached dataset quac (/home/xstefan3/.cache/huggingface/datasets/quac/plain_text/1.1.0/4170258e7e72d7c81bd6441b3f3489ea1544f0ff226ce61e22bb00c6e9d01fb6)\n" ] } ], "source": [ "quac_train = datasets.load_dataset(\"quac\", split=\"train\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "21e21544-b65c-433d-86c1-30d4507088e7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| dialogue_id | wikipedia_page_title | background | section_title | context | turn_ids | questions | followups | yesnos | answers | orig_answers |\n",
"|---|---|---|---|---|---|---|---|---|---|---|\n",
"| C_69758fcdfc1f46baba0e92c0f3b0919c_1 | Malayali | The Malayali people or Keralite people (also s... | Geographic distribution and population | According to the Indian census of 2001, there ... | [C_69758fcdfc1f46baba0e92c0f3b0919c_1_q#0, C_6... | [Where is Malayali located?, What other langua... | [2, 1, 1, 1, 1, 1, 1] | [2, 2, 2, 2, 2, 0, 2] | {'texts': [['30,803,747 speakers of Malayalam ... | {'texts': ['30,803,747 speakers of Malayalam i... |\n",
"| C_69758fcdfc1f46baba0e92c0f3b0919c_0 | Malayali | The Malayali people or Keralite people (also s... | Language and literature | Malayalam is the language spoken by the Malaya... | [C_69758fcdfc1f46baba0e92c0f3b0919c_0_q#0, C_6... | [what language do they speak?, Do they speak a... | [0, 0, 0, 0, 0, 0, 0] | [2, 2, 2, 2, 2, 2, 2] | {'texts': [['Malayalam is the language spoken ... | {'texts': ['Malayalam is the language spoken b... |\n", "