Nechba committed on
Commit
28a9a4b
•
1 Parent(s): 24d828a

update Benchmark 2

Benchmark2/benchmark-2-gemini-pro-and-openhermes-mistral.ipynb DELETED
The diff for this file is too large to render. See raw diff
 
Benchmark2/benchmark-2-llama-2-7b-and-llama-2-13b.ipynb DELETED
The diff for this file is too large to render. See raw diff
 
Benchmark2/benchmark-2-mistral-7b-instruct-v0-2.ipynb DELETED
The diff for this file is too large to render. See raw diff
 
Benchmark2/benchmark-2-openchat-3-5-1210-and-dolphin-2-2-1.ipynb DELETED
The diff for this file is too large to render. See raw diff
 
Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb ADDED
@@ -0,0 +1 @@
 
 
+ {"metadata":{"accelerator":"GPU","colab":{"gpuType":"T4","provenance":[]},"gpuClass":"standard","kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.10.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"nvidiaTeslaT4","dataSources":[{"sourceId":7571253,"sourceType":"datasetVersion","datasetId":4407676},{"sourceId":7678915,"sourceType":"datasetVersion","datasetId":4479814},{"sourceId":7713636,"sourceType":"datasetVersion","datasetId":4504654},{"sourceId":7964016,"sourceType":"datasetVersion","datasetId":4685329},{"sourceId":8017122,"sourceType":"datasetVersion","datasetId":4723613}],"dockerImageVersionId":30684,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":true}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"<h2><center><font color=#D40004><u>Benchmark 2: Gemini-1.5-Pro, Gemini-Pro, OpenHermes-Mistral and Mistral-7B </u></font></center></h2>\n","metadata":{}},{"cell_type":"markdown","source":"<div style=\"padding: 40px; background: linear-gradient(135deg, #f5f7fa, #cdd2d8); border: 3px groove #d1d8e0; border-radius: 30px; box-shadow: 0 10px 25px rgba(0,0,0,0.1); font-size: 120%; line-height: 1.9; color: #333; font-family: 'Georgia', serif; text-align: justify; position: relative;\">\n <h2 style=\"color: #2c3e50; font-size: 150%; border-bottom: 3px solid #3498db; display: inline-block; padding-bottom: 10px; margin-bottom: 20px;\">\n Notebook Goal\n </h2>\n <p style=\"font-size: 140%; color: #34495e; letter-spacing: 1px;\">\nThe objective of this notebook is to evaluate the performance of Gemini-1.5-Pro, Gemini-Pro, OpenHermes, and Mistral-7B using the Table-extract Benchmark dataset available at <a href=\"https://huggingface.co/datasets/Effyis/Table-Extraction\" style=\"color: #fffff; 
text-decoration: none;\">Hugging Face.</a></p>\n</div>\n","metadata":{}},{"cell_type":"markdown","source":"# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:center; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'></span></b> <b>Table of Contents</b></div>\n\n* [I. Loading and Importing Libraries](#1)\n* [II. Definition and Implementation of Metrics](#2)\n* [III. Clean Response Obtained by LLM](#3)\n* [IV. Data Preparation](#5)\n* [V. Benchmark](#6)\n * [Prompt](#61)\n * [Gemini-1.5-Pro-latest](#62)\n * [Gemini-Pro](#63)\n * [OpenHermes-Mistral](#64)\n * [Mistral-7B-Instruct-v0.2](#65)","metadata":{}},{"cell_type":"markdown","source":"<a id='1'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>I |</span></b> <b>Loading and Importing Libraries</b></div>\n","metadata":{}},{"cell_type":"code","source":"%%capture\n!pip install google-generativeai\n!pip install --upgrade pip\n!pip install bitsandbytes\n!pip install transformers","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:12:19.002598Z","iopub.execute_input":"2024-04-16T13:12:19.003425Z","iopub.status.idle":"2024-04-16T13:13:28.478550Z","shell.execute_reply.started":"2024-04-16T13:12:19.003393Z","shell.execute_reply":"2024-04-16T13:13:28.477476Z"},"trusted":true},"execution_count":1,"outputs":[]},{"cell_type":"code","source":"import re\nimport json\nfrom tqdm import tqdm\nimport pandas as pd\nfrom datasets import load_dataset, Dataset\nfrom wand.image import Image as WImage\nimport torch\nfrom transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\nimport time\nimport random\nimport numpy as 
np","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:13:28.480451Z","iopub.execute_input":"2024-04-16T13:13:28.480794Z","iopub.status.idle":"2024-04-16T13:13:37.393188Z","shell.execute_reply.started":"2024-04-16T13:13:28.480765Z","shell.execute_reply":"2024-04-16T13:13:37.392033Z"},"trusted":true},"execution_count":2,"outputs":[]},{"cell_type":"code","source":"import os\nimport time\nimport google.generativeai as genai\n\n# Read the API key from the environment rather than hard-coding it in the notebook.\n# GOOGLE_API_KEY is an assumed variable name; set it before running this cell.\ngenai.configure(api_key=os.environ[\"GOOGLE_API_KEY\"])","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:13:37.394787Z","iopub.execute_input":"2024-04-16T13:13:37.395288Z","iopub.status.idle":"2024-04-16T13:13:38.039962Z","shell.execute_reply.started":"2024-04-16T13:13:37.395257Z","shell.execute_reply":"2024-04-16T13:13:38.039067Z"},"trusted":true},"execution_count":3,"outputs":[]},{"cell_type":"code","source":"# Set random seed for reproducibility\nrandom.seed(42)\nnp.random.seed(42)\ntorch.manual_seed(42)\ntorch.cuda.manual_seed_all(42)\ntorch.backends.cudnn.deterministic = True\ntorch.backends.cudnn.benchmark = False","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:13:38.041765Z","iopub.execute_input":"2024-04-16T13:13:38.042332Z","iopub.status.idle":"2024-04-16T13:13:38.050925Z","shell.execute_reply.started":"2024-04-16T13:13:38.042303Z","shell.execute_reply":"2024-04-16T13:13:38.049937Z"},"trusted":true},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":"<a id='2'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>II |</span></b> <b>Definition and Implementation of Metrics</b></div>\nLet's begin with an example of the desired output.","metadata":{}},{"cell_type":"code","source":"desired_output = [{'aircraft': 'robinson r - 22',\n 'description': 'light utility helicopter',\n 'max gross weight': '1370 lb (635 kg)',\n 'total disk area': '497 ft square 
(46.2 m square)',\n 'max disk loading': '2.6 lb / ft square (14 kg / m square)'},\n {'aircraft': 'bell 206b3 jetranger',\n 'description': 'turboshaft utility helicopter',\n 'max gross weight': '3200 lb (1451 kg)',\n 'total disk area': '872 ft square (81.1 m square)',\n 'max disk loading': '3.7 lb / ft square (18 kg / m square)'},\n {'aircraft': 'ch - 47d chinook',\n 'description': 'tandem rotor helicopter',\n 'max gross weight': '50000 lb (22680 kg)',\n 'total disk area': '5655 ft square (526 m square)',\n 'max disk loading': '8.8 lb / ft square (43 kg / m square)'},\n {'aircraft': 'mil mi - 26',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '123500 lb (56000 kg)',\n 'total disk area': '8495 ft square (789 m square)',\n 'max disk loading': '14.5 lb / ft square (71 kg / m square)'},\n {'aircraft': 'ch - 53e super stallion',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '73500 lb (33300 kg)',\n 'total disk area': '4900 ft square (460 m square)',\n 'max disk loading': '15 lb / ft square (72 kg / m square)'}]\n","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:24:44.748719Z","iopub.execute_input":"2024-04-16T13:24:44.749654Z","iopub.status.idle":"2024-04-16T13:24:44.756340Z","shell.execute_reply.started":"2024-04-16T13:24:44.749619Z","shell.execute_reply":"2024-04-16T13:24:44.755387Z"},"trusted":true},"execution_count":5,"outputs":[]},{"cell_type":"markdown","source":"To compare the expected list of records with the predicted list, we first measure the share of desired keys that the prediction reproduces.","metadata":{}},{"cell_type":"markdown","source":">## Percentage of predicted keys","metadata":{}},{"cell_type":"markdown","source":"Let's begin by defining a function that retrieves all keys of a record.","metadata":{}},{"cell_type":"code","source":"def get_keys(d):\n # Iterate over each key-value pair in the dictionary\n for k, v in d.items():\n # Append the key to the list of all_keys\n 
all_keys.append(k)\n # If the value is a dictionary, recursively call get_keys\n if isinstance(v, dict):\n get_keys(v)\n # If the value is a list, iterate over each item\n elif isinstance(v, list):\n for item in v:\n # If the item is a dictionary, recursively call get_keys\n if isinstance(item, dict):\n get_keys(item)\n# Define a function to retrieve all unique keys from a nested dictionary\ndef get_all_keys(d):\n # Declare all_keys as a global variable\n global all_keys\n # Initialize all_keys as an empty list\n all_keys = []\n # Call the helper function get_keys to populate all_keys\n get_keys(d)\n # Return a list containing the unique keys by converting all_keys to a set and then back to a list\n return list(set(all_keys))","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:24:47.352071Z","iopub.execute_input":"2024-04-16T13:24:47.352417Z","iopub.status.idle":"2024-04-16T13:24:47.359151Z","shell.execute_reply.started":"2024-04-16T13:24:47.352392Z","shell.execute_reply":"2024-04-16T13:24:47.358216Z"},"trusted":true},"execution_count":6,"outputs":[]},{"cell_type":"code","source":"# Testing our function\nget_all_keys(desired_output[0])","metadata":{"execution":{"iopub.status.busy":"2024-04-16T09:18:57.764316Z","iopub.execute_input":"2024-04-16T09:18:57.765162Z","iopub.status.idle":"2024-04-16T09:18:57.772892Z","shell.execute_reply.started":"2024-04-16T09:18:57.765132Z","shell.execute_reply":"2024-04-16T09:18:57.771802Z"},"trusted":true},"execution_count":7,"outputs":[{"execution_count":7,"output_type":"execute_result","data":{"text/plain":"['max disk loading',\n 'total disk area',\n 'aircraft',\n 'max gross weight',\n 'description']"},"metadata":{}}]},{"cell_type":"markdown","source":"Now, we define the percentage of predicted keys as follows:\n\n$$\\Large \\text{Percentage of predicted keys} = \\frac{\\text{Number of correctly predicted keys}}{\\text{Total number of true keys}}$$\nThis percentage is calculated for every record in the list, then summed and 
divided by the number of records in the list.","metadata":{}},{"cell_type":"code","source":"def process_dict(data):\n if isinstance(data, dict):\n for key, value in data.items():\n if isinstance(value, str):\n data[key] = value.strip().lower()\n elif isinstance(value, list):\n data[key] = [process_dict(item) for item in value]\n elif isinstance(value, dict):\n data[key] = process_dict(value)\n return data","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:02.562198Z","iopub.execute_input":"2024-04-16T13:25:02.562934Z","iopub.status.idle":"2024-04-16T13:25:02.569006Z","shell.execute_reply.started":"2024-04-16T13:25:02.562903Z","shell.execute_reply":"2024-04-16T13:25:02.567897Z"},"trusted":true},"execution_count":7,"outputs":[]},{"cell_type":"code","source":"def percentage_of_predicted_keys(true_dic, pred_dic):\n true_dic=process_dict(true_dic)\n pred_dic=process_dict(pred_dic)\n # Get all keys of the true dictionary\n all_keys_of_true_dic = get_all_keys(true_dic)\n # Get all keys of the predicted dictionary\n all_keys_of_pred_dic = get_all_keys(pred_dic)\n \n # Check if there are no keys in the true dictionary to avoid division by zero\n if len(all_keys_of_true_dic) == 0:\n return 0 # Avoid division by zero\n \n # Initialize count of predicted keys\n p_keys = 0\n # Iterate through all keys in the predicted dictionary\n for key in all_keys_of_pred_dic:\n # Check if the key is also present in the true dictionary\n if key in all_keys_of_true_dic:\n # Increment count if the key is found in both dictionaries\n p_keys += 1\n \n # Calculate the percentage of predicted keys compared to true keys\n p_keys /= len(all_keys_of_true_dic)\n # Return the percentage of predicted keys\n return 
p_keys","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:04.738122Z","iopub.execute_input":"2024-04-16T13:25:04.739060Z","iopub.status.idle":"2024-04-16T13:25:04.745531Z","shell.execute_reply.started":"2024-04-16T13:25:04.739023Z","shell.execute_reply":"2024-04-16T13:25:04.744305Z"},"trusted":true},"execution_count":8,"outputs":[]},{"cell_type":"code","source":"def average_percentage_key(true_list, pred_list):\n min_length = min(len(true_list), len(pred_list)) # Find the minimum length of the two lists\n score = 0\n for i in range(min_length):\n score += percentage_of_predicted_keys(true_list[i], pred_list[i])\n return score / len(true_list)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:06.245917Z","iopub.execute_input":"2024-04-16T13:25:06.246241Z","iopub.status.idle":"2024-04-16T13:25:06.251418Z","shell.execute_reply.started":"2024-04-16T13:25:06.246218Z","shell.execute_reply":"2024-04-16T13:25:06.250442Z"},"trusted":true},"execution_count":9,"outputs":[]},{"cell_type":"code","source":"# Example true and predicted lists\ntrue_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 6}, {'key1': 7, 'key2': 8, 'key3': 9}]\npred_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 7}, {'key1': 7, 'key2': 8, 'key3': 9}]\n\n# Test the function\nresult = average_percentage_key(true_list, pred_list)\nprint(\"Average percentage of keys:\", result)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:08.513042Z","iopub.execute_input":"2024-04-16T13:25:08.513694Z","iopub.status.idle":"2024-04-16T13:25:08.520278Z","shell.execute_reply.started":"2024-04-16T13:25:08.513663Z","shell.execute_reply":"2024-04-16T13:25:08.519432Z"},"trusted":true},"execution_count":10,"outputs":[{"name":"stdout","text":"Average percentage of keys: 1.0\n","output_type":"stream"}]},{"cell_type":"markdown","source":"Now we will define the principal metrics used to compare the values of two list 
records.","metadata":{}},{"cell_type":"markdown","source":">## Percentage of predicted values\n\nThis metric is the fraction of true values that the prediction reproduces exactly (compared as strings), averaged over the records.\n\nThe formula for calculating the percentage of values is as follows:\n\n$$\n\\text{Average percentage of values} = \\frac{\\sum_{i=1}^{\\text{Total number of records}} p_i }{\\text{Total number of records}}\n$$\n\nHere, $p_i$ represents the percentage of correctly predicted values for record $i$. It's calculated as:\n\n$$p_i = \\frac{\\text{Number of correctly predicted values of record i}}{\\text{Total number of true values of record i}}$$","metadata":{}},{"cell_type":"code","source":"def calculate_percentage_of_values(true_dic, pred_dic):\n total_percentage = 0 # Initialize total percentage\n # Type 1: Single string values\n for key, true_value in true_dic.items(): # Loop through key-value pairs in true_dic\n \n # Check if the key exists in pred_dic and its value matches the true value (both compared as strings)\n if key in pred_dic and str(pred_dic[key]) == str(true_value):\n match = 1 # Assign perfect match\n else:\n match = 0 # Assign no match\n total_percentage += match\n return total_percentage / len(true_dic) # Calculate and return the average percentage","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:11.264721Z","iopub.execute_input":"2024-04-16T13:25:11.265579Z","iopub.status.idle":"2024-04-16T13:25:11.271268Z","shell.execute_reply.started":"2024-04-16T13:25:11.265546Z","shell.execute_reply":"2024-04-16T13:25:11.270222Z"},"trusted":true},"execution_count":11,"outputs":[]},{"cell_type":"code","source":"def average_percentage_value(true_list, pred_list):\n min_length = min(len(true_list), len(pred_list)) # Find the minimum length of the two lists\n score = 0\n for i in range(min_length):\n score += calculate_percentage_of_values(true_list[i], pred_list[i])\n return score / 
len(true_list)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:12.845015Z","iopub.execute_input":"2024-04-16T13:25:12.845667Z","iopub.status.idle":"2024-04-16T13:25:12.850686Z","shell.execute_reply.started":"2024-04-16T13:25:12.845635Z","shell.execute_reply":"2024-04-16T13:25:12.849668Z"},"trusted":true},"execution_count":12,"outputs":[]},{"cell_type":"code","source":"# Example true and predicted lists\ntrue_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 6}, {'key1': 7, 'key2': 8, 'key3': 9}]\npred_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 7}, {'key1': 7, 'key2': 8, 'key3': 9}]\n\n# Test the function\nresult = average_percentage_value(true_list, pred_list)\nprint(\"Average percentage of keys:\", result)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:14.665414Z","iopub.execute_input":"2024-04-16T13:25:14.665787Z","iopub.status.idle":"2024-04-16T13:25:14.672461Z","shell.execute_reply.started":"2024-04-16T13:25:14.665760Z","shell.execute_reply":"2024-04-16T13:25:14.671470Z"},"trusted":true},"execution_count":13,"outputs":[{"name":"stdout","text":"Average percentage of keys: 0.8888888888888888\n","output_type":"stream"}]},{"cell_type":"markdown","source":"<a id='3'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>III |</span></b> <b>Clean Response Obtained by LLM</b></div>\n","metadata":{}},{"cell_type":"code","source":"import json\n\ndef parse_json(data_str):\n # Remove leading/trailing whitespace and newlines\n data_str = data_str.strip()\n\n # Check if the string is enclosed within triple backticks (\"```json\" and \"```\")\n if data_str.startswith(\"```json\") and data_str.endswith(\"```\"):\n # Remove the leading/trailing \"```json\" and \"```\"\n data_str = data_str[len(\"```json\"): -len(\"```\")]\n\n try:\n # Parse JSON\n data = 
json.loads(data_str)\n return data\n except json.JSONDecodeError as e:\n print(\"JSON parsing error:\", e)\n return None","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:16.641838Z","iopub.execute_input":"2024-04-16T13:25:16.642200Z","iopub.status.idle":"2024-04-16T13:25:16.648128Z","shell.execute_reply.started":"2024-04-16T13:25:16.642172Z","shell.execute_reply":"2024-04-16T13:25:16.647185Z"},"trusted":true},"execution_count":14,"outputs":[]},{"cell_type":"code","source":"response_str = \"\"\"[{\"aircraft\": \"robinson r - 22\",\n \"description\": \"light utility helicopter\",\n \"max gross weight\": \"1370 lb (635 kg)\",\n \"total disk area\": \"497 ft square (46.2 m square)\",\n \"max disk loading\": \"2.6 lb / ft square (14 kg / m square)\"},\n{\"aircraft\": \"bell 206b3 jetranger\",\n \"description\": \"turboshaft utility helicopter\",\n \"max gross weight\": \"3200 lb (1451 kg)\",\n \"total disk area\": \"872 ft square (81.1 m square)\",\n \"max disk loading\": \"3.7 lb / ft square (18 kg / m square)\"},\n{\"aircraft\": \"ch - 47d chinook\",\n \"description\": \"tandem rotor helicopter\",\n \"max gross weight\": \"50000 lb (22680 kg)\",\n \"total disk area\": \"5655 ft square (526 m square)\",\n \"max disk loading\": \"8.8 lb / ft square (43 kg / m square)\"},\n{\"aircraft\": \"mil mi - 26\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"123500 lb (56000 kg)\",\n \"total disk area\": \"8495 ft square (789 m square)\",\n \"max disk loading\": \"14.5 lb / ft square (71 kg / m square)\"},\n{\"aircraft\": \"ch - 53e super stallion\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"73500 lb (33300 kg)\",\n \"total disk area\": \"4900 ft square (460 m square)\",\n \"max disk loading\": \"15 lb / ft square (72 kg / m square)\"}]\"\"\"\n\n# Convert the string representation to a list of 
dictionaries\nparse_json(response_str)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:18.225851Z","iopub.execute_input":"2024-04-16T13:25:18.226646Z","iopub.status.idle":"2024-04-16T13:25:18.236378Z","shell.execute_reply.started":"2024-04-16T13:25:18.226611Z","shell.execute_reply":"2024-04-16T13:25:18.235495Z"},"trusted":true},"execution_count":15,"outputs":[{"execution_count":15,"output_type":"execute_result","data":{"text/plain":"[{'aircraft': 'robinson r - 22',\n 'description': 'light utility helicopter',\n 'max gross weight': '1370 lb (635 kg)',\n 'total disk area': '497 ft square (46.2 m square)',\n 'max disk loading': '2.6 lb / ft square (14 kg / m square)'},\n {'aircraft': 'bell 206b3 jetranger',\n 'description': 'turboshaft utility helicopter',\n 'max gross weight': '3200 lb (1451 kg)',\n 'total disk area': '872 ft square (81.1 m square)',\n 'max disk loading': '3.7 lb / ft square (18 kg / m square)'},\n {'aircraft': 'ch - 47d chinook',\n 'description': 'tandem rotor helicopter',\n 'max gross weight': '50000 lb (22680 kg)',\n 'total disk area': '5655 ft square (526 m square)',\n 'max disk loading': '8.8 lb / ft square (43 kg / m square)'},\n {'aircraft': 'mil mi - 26',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '123500 lb (56000 kg)',\n 'total disk area': '8495 ft square (789 m square)',\n 'max disk loading': '14.5 lb / ft square (71 kg / m square)'},\n {'aircraft': 'ch - 53e super stallion',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '73500 lb (33300 kg)',\n 'total disk area': '4900 ft square (460 m square)',\n 'max disk loading': '15 lb / ft square (72 kg / m square)'}]"},"metadata":{}}]},{"cell_type":"markdown","source":"<a id='5'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>IV |</span></b> <b> Data 
Preparation</b></div>\n","metadata":{}},{"cell_type":"markdown","source":"I'll extract a sample of 100 records from the dataset excluding those with Arabic names, and then simplify the output to enhance performance.","metadata":{}},{"cell_type":"code","source":"df = pd.read_csv(\"/kaggle/input/table-extraction/table_extract.csv\")\ndf.head(5)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:21.195467Z","iopub.execute_input":"2024-04-16T13:25:21.196506Z","iopub.status.idle":"2024-04-16T13:25:22.639971Z","shell.execute_reply.started":"2024-04-16T13:25:21.196467Z","shell.execute_reply":"2024-04-16T13:25:22.638999Z"},"trusted":true},"execution_count":16,"outputs":[{"execution_count":16,"output_type":"execute_result","data":{"text/plain":" context \\\n0 aircraft ... \n1 order year manufacturer mod... \n2 player no nationality ... \n3 player no nationali... \n4 player no nationality ... \n\n answer \n0 {\"aircraft\":{\"0\":\"robinson r - 22\",\"1\":\"bell 2... \n1 {\"order year\":{\"0\":\"1992 - 93\",\"1\":\"1996\",\"2\":... \n2 {\"player\":{\"0\":\"quincy acy\",\"1\":\"hassan adams\"... \n3 {\"player\":{\"0\":\"patrick o'bryant\",\"1\":\"jermain... \n4 {\"player\":{\"0\":\"mark baker\",\"1\":\"marcus banks\"... 
","text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>context</th>\n <th>answer</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>aircraft ...</td>\n <td>{\"aircraft\":{\"0\":\"robinson r - 22\",\"1\":\"bell 2...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>order year manufacturer mod...</td>\n <td>{\"order year\":{\"0\":\"1992 - 93\",\"1\":\"1996\",\"2\":...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>player no nationality ...</td>\n <td>{\"player\":{\"0\":\"quincy acy\",\"1\":\"hassan adams\"...</td>\n </tr>\n <tr>\n <th>3</th>\n <td>player no nationali...</td>\n <td>{\"player\":{\"0\":\"patrick o'bryant\",\"1\":\"jermain...</td>\n </tr>\n <tr>\n <th>4</th>\n <td>player no nationality ...</td>\n <td>{\"player\":{\"0\":\"mark baker\",\"1\":\"marcus banks\"...</td>\n </tr>\n </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"def is_arabic_name(name):\n \"\"\"\n Checks if a name contains Arabic characters.\n\n Args:\n name: The name string to check.\n\n Returns:\n True if Arabic characters are found, False otherwise.\n \"\"\"\n # Regular expression to match Arabic characters\n arabic_pattern = re.compile(\"[\\u0600-\\u06FF]+\")\n\n # Search for Arabic characters in the name\n match = arabic_pattern.search(name)\n\n # Return True if a match is found, False otherwise\n return bool(match)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:22.641598Z","iopub.execute_input":"2024-04-16T13:25:22.642000Z","iopub.status.idle":"2024-04-16T13:25:22.647637Z","shell.execute_reply.started":"2024-04-16T13:25:22.641973Z","shell.execute_reply":"2024-04-16T13:25:22.646545Z"},"trusted":true},"execution_count":17,"outputs":[]},{"cell_type":"code","source":"df = 
df[~df['context'].apply(lambda x: is_arabic_name(x))]","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:24.520859Z","iopub.execute_input":"2024-04-16T13:25:24.521712Z","iopub.status.idle":"2024-04-16T13:25:25.489780Z","shell.execute_reply.started":"2024-04-16T13:25:24.521677Z","shell.execute_reply":"2024-04-16T13:25:25.488506Z"},"trusted":true},"execution_count":18,"outputs":[]},{"cell_type":"code","source":"df_sample =df.loc[:100]","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:26.231506Z","iopub.execute_input":"2024-04-16T13:25:26.232186Z","iopub.status.idle":"2024-04-16T13:25:26.239060Z","shell.execute_reply.started":"2024-04-16T13:25:26.232150Z","shell.execute_reply":"2024-04-16T13:25:26.237913Z"},"trusted":true},"execution_count":19,"outputs":[]},{"cell_type":"code","source":"def transform_json_to_records(json_data):\n \"\"\"\n Transforms a structured JSON object into a list of records.\n\n The function assumes the structure of the JSON object is a dictionary of dictionaries,\n where each top-level key is a field name, and its value is a dictionary mapping indices\n to field values. 
All sub-dictionaries must have the same keys.\n\n Parameters:\n - json_data: A dictionary representing the structured JSON object to transform.\n\n Returns:\n - A list of dictionaries, where each dictionary represents a record with fields and values\n derived from the input JSON.\n \"\"\"\n json_data = json.loads(json_data)\n # Extract keys from the first dictionary item to use as indices\n indices = list(next(iter(json_data.values())).keys())\n # Initialize the list to store transformed records\n records = []\n\n # Loop over each index to create a record\n for index in indices:\n record = {field: values[index] for field, values in json_data.items()}\n records.append(record)\n\n return records","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:27.953737Z","iopub.execute_input":"2024-04-16T13:25:27.954385Z","iopub.status.idle":"2024-04-16T13:25:27.961373Z","shell.execute_reply.started":"2024-04-16T13:25:27.954351Z","shell.execute_reply":"2024-04-16T13:25:27.960387Z"},"trusted":true},"execution_count":20,"outputs":[]},{"cell_type":"code","source":"df_sample.loc[:, 'answer'] = df_sample['answer'].map(transform_json_to_records)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:31.129263Z","iopub.execute_input":"2024-04-16T13:25:31.130144Z","iopub.status.idle":"2024-04-16T13:25:31.142875Z","shell.execute_reply.started":"2024-04-16T13:25:31.130106Z","shell.execute_reply":"2024-04-16T13:25:31.141806Z"},"trusted":true},"execution_count":21,"outputs":[]},{"cell_type":"code","source":"df_sample.head()","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:25:32.664227Z","iopub.execute_input":"2024-04-16T13:25:32.664981Z","iopub.status.idle":"2024-04-16T13:25:32.698044Z","shell.execute_reply.started":"2024-04-16T13:25:32.664945Z","shell.execute_reply":"2024-04-16T13:25:32.697121Z"},"trusted":true},"execution_count":22,"outputs":[{"execution_count":22,"output_type":"execute_result","data":{"text/plain":" context \\\n0 aircraft ... 
\n1 order year manufacturer mod... \n2 player no nationality ... \n3 player no nationali... \n4 player no nationality ... \n\n answer \n0 [{'aircraft': 'robinson r - 22', 'description'... \n1 [{'order year': '1992 - 93', 'manufacturer': '... \n2 [{'player': 'quincy acy', 'no': '4', 'national... \n3 [{'player': 'patrick o'bryant', 'no': 13, 'nat... \n4 [{'player': 'mark baker', 'no': '3', 'national... ","text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>context</th>\n <th>answer</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>aircraft ...</td>\n <td>[{'aircraft': 'robinson r - 22', 'description'...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>order year manufacturer mod...</td>\n <td>[{'order year': '1992 - 93', 'manufacturer': '...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>player no nationality ...</td>\n <td>[{'player': 'quincy acy', 'no': '4', 'national...</td>\n </tr>\n <tr>\n <th>3</th>\n <td>player no nationali...</td>\n <td>[{'player': 'patrick o'bryant', 'no': 13, 'nat...</td>\n </tr>\n <tr>\n <th>4</th>\n <td>player no nationality ...</td>\n <td>[{'player': 'mark baker', 'no': '3', 'national...</td>\n </tr>\n </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"markdown","source":"<a id='6'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>V |</span></b> <b>Benchmark</b></div>\n","metadata":{}},{"cell_type":"markdown","source":"<a id='61'></a>\n>## Prompt","metadata":{}},{"cell_type":"code","source":"prompt = \"\"\"Your task is to extract relevant information from the provided context and format it into a list of records, following the template 
below.\n A JSON object representing the extracted table structure. The list of records follows this format: \n [ { \"column_1\": \"val1\",\"column_2\": \"val1\",\"column_3\": \"val1\",...},\n { \"column_1\": \"val2\",\"column_2\": \"val2\",\"column_3\": \"val3\",...},\n ...\n ]\n Each key in the records represents a column header, and the corresponding value is another object containing key-value pairs for each row in that column.\n\nINPUT example:\n# do not use the data from the examples & template; they are just for reference only. The following data contains actual information. If a value is not found, leave it empty. \n\n aircraft description max gross weight total disk area max disk loading\n0 robinson r - 22 light utility helicopter 1370 lb (635 kg) 497 ft square (46.2 m square) 2.6 lb / ft square (14 kg / m square)\n1 bell 206b3 jetranger turboshaft utility helicopter 3200 lb (1451 kg) 872 ft square (81.1 m square) 3.7 lb / ft square (18 kg / m square)\n2 ch - 47d chinook tandem rotor helicopter 50000 lb (22680 kg) 5655 ft square (526 m square) 8.8 lb / ft square (43 kg / m square)\n3 mil mi - 26 heavy - lift helicopter 123500 lb (56000 kg) 8495 ft square (789 m square) 14.5 lb / ft square (71 kg / m square)\n4 ch - 53e super stallion heavy - lift helicopter 73500 lb (33300 kg) 4900 ft square (460 m square) 15 lb / ft square (72 kg / m square)\n\nOUTPUT example:\n# do not use the data from the examples & template; they are just for reference only. The following data contains actual information. If a value is not found, leave it empty. 
\n[{\"aircraft\": \"robinson r - 22\",\n \"description\": \"light utility helicopter\",\n \"max gross weight\": \"1370 lb (635 kg)\",\n \"total disk area\": \"497 ft square (46.2 m square)\",\n \"max disk loading\": \"2.6 lb / ft square (14 kg / m square)\"},\n{\"aircraft\": \"bell 206b3 jetranger\",\n \"description\": \"turboshaft utility helicopter\",\n \"max gross weight\": \"3200 lb (1451 kg)\",\n \"total disk area\": \"872 ft square (81.1 m square)\",\n \"max disk loading\": \"3.7 lb / ft square (18 kg / m square)\"},\n{\"aircraft\": \"ch - 47d chinook\",\n \"description\": \"tandem rotor helicopter\",\n \"max gross weight\": \"50000 lb (22680 kg)\",\n \"total disk area\": \"5655 ft square (526 m square)\",\n \"max disk loading\": \"8.8 lb / ft square (43 kg / m square)\"},\n{\"aircraft\": \"mil mi - 26\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"123500 lb (56000 kg)\",\n \"total disk area\": \"8495 ft square (789 m square)\",\n \"max disk loading\": \"14.5 lb / ft square (71 kg / m square)\"},\n{\"aircraft\": \"ch - 53e super stallion\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"73500 lb (33300 kg)\",\n \"total disk area\": \"4900 ft square (460 m square)\",\n \"max disk loading\": \"15 lb / ft square (72 kg / m square)\"}]\n\"\"\"","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:50:12.658286Z","iopub.execute_input":"2024-04-16T13:50:12.659037Z","iopub.status.idle":"2024-04-16T13:50:12.666395Z","shell.execute_reply.started":"2024-04-16T13:50:12.659000Z","shell.execute_reply":"2024-04-16T13:50:12.665323Z"},"trusted":true},"execution_count":26,"outputs":[]},{"cell_type":"markdown","source":"<a id='62'></a>\n>## Gemini-1.5-Pro-latest","metadata":{}},{"cell_type":"code","source":"# Set up the model\ngeneration_config = {\n \"temperature\": 1,\n \"top_p\": 0.75,\n \"max_output_tokens\": 
6000,\n}","metadata":{"execution":{"iopub.status.busy":"2024-04-15T09:41:32.066112Z","iopub.execute_input":"2024-04-15T09:41:32.066601Z","iopub.status.idle":"2024-04-15T09:41:32.073318Z","shell.execute_reply.started":"2024-04-15T09:41:32.066568Z","shell.execute_reply":"2024-04-15T09:41:32.071569Z"},"trusted":true},"execution_count":58,"outputs":[]},{"cell_type":"code","source":"safety_settings = [\n {\n \"category\": \"HARM_CATEGORY_HARASSMENT\",\n \"threshold\": \"BLOCK_MEDIUM_AND_ABOVE\"\n },\n {\n \"category\": \"HARM_CATEGORY_HATE_SPEECH\",\n \"threshold\": \"BLOCK_MEDIUM_AND_ABOVE\"\n },\n {\n \"category\": \"HARM_CATEGORY_SEXUALLY_EXPLICIT\",\n \"threshold\": \"BLOCK_MEDIUM_AND_ABOVE\"\n },\n {\n \"category\": \"HARM_CATEGORY_DANGEROUS_CONTENT\",\n \"threshold\": \"BLOCK_MEDIUM_AND_ABOVE\"\n },\n]","metadata":{"execution":{"iopub.status.busy":"2024-04-15T09:41:34.759048Z","iopub.execute_input":"2024-04-15T09:41:34.759580Z","iopub.status.idle":"2024-04-15T09:41:34.765339Z","shell.execute_reply.started":"2024-04-15T09:41:34.759549Z","shell.execute_reply":"2024-04-15T09:41:34.763792Z"},"trusted":true},"execution_count":59,"outputs":[]},{"cell_type":"code","source":"import google.generativeai as genai\n\nprint('Available base models:', [m.name for m in genai.list_models()])","metadata":{"execution":{"iopub.status.busy":"2024-04-15T09:41:38.212330Z","iopub.execute_input":"2024-04-15T09:41:38.213763Z","iopub.status.idle":"2024-04-15T09:41:39.130091Z","shell.execute_reply.started":"2024-04-15T09:41:38.213716Z","shell.execute_reply":"2024-04-15T09:41:39.128583Z"},"trusted":true},"execution_count":60,"outputs":[{"name":"stdout","text":"Available base models: ['models/chat-bison-001', 'models/text-bison-001', 'models/embedding-gecko-001', 'models/gemini-1.0-pro', 'models/gemini-1.0-pro-001', 'models/gemini-1.0-pro-latest', 'models/gemini-1.0-pro-vision-latest', 'models/gemini-1.5-pro-latest', 'models/gemini-pro', 'models/gemini-pro-vision', 'models/embedding-001', 
'models/text-embedding-004', 'models/aqa']\n","output_type":"stream"}]},{"cell_type":"code","source":"model = genai.GenerativeModel(model_name=\"gemini-1.5-pro-latest\",\n                              generation_config=generation_config,\n                              safety_settings=safety_settings)","metadata":{"execution":{"iopub.status.busy":"2024-04-15T09:41:42.126280Z","iopub.execute_input":"2024-04-15T09:41:42.127602Z","iopub.status.idle":"2024-04-15T09:41:42.134692Z","shell.execute_reply.started":"2024-04-15T09:41:42.127541Z","shell.execute_reply":"2024-04-15T09:41:42.133090Z"},"trusted":true},"execution_count":61,"outputs":[]},{"cell_type":"code","source":"# Create a copy of the DataFrame\ndf_copy = df_sample.copy()\ndf_copy['pred_response'] = None\n\n# Iterate through each row in the DataFrame with tqdm for progress visualization\nfor i in tqdm(df_copy.index, desc=\"Generating Predictions\", total=len(df_copy)):\n    try:\n        template = f\"Instruction:\\n{prompt}\\nINPUTDATA:{df_copy.loc[i,'context']}\\nResponse:\\n\"\n        response = model.generate_content(template)\n        pred_response = response.text\n        # Update the 'pred_response' column with the generated prediction\n        df_copy.loc[i,'pred_response'] = pred_response.replace(template,'')\n        time.sleep(5)\n    except Exception:\n        # Leave the prediction empty if generation or text extraction raises\n        df_copy.loc[i,'pred_response'] = None","metadata":{"execution":{"iopub.status.busy":"2024-04-15T09:41:50.805580Z","iopub.execute_input":"2024-04-15T09:41:50.805916Z","iopub.status.idle":"2024-04-15T10:43:00.527855Z","shell.execute_reply.started":"2024-04-15T09:41:50.805894Z","shell.execute_reply":"2024-04-15T10:43:00.526913Z"},"trusted":true},"execution_count":62,"outputs":[{"name":"stderr","text":"Generating Predictions: 100%|██████████| 99/99 [1:01:09<00:00,
37.07s/it]\n","output_type":"stream"}]},{"cell_type":"code","source":"df_copy.dropna(inplace=True)\ndf_copy.shape","metadata":{"execution":{"iopub.status.busy":"2024-04-15T10:44:51.628044Z","iopub.execute_input":"2024-04-15T10:44:51.628446Z","iopub.status.idle":"2024-04-15T10:44:51.642891Z","shell.execute_reply.started":"2024-04-15T10:44:51.628422Z","shell.execute_reply":"2024-04-15T10:44:51.640954Z"},"trusted":true},"execution_count":63,"outputs":[{"execution_count":63,"output_type":"execute_result","data":{"text/plain":"(75, 3)"},"metadata":{}}]},{"cell_type":"code","source":"sum_key = 0\nsum_val = 0\ncount_errors = 0\nfor i in df_copy.index:\n\n    pred_records = parse_json(df_copy.loc[i,'pred_response'])\n    # Skip responses that could not be parsed as JSON\n    if pred_records is None:\n        count_errors += 1\n        continue\n    true_records = df_copy.loc[i,'answer']\n\n    sum_key += average_percentage_key(true_records,pred_records)\n\n    sum_val += average_percentage_value(true_records,pred_records)\n# print(i)\n# print(average_percentage_key(true_records,pred_records))","metadata":{"execution":{"iopub.status.busy":"2024-04-15T11:31:28.697391Z","iopub.execute_input":"2024-04-15T11:31:28.697909Z","iopub.status.idle":"2024-04-15T11:31:28.727007Z","shell.execute_reply.started":"2024-04-15T11:31:28.697870Z","shell.execute_reply":"2024-04-15T11:31:28.724017Z"},"trusted":true},"execution_count":120,"outputs":[{"name":"stdout","text":"JSON parsing error: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)\nJSON parsing error: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)\nJSON parsing error: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)\n","output_type":"stream"}]},{"cell_type":"code","source":"print(\"Average Percentage of Predicted Keys:\", sum_key/(len(df_copy)-count_errors))\nprint(\"Average Percentage of Predicted values:\",
sum_val/(len(df_copy)-count_errors))","metadata":{"execution":{"iopub.status.busy":"2024-04-15T11:31:30.890876Z","iopub.execute_input":"2024-04-15T11:31:30.891270Z","iopub.status.idle":"2024-04-15T11:31:30.898589Z","shell.execute_reply.started":"2024-04-15T11:31:30.891247Z","shell.execute_reply":"2024-04-15T11:31:30.896750Z"},"trusted":true},"execution_count":121,"outputs":[{"name":"stdout","text":"Average Percentage of Predicted Keys: 0.9818672839506173\nAverage Percentage of Predicted values: 0.9734204067537401\n","output_type":"stream"}]},{"cell_type":"markdown","source":"<a id='63'></a>\n>## Gemini-pro","metadata":{}},{"cell_type":"code","source":"model2 = genai.GenerativeModel(model_name=\"gemini-1.0-pro\",\n                               generation_config=generation_config,\n                               safety_settings=safety_settings)","metadata":{"execution":{"iopub.status.busy":"2024-04-15T11:57:35.111910Z","iopub.execute_input":"2024-04-15T11:57:35.113113Z","iopub.status.idle":"2024-04-15T11:57:35.118681Z","shell.execute_reply.started":"2024-04-15T11:57:35.113068Z","shell.execute_reply":"2024-04-15T11:57:35.117351Z"},"trusted":true},"execution_count":123,"outputs":[]},{"cell_type":"code","source":"# Create a copy of the DataFrame\ndf_copy2 = df_sample.copy()\ndf_copy2['pred_response'] = None\n\n# Iterate through each row in the DataFrame with tqdm for progress visualization\nfor i in tqdm(df_copy2.index, desc=\"Generating Predictions\", total=len(df_copy2)):\n    try:\n        template = f\"Instruction:\\n{prompt}\\nINPUTDATA:{df_copy2.loc[i,'context']}\\nResponse:\\n\"\n        response = model2.generate_content(template)\n        pred_response = response.text\n        # Update the 'pred_response' column with the generated prediction\n        df_copy2.loc[i,'pred_response'] = pred_response.replace(template,'')\n        time.sleep(5)\n    except Exception:\n        df_copy2.loc[i,'pred_response'] =
None","metadata":{"execution":{"iopub.status.busy":"2024-04-15T11:57:44.126477Z","iopub.execute_input":"2024-04-15T11:57:44.126863Z","iopub.status.idle":"2024-04-15T12:27:49.160681Z","shell.execute_reply.started":"2024-04-15T11:57:44.126838Z","shell.execute_reply":"2024-04-15T12:27:49.158409Z"},"trusted":true},"execution_count":124,"outputs":[{"name":"stderr","text":"Generating Predictions: 100%|██████████| 99/99 [30:05<00:00, 18.23s/it]\n","output_type":"stream"}]},{"cell_type":"code","source":"df_copy2.dropna(inplace=True)\ndf_copy2.shape","metadata":{"execution":{"iopub.status.busy":"2024-04-15T12:30:12.484951Z","iopub.execute_input":"2024-04-15T12:30:12.485662Z","iopub.status.idle":"2024-04-15T12:30:12.506200Z","shell.execute_reply.started":"2024-04-15T12:30:12.485603Z","shell.execute_reply":"2024-04-15T12:30:12.503883Z"},"trusted":true},"execution_count":125,"outputs":[{"execution_count":125,"output_type":"execute_result","data":{"text/plain":"(73, 3)"},"metadata":{}}]},{"cell_type":"code","source":"sum_key = 0\nsum_val = 0\ncount_errors = 0\nfor i in df_copy2.index:\n\n    pred_records = parse_json(df_copy2.loc[i,'pred_response'])\n    # Skip responses that could not be parsed as JSON\n    if pred_records is None:\n        count_errors += 1\n        continue\n    true_records = df_copy2.loc[i,'answer']\n\n    sum_key += average_percentage_key(true_records,pred_records)\n\n    sum_val += average_percentage_value(true_records,pred_records)","metadata":{"execution":{"iopub.status.busy":"2024-04-15T12:30:14.843887Z","iopub.execute_input":"2024-04-15T12:30:14.844306Z","iopub.status.idle":"2024-04-15T12:30:14.886908Z","shell.execute_reply.started":"2024-04-15T12:30:14.844279Z","shell.execute_reply":"2024-04-15T12:30:14.883006Z"},"trusted":true},"execution_count":126,"outputs":[{"name":"stdout","text":"JSON parsing error: Expecting value: line 6 column 225 (char 1764)\nJSON parsing error: Expecting value: line 1 column 1 (char 0)\n","output_type":"stream"}]},{"cell_type":"code","source":"print(\"Average Percentage of Predicted
Keys:\", sum_key/(len(df_copy2)-count_errors))\nprint(\"Average Percentage of Predicted values:\", sum_val/(len(df_copy2)-count_errors))","metadata":{"execution":{"iopub.status.busy":"2024-04-15T12:30:23.862222Z","iopub.execute_input":"2024-04-15T12:30:23.862742Z","iopub.status.idle":"2024-04-15T12:30:23.874907Z","shell.execute_reply.started":"2024-04-15T12:30:23.862703Z","shell.execute_reply":"2024-04-15T12:30:23.872561Z"},"trusted":true},"execution_count":127,"outputs":[{"name":"stdout","text":"Average Percentage of Predicted Keys: 0.961524703778225\nAverage Percentage of Predicted values: 0.9308108939374473\n","output_type":"stream"}]},{"cell_type":"markdown","source":"<a id='64'></a>\n>## OpenHermes-Mistral","metadata":{}},{"cell_type":"code","source":"df_sample =df.loc[:50]","metadata":{"execution":{"iopub.status.busy":"2024-04-16T13:26:27.142847Z","iopub.execute_input":"2024-04-16T13:26:27.143519Z","iopub.status.idle":"2024-04-16T13:26:27.148692Z","shell.execute_reply.started":"2024-04-16T13:26:27.143484Z","shell.execute_reply":"2024-04-16T13:26:27.147685Z"},"trusted":true},"execution_count":23,"outputs":[]},{"cell_type":"code","source":"base_model_id = \"teknium/OpenHermes-2.5-Mistral-7B\"\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_use_double_quant=True,\n bnb_4bit_quant_type=\"nf4\",\n bnb_4bit_compute_dtype=torch.bfloat16\n)\n\nmodel = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map=\"auto\")\ntokenizer = AutoTokenizer.from_pretrained(\"teknium/OpenHermes-2.5-Mistral-7B\")","metadata":{"execution":{"iopub.status.busy":"2024-04-15T14:05:17.856576Z","iopub.execute_input":"2024-04-15T14:05:17.856943Z","iopub.status.idle":"2024-04-15T14:07:31.245528Z","shell.execute_reply.started":"2024-04-15T14:05:17.856914Z","shell.execute_reply":"2024-04-15T14:07:31.244574Z"},"trusted":true},"execution_count":25,"outputs":[{"output_type":"display_data","data":{"text/plain":"config.json: 0%| | 0.00/624 
[00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"11bec654f2cd4c1dbd1ef0dec60cf37b"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model.safetensors.index.json: 0%| | 0.00/25.1k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"5d0663886c3a404da424e6beb8b936e9"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Downloading shards: 0%| | 0/2 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"d4430b55e81443c3adcfc3e9c7d70add"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00001-of-00002.safetensors: 0%| | 0.00/9.94G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"2a28ec387df44285a003aff36b3f181a"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00002-of-00002.safetensors: 0%| | 0.00/4.54G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"dfd7e697bb8a4f24af2db7022506a4ea"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"73369406c55947559116c617f70f9e37"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"generation_config.json: 0%| | 0.00/120 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"323e87c3f7d14c4781c05def9f347ae7"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer_config.json: 0%| | 0.00/1.60k [00:00<?, 
?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"806aa0a3004d446a9ca3cc14e31567c3"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer.model: 0%| | 0.00/493k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"f79d5a66a64c426db89bcc09a4f53cee"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"added_tokens.json: 0%| | 0.00/51.0 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"a2a748d789ab4828a265b301d1dda9dc"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"special_tokens_map.json: 0%| | 0.00/101 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"bf609414b8eb4670afcbafd09fad2324"}},"metadata":{}},{"name":"stderr","text":"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\nSpecial tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n","output_type":"stream"}]},{"cell_type":"code","source":"def prediction(context):\n    template = f\"Instruction:\\n{prompt}\\nINPUTDATA:{context}\\nResponse:\\n\"\n    messages = [\n        {\"role\": \"user\", \"content\": template}\n    ]\n\n    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to('cuda')\n\n    model.eval()\n    with torch.no_grad():\n        output_ids = model.generate(input_ids,max_new_tokens=6000,do_sample=True,temperature=0.000000005,top_k=100,penalty_alpha=0.2,epsilon_cutoff=3e-4,eta_cutoff=3e-4,use_cache=True)\n    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)\n    return
response","metadata":{"execution":{"iopub.status.busy":"2024-04-15T14:09:05.424060Z","iopub.execute_input":"2024-04-15T14:09:05.424826Z","iopub.status.idle":"2024-04-15T14:09:05.431905Z","shell.execute_reply.started":"2024-04-15T14:09:05.424787Z","shell.execute_reply":"2024-04-15T14:09:05.430736Z"},"trusted":true},"execution_count":27,"outputs":[]},{"cell_type":"code","source":"# Create a copy of the DataFrame\ndf_copy3 = df_sample.copy()\ndf_copy3['pred_response'] = None\n\n# Iterate through each row in the DataFrame with tqdm for progress visualization\nfor i in tqdm(df_copy3.index, desc=\"Generating Predictions\", total=len(df_copy3)):\n\n    context = df_copy3.loc[i,'context']\n    res = prediction(context)\n    # Update the 'pred_response' column with the generated prediction\n    df_copy3.loc[i,'pred_response'] = res","metadata":{"execution":{"iopub.status.busy":"2024-04-15T14:09:50.237374Z","iopub.execute_input":"2024-04-15T14:09:50.237772Z","iopub.status.idle":"2024-04-15T15:49:56.061737Z","shell.execute_reply.started":"2024-04-15T14:09:50.237743Z","shell.execute_reply":"2024-04-15T15:49:56.060756Z"},"trusted":true},"execution_count":29,"outputs":[{"name":"stderr","text":"Generating Predictions: 0%| | 0/50 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior.
Please pass your input's `attention_mask` to obtain reliable results.\nSetting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\n2024-04-15 14:10:00.559449: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n2024-04-15 14:10:00.559553: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n2024-04-15 14:10:00.687590: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\nGenerating Predictions: 2%|▏ | 1/50 [01:04<52:34, 64.37s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\nSetting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 4%|▍ | 2/50 [02:20<56:56, 71.18s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\nSetting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 6%|▌ | 3/50 [03:55<1:04:30, 82.36s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\nSetting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 8%|▊ | 4/50 [04:37<50:54, 66.41s/it] The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior.
Please pass your input's `attention_mask` to obtain reliable results.\nSetting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 100%|██████████| 50/50 [1:40:05<00:00, 120.12s/it]\n","output_type":"stream"}]},{"cell_type":"code","source":"sum_key = 0\nsum_val = 0\ncount_errors = 0\nfor i in df_copy3.index:\n\n    pred_records = parse_json(df_copy3.loc[i,'pred_response'])\n    if pred_records is None:\n        count_errors += 1\n        continue\n    true_records = df_copy3.loc[i,'answer']\n    \n    sum_key += average_percentage_key(true_records,pred_records)\n    \n    sum_val += average_percentage_value(true_records,pred_records)","metadata":{"execution":{"iopub.status.busy":"2024-04-15T16:26:50.436279Z","iopub.execute_input":"2024-04-15T16:26:50.436672Z","iopub.status.idle":"2024-04-15T16:26:50.455654Z","shell.execute_reply.started":"2024-04-15T16:26:50.436645Z","shell.execute_reply":"2024-04-15T16:26:50.454648Z"},"trusted":true},"execution_count":44,"outputs":[{"name":"stdout","text":"JSON parsing error: Expecting property name enclosed in double quotes: line 76 column 33 (char 2170)\nJSON parsing error: Expecting property name enclosed in double quotes: line 291 column 31 (char 6505)\nJSON parsing error: Expecting property name enclosed in double quotes: line 196 column 36 (char 5917)\nJSON parsing error: Expecting property name enclosed in double quotes: line 182 column 58 (char 7494)\nJSON parsing error: Expecting value: line 122 column 5 (char 2579)\nJSON parsing error: Expecting property name enclosed in double quotes: line 174 column 40 (char 5208)\nJSON parsing error: Unterminated string starting at: line 120 column 41 (char 3709)\nJSON parsing error: Expecting property name enclosed in double quotes: line 211 column 23 (char 6469)\nJSON parsing error: Expecting ',' delimiter: line 143 column 45 (char 4327)\nJSON parsing error: Expecting property name enclosed in double quotes: line 361 column 29 (char 13747)\nJSON parsing error: 
Expecting property name enclosed in double quotes: line 213 column 64 (char 7286)\n","output_type":"stream"}]},{"cell_type":"code","source":"print(\"Average Percentage of Predicted Keys:\", sum_key/(len(df_copy3)-count_errors))\nprint(\"Average Percentage of Predicted values:\", sum_val/(len(df_copy3)-count_errors))","metadata":{"execution":{"iopub.status.busy":"2024-04-15T15:59:09.997314Z","iopub.execute_input":"2024-04-15T15:59:09.997706Z","iopub.status.idle":"2024-04-15T15:59:10.003374Z","shell.execute_reply.started":"2024-04-15T15:59:09.997677Z","shell.execute_reply":"2024-04-15T15:59:10.002357Z"},"trusted":true},"execution_count":34,"outputs":[{"name":"stdout","text":"Average Percentage of Predicted Keys: 0.8364341214341214\nAverage Percentage of Predicted values: 0.7805710874570523\n","output_type":"stream"}]},{"cell_type":"markdown","source":"<a id='65'></a>\n>## Mistral-7B-Instruct-v0.2","metadata":{}},{"cell_type":"code","source":"base_model_id = \"mistralai/Mistral-7B-Instruct-v0.2\"\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_use_double_quant=True,\n bnb_4bit_quant_type=\"nf4\",\n bnb_4bit_compute_dtype=torch.bfloat16\n)\n\nmodel = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map=\"auto\")\ntokenizer = AutoTokenizer.from_pretrained(\"mistralai/Mistral-7B-Instruct-v0.2\")","metadata":{"execution":{"iopub.status.busy":"2024-04-16T09:20:39.889611Z","iopub.execute_input":"2024-04-16T09:20:39.889980Z","iopub.status.idle":"2024-04-16T09:22:41.899623Z","shell.execute_reply.started":"2024-04-16T09:20:39.889952Z","shell.execute_reply":"2024-04-16T09:22:41.898471Z"},"trusted":true},"execution_count":27,"outputs":[{"output_type":"display_data","data":{"text/plain":"config.json: 0%| | 0.00/596 [00:00<?, 
?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"086ace79d4b141348341e9a194cb8d5b"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model.safetensors.index.json: 0%| | 0.00/25.1k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"d9c5a113afc549a4a5e92bb7ccc3498a"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"eb6b686dde2b4b4db9365502bf546cd1"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00001-of-00003.safetensors: 0%| | 0.00/4.94G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"8bd5adcd57454667b8ffd8fb9deba92b"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00002-of-00003.safetensors: 0%| | 0.00/5.00G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"7276dce9c6b042bf8a036a67d797642b"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00003-of-00003.safetensors: 0%| | 0.00/4.54G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"6de027ea55d54b02ac2393cf37015d31"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"87d209bbc09b45c9bf16bb20e742bb29"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"generation_config.json: 0%| | 0.00/111 [00:00<?, 
?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"1e77313b70274d049921c01ce261df07"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer_config.json: 0%| | 0.00/1.46k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"0dfde59ac3b2447998a74d09fa2bea2d"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer.model: 0%| | 0.00/493k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"69a3ad9a5a584c578e41ba5d983b5807"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer.json: 0%| | 0.00/1.80M [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"2e7d0358a3ec4757804b973d2550c860"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"special_tokens_map.json: 0%| | 0.00/72.0 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"b7a401cfb29142609648a804949c3f18"}},"metadata":{}}]},{"cell_type":"code","source":"# Create a copy of the DataFrame\ndf_copy4 = df_sample.copy()\ndf_copy4['pred_response'] = None\n\n# Iterate through each row in the DataFrame with tqdm for progress visualization\nfor i in tqdm(df_copy4.index, desc=\"Generating Predictions\", total=len(df_copy4)):\n    \n    template = f\"Instruction:\\n{prompt}\\nINPUTDATA:{df_copy4.loc[i,'context']}\\nResponse:\\n\"\n    model_input = tokenizer(template, return_tensors=\"pt\").to(\"cuda\")\n\n    model.eval()\n    with torch.no_grad():\n        pred_reponse = tokenizer.decode(model.generate(**model_input, max_new_tokens=6000, repetition_penalty=1.15)[0], skip_special_tokens=True)\n    \n    # Update the 'pred_response' column with the generated prediction\n    df_copy4.loc[i,'pred_response'] = 
pred_reponse.replace(template,'')","metadata":{"execution":{"iopub.status.busy":"2024-04-16T09:22:41.902026Z","iopub.execute_input":"2024-04-16T09:22:41.902464Z","iopub.status.idle":"2024-04-16T10:47:41.450586Z","shell.execute_reply.started":"2024-04-16T09:22:41.902426Z","shell.execute_reply":"2024-04-16T10:47:41.449673Z"},"trusted":true},"execution_count":28,"outputs":[{"name":"stderr","text":"Generating Predictions: 0%| | 0/50 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\n2024-04-16 09:22:52.119186: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n2024-04-16 09:22:52.119495: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n2024-04-16 09:22:52.245389: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\nGenerating Predictions: 2%|▏ | 1/50 [01:00<49:02, 60.04s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 4%|▍ | 2/50 [02:52<1:12:37, 90.78s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 6%|▌ | 3/50 [04:04<1:04:37, 82.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 8%|▊ | 4/50 [04:44<50:17, 65.60s/it] Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 10%|█ | 5/50 [07:12<1:11:19, 95.09s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 12%|█▏ | 6/50 [08:48<1:10:05, 95.58s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 14%|█▍ | 7/50 
[10:05<1:04:05, 89.43s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 16%|█▌ | 8/50 [10:49<52:27, 74.94s/it] Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 18%|█▊ | 9/50 [13:59<1:15:44, 110.84s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 20%|██ | 10/50 [15:32<1:10:19, 105.49s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 22%|██▏ | 11/50 [17:44<1:13:49, 113.58s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 24%|██▍ | 12/50 [19:13<1:07:14, 106.18s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 26%|██▌ | 13/50 [19:56<53:37, 86.95s/it] Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 28%|██▊ | 14/50 [20:55<47:03, 78.44s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 30%|███ | 15/50 [21:37<39:26, 67.62s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 32%|███▏ | 16/50 [22:45<38:18, 67.59s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 34%|███▍ | 17/50 [23:28<33:10, 60.31s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 36%|███▌ | 18/50 [27:38<1:02:30, 117.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 38%|███▊ | 19/50 [31:09<1:15:11, 145.54s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 40%|████ | 20/50 [34:06<1:17:25, 154.84s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 42%|████▏ | 21/50 [40:23<1:47:07, 
221.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 44%|████▍ | 22/50 [42:06<1:26:43, 185.83s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 46%|████▌ | 23/50 [42:53<1:04:54, 144.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 48%|████▊ | 24/50 [45:56<1:07:33, 155.89s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 50%|█████ | 25/50 [49:21<1:11:04, 170.57s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 52%|█████▏ | 26/50 [52:25<1:09:51, 174.63s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 54%|█████▍ | 27/50 [55:15<1:06:24, 173.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 56%|█████▌ | 28/50 [56:25<52:14, 142.48s/it] Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 58%|█████▊ | 29/50 [57:54<44:13, 126.38s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 60%|██████ | 30/50 [59:11<37:09, 111.47s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 62%|██████▏ | 31/50 [1:01:05<35:34, 112.32s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 64%|██████▍ | 32/50 [1:03:05<34:19, 114.41s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 66%|██████▌ | 33/50 [1:03:28<24:43, 87.24s/it] Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 68%|██████▊ | 34/50 [1:04:27<21:00, 78.78s/it]Setting `pad_token_id` to 
`eos_token_id`:2 for open-end generation.\nGenerating Predictions: 70%|███████ | 35/50 [1:05:14<17:16, 69.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 72%|███████▏ | 36/50 [1:06:02<14:38, 62.77s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 74%|███████▍ | 37/50 [1:06:42<12:07, 55.99s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 76%|███████▌ | 38/50 [1:07:52<12:01, 60.10s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 78%|███████▊ | 39/50 [1:12:16<22:13, 121.20s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 80%|████████ | 40/50 [1:13:25<17:37, 105.76s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 82%|████████▏ | 41/50 [1:14:26<13:51, 92.35s/it] Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 84%|████████▍ | 42/50 [1:15:26<11:00, 82.53s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 86%|████████▌ | 43/50 [1:16:49<09:37, 82.55s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 88%|████████▊ | 44/50 [1:17:57<07:49, 78.29s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 90%|█████████ | 45/50 [1:19:17<06:34, 78.92s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 92%|█████████▏| 46/50 [1:19:56<04:27, 66.97s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 94%|█████████▍| 
47/50 [1:21:02<03:19, 66.59s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 96%|█████████▌| 48/50 [1:22:30<02:26, 73.12s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 98%|█████████▊| 49/50 [1:23:31<01:09, 69.36s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\nGenerating Predictions: 100%|██████████| 50/50 [1:24:59<00:00, 101.99s/it]\n","output_type":"stream"}]},{"cell_type":"code","source":"sum_key = 0\nsum_val = 0\ncount_errors = 0\nfor i in df_copy4.index:\n\n    pred_records = parse_json(df_copy4.loc[i,'pred_response'])\n    if pred_records is None:\n        count_errors += 1\n        continue\n    true_records = df_copy4.loc[i,'answer']\n    \n    sum_key += average_percentage_key(true_records,pred_records)\n    \n    sum_val += average_percentage_value(true_records,pred_records)","metadata":{"execution":{"iopub.status.busy":"2024-04-16T10:47:41.451808Z","iopub.execute_input":"2024-04-16T10:47:41.452481Z","iopub.status.idle":"2024-04-16T10:47:41.473208Z","shell.execute_reply.started":"2024-04-16T10:47:41.452453Z","shell.execute_reply":"2024-04-16T10:47:41.472299Z"},"trusted":true},"execution_count":29,"outputs":[{"name":"stdout","text":"JSON parsing error: Invalid control character at: line 49 column 63 (char 1352)\nJSON parsing error: Invalid \\escape: line 5 column 7 (char 93)\nJSON parsing error: Invalid \\escape: line 5 column 7 (char 91)\nJSON parsing error: Extra data: line 235 column 27 (char 6488)\nJSON parsing error: Invalid \\escape: line 5 column 5 (char 51)\nJSON parsing error: Invalid \\escape: line 5 column 5 (char 51)\nJSON parsing error: Invalid \\escape: line 6 column 5 (char 96)\nJSON parsing error: Invalid \\escape: line 6 column 5 (char 98)\nJSON parsing error: Invalid \\escape: line 5 column 6 (char 69)\nJSON parsing error: Extra data: line 38 column 1 (char 825)\nJSON parsing error: Invalid 
\\escape: line 5 column 8 (char 56)\nJSON parsing error: Invalid \\escape: line 5 column 8 (char 56)\n","output_type":"stream"}]},{"cell_type":"code","source":"print(\"Average Percentage of Predicted Keys:\", sum_key/(len(df_copy4)-count_errors))\nprint(\"Average Percentage of Predicted values:\", sum_val/(len(df_copy4)-count_errors))","metadata":{"execution":{"iopub.status.busy":"2024-04-16T10:47:41.474953Z","iopub.execute_input":"2024-04-16T10:47:41.475257Z","iopub.status.idle":"2024-04-16T10:47:41.498035Z","shell.execute_reply.started":"2024-04-16T10:47:41.475232Z","shell.execute_reply":"2024-04-16T10:47:41.496984Z"},"trusted":true},"execution_count":30,"outputs":[{"name":"stdout","text":"Average Percentage of Predicted Keys: 0.7718774213103867\nAverage Percentage of Predicted values: 0.6724906101783668\n","output_type":"stream"}]}]}
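The evaluation cells in the notebook above divide by `len(df) - count_errors`, so responses that fail JSON parsing are left out of the averages rather than counted as misses. A minimal sketch of reporting both views side by side follows. The helper name `summarize_scores` and the sample scores are hypothetical and are not part of the notebook; `None` stands in for a response that `parse_json` could not parse.

```python
from typing import Optional, Sequence


def summarize_scores(scores: Sequence[Optional[float]]) -> dict:
    """Aggregate per-sample scores; None marks an unparseable response (assumption)."""
    parsed = [s for s in scores if s is not None]
    n_total = len(scores)
    n_failed = n_total - len(parsed)
    total = sum(parsed)
    return {
        "n_failed": n_failed,
        # Same denominator as the notebook: only successfully parsed samples.
        "mean_parsed_only": total / len(parsed) if parsed else 0.0,
        # Stricter view: every unparseable response scores zero.
        "mean_failures_as_zero": total / n_total if n_total else 0.0,
    }


# Illustrative values only, not taken from the benchmark.
example = summarize_scores([1.0, 0.5, None, 1.0])
print(example)
```

Reporting the failure count next to the parsed-only mean makes it easier to compare models whose outputs fail to parse at different rates.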
Benchmark2/leaderboard.csv CHANGED
@@ -1,8 +1,6 @@
  Model, Percentage of keys, Percentage of values,Average time (s),Notebook link,License,Link
- OpenHermes-2.5-Mistral-7B,96.17,57.22,114.01,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/benchmark-2-gemini-pro-and-openhermes-mistral.ipynb,Apache-2.0,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B
- Gemini Pro,98.65,66.76,19.72,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/benchmark-2-gemini-pro-and-openhermes-mistral.ipynb,Google,https://blog.google/technology/ai/gemini-api-developers-cloud/
- Mistral-7B-Instruct-v0.2,95.55,53.47,101.29,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/benchmark-2-mistral-7b-instruct-v0-2.ipynb,Apache-2.0,https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
- Llama-2-7b-chat-hf,57.63,7.5,359.56,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/Benchmark 2: LLama-2-7B and LLama-2-13B.ipynb,Llama 2 Community,https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
- Llama-2-13b-chat-hf,66.95,36.88,476.71,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/Benchmark 2: LLama-2-7B and LLama-2-13B.ipynb,Llama 2 Community,https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
- Openchat-3.5-1210,75.41,46.46,131.16,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/benchmark-2-openchat-3-5-1210-and-dolphin-2-2-1.ipynb,Apache-2.0,https://huggingface.co/openchat/openchat-3.5-1210
+ OpenHermes-2.5-Mistral-7B,83.64,78.05,120.12,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb,Apache-2.0,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B
+ Gemini-1.5-Pro-latest,98.18,97.34,37.07,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb,Google,https://blog.google/technology/ai/gemini-api-developers-cloud/
+ Gemini Pro,96.15,93.08,18.23,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb,Google,https://blog.google/technology/ai/gemini-api-developers-cloud/
+ Mistral-7B-Instruct-v0.2,77.18,67.24,101.99,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb,Apache-2.0,https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
  Dolphin-2.2.1-Mistral-7b,94.98,60.47,99.36,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/benchmark-2-openchat-3-5-1210-and-dolphin-2-2-1.ipynb,Apache-2.0,https://huggingface.co/cognitivecomputations/dolphin-2.2.1-mistral-7b
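The added OpenHermes and Mistral rows can be traced back to the fractions printed by the notebook's final cells. The sketch below assumes the leaderboard percentages were obtained by truncating to two decimals rather than rounding. That is an inference from the numbers, not something the commit states: rounding would give 78.06, 77.19 and 67.25, whereas the CSV lists 78.05, 77.18 and 67.24. The helper name `to_leaderboard_percent` is hypothetical.

```python
import math


def to_leaderboard_percent(fraction: float) -> float:
    """Convert a 0-1 score to a percentage truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the committed CSV values.
    """
    return math.floor(fraction * 10_000) / 100


# Fractions printed by the notebook for the two locally run models.
notebook_scores = {
    "OpenHermes-2.5-Mistral-7B": (0.8364341214341214, 0.7805710874570523),
    "Mistral-7B-Instruct-v0.2": (0.7718774213103867, 0.6724906101783668),
}

leaderboard_rows = {
    model: (to_leaderboard_percent(keys), to_leaderboard_percent(values))
    for model, (keys, values) in notebook_scores.items()
}

for model, (keys, values) in leaderboard_rows.items():
    print(f"{model},{keys:.2f},{values:.2f}")
```

The "Average time (s)" column for these two rows (120.12 and 101.99) matches the final `s/it` figure in each model's tqdm progress output.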