{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "## Getting to Main directory\n", "import os\n", "os.chdir(\"../\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# loading secret key\n", "import os\n", "from dotenv import load_dotenv\n", "load_dotenv()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "e:\\projects\\AI research assistant\\venv\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "from llama_index.core import VectorStoreIndex\n", "from llama_index.core import ServiceContext\n", "from llama_index.core import StorageContext, load_index_from_storage\n", "from llama_index.embeddings.gemini import GeminiEmbedding\n", "from llama_index.llms.gemini import Gemini\n", "import google.generativeai as genai\n", "from llama_index.core import VectorStoreIndex,SimpleDirectoryReader\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "gemini_api_key=os.getenv(\"GEMINI_API_KEY\")\n", "pinecone_api_key=os.getenv(\"PINECONE_API_KEY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data ingestion - Taking pdf documents and Cleaning and Transforming Data into vector index" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "documents=SimpleDirectoryReader(\"Data\").load_data()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "34" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(documents)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Document(id_='2c29fa85-a1fa-479c-8cdc-6c366889be7e', embedding=None, metadata={'page_label': '1', 'file_name': 'peft.pdf', 'file_path': 'e:\\\\projects\\\\AI research assistant\\\\Data\\\\peft.pdf', 'file_type': 'application/pdf', 'file_size': 562785, 'creation_date': '2024-03-30', 'last_modified_date': '2024-03-30'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='Few-Shot Parameter-Efficient Fine-Tuning is Better\\nand Cheaper than In-Context Learning\\nHaokun Liu∗Derek Tam∗Mohammed Muqeeth∗\\nJay Mohta Tenghao Huang Mohit Bansal Colin Raffel\\nDepartment of Computer Science\\nUniversity of North Carolina at Chapel Hill\\n{haokunl,dtredsox,muqeeth,craffel}@cs.unc.edu\\nAbstract\\nFew-shot in-context learning (ICL) enables pre-trained language models to per-\\nform a previously-unseen task without any gradient-based training by feeding a\\nsmall number of training examples as part of the input. ICL incurs substantial\\ncomputational, memory, and storage costs because it involves processing all of the\\ntraining examples every time a prediction is made. Parameter-efficient fine-tuning\\n(PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) 
offers\\nan alternative paradigm where a small set of parameters are trained to enable a\\nmodel to perform the new task. In this paper, we rigorously compare few-shot\\nICL and PEFT and demonstrate that the latter offers better accuracy as well as\\ndramatically lower computational costs. Along the way, we introduce a new PEFT\\nmethod called (IA)3that scales activations by learned vectors, attaining stronger\\nperformance while only introducing a relatively tiny amount of new parameters.\\nWe also propose a simple recipe based on the T0 model [ 1] called T-Few that\\ncan be applied to new tasks without task-specific tuning or modifications. We\\nvalidate the effectiveness of T-Few on completely unseen tasks by applying it to\\nthe RAFT benchmark [ 2], attaining super-human performance for the first time\\nand outperforming the state-of-the-art by 6% absolute. All of the code used in our\\nexperiments is publicly available.1\\n1 Introduction\\nPre-trained language models have become a cornerstone of natural language processing, thanks\\nto the fact that they can dramatically improve data efficiency on tasks of interest – i.e., using a\\npre-trained language model for initialization often produces better results with less labeled data. A\\nhistorically common approach has been to use the pre-trained model’s parameters for initialization\\nbefore performing gradient-based fine-tuning on a downstream task of interest. While fine-tuning\\nhas produced many state-of-the-art results [ 1], it results in a model that is specialized for a single\\ntask with an entirely new set of parameter values, which can become impractical when fine-tuning a\\nmodel on many downstream tasks.\\nAn alternative approach popularized by [ 3,4] isin-context learning (ICL), which induces a model\\nto perform a downstream task by inputting prompted examples. Few-shot prompting converts a\\nsmall collection of input-target pairs into (typically) human-understandable instructions and examples\\n[3,4], along with a single unlabeled example for which a prediction is desired. Notably, ICL requires\\nno gradient-based training and therefore allows a single model to immediately perform a wide variety\\nof tasks. Performing ICL therefore solely relies on the capabilities that a model learned during\\npre-training. These characteristics have led to a great deal of recent interest in ICL methods [5–10].\\n∗Equal contribution.\\n1https://github.com/r-three/t-few\\nPreprint. Under review.arXiv:2205.05638v2 [cs.LG] 26 Aug 2022', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n", " Document(id_='8c598b5b-d100-4d71-8273-7fb26ddac626', embedding=None, metadata={'page_label': '2', 'file_name': 'peft.pdf', 'file_path': 'e:\\\\projects\\\\AI research assistant\\\\Data\\\\peft.pdf', 'file_type': 'application/pdf', 'file_size': 562785, 'creation_date': '2024-03-30', 'last_modified_date': '2024-03-30'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text=\"VKQ\\nsoftmax \\nDenseNonlinearity Dense\\nT0Susie loves her grandma's \\nbanana bread. Susie called \\nher grandma and asked her to \\nsend some. Grandma lived \\nvery far away. A week passed \\nand grandma surprised Susie \\nby coming to visit. 
What is \\na possible continuation for \\nthe story? Susie was so happy. \\nSusie was upset. \\n(IA)3Losses used in T-FewFigure 1: Diagram of (IA)3and the loss terms used in the T-Few recipe. Left: (IA)3introduces the\\nlearned vectors lk,lv, andlffwhich respectively rescale (via element-wise multiplication, visualized as\\n⊙) the keys and values in attention mechanisms and the inner activations in position-wise feed-forward\\nnetworks. Right: In addition to a standard cross-entropy loss LLM, we introduce an unlikelihood loss\\nLULthat lowers the probability of incorrect outputs and a length-normalized loss LLNthat applies a\\nstandard softmax cross-entropy loss to length-normalized log-probabilities of all output choices.\\nDespite the practical benefits of ICL, it has several major drawbacks. First, processing all prompted\\ninput-target pairs every time the model makes a prediction incurs significant compute costs. Second,\\nICL typically produces inferior performance compared to fine-tuning [ 4]. Finally, the exact formatting\\nof the prompt (including the wording [ 11] and ordering of examples [ 12]) can have significant and\\nunpredictable impact on the model’s performance, far beyond inter-run variation of fine-tuning.\\nRecent work has also demonstrated that ICL can perform well even when provided with incorrect\\nlabels, raising questions as to how much learning is taking place at all [9].\\nAn additional paradigm for enabling a model to perform a new task with minimal updates is parameter-\\nefficient fine-tuning (PEFT), where a pre-trained model is fine-tuned by only updating a small number\\nof added or selected parameters. Recent methods have matched the performance of fine-tuning the\\nfull model while only updating or adding a small fraction (e.g. 0.01%) of the full model’s parameters\\n[13,14]. Furthermore, certain PEFT methods allow mixed-task batches where different examples in\\na batch are processed differently [14], making both PEFT and ICL viable for multitask models.\\nWhile the benefits of PEFT address some shortcomings of fine-tuning (when compared to ICL), there\\nhas been relatively little focus on whether PEFT methods work well when very little labeled data\\nis available. Our primary goal in this paper is to close this gap by proposing a recipe – i.e., a model, a\\nPEFT method, and a fixed set of hyperparameters – that attains strong performance on novel, unseen\\ntasks while only updating a tiny fraction of the model’s parameters. Specifically, we base our approach\\non the T0 model [ 1], a variant of T5 [ 15] fine-tuned on a multitask mixture of prompted datasets.\\nTo improve performance on classification and multiple-choice tasks, we add unlikelihood [ 16,17]\\nand length normalization-based [ 4] loss terms. In addition, we develop (IA)3, a PEFT method\\nthat multiplies intermediate activations by learned vectors. (IA)3attains stronger performance than\\nfull-model fine-tuning while updating up to 10,000 ×fewer parameters. Finally, we demonstrate\\nthe benefits of pre-training the (IA)3parameters before fine-tuning [ 18,19]. Our overall recipe,\\nwhich we dub “ T-Few ”, performs significantly better than ICL (even against 16×larger models)\\nand outperforms humans for the first time on the real-world few-shot learning benchmark RAFT [ 2]\\nwhile requiring dramatically less compute and allowing for mixed-task batches during inference. 
To\\nfacilitate the use of T-Few on new problems and future research on PEFT, we release our code.1\\nAfter providing background on ICL and PEFT in the following section, we discuss the design of\\nT-Few in section 3. In section 4, we present experiments comparing T-Few to strong ICL baselines.\\nFinally, we discuss related work in appendix B and conclude in section 5.\\n2 Background\\nIn this section, we provide am verview of ICL and PEFT with a focus on characterizing the com-\\nputation, memory, and on-disk storage costs of making a prediction. Real-world costs depend on\\nimplementation and hardware, so we report costs in terms of FLOPs for computation and bytes for\\nmemory and storage, respectively. Additional related work is discussed in appendix B.\\n2.1 Few-shot in-context learning (ICL)\\nICL [ 3,4] aims to induce a model to perform a task by feeding in concatenated and prompted\\ninput-target examples (called “shots”) along with an unlabeled query example. Taking the cycled\\n2\", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n", " Document(id_='5a924469-c6ec-4239-b2c7-ef70f8be9c65', embedding=None, metadata={'page_label': '3', 'file_name': 'peft.pdf', 'file_path': 'e:\\\\projects\\\\AI research assistant\\\\Data\\\\peft.pdf', 'file_type': 'application/pdf', 'file_size': 562785, 'creation_date': '2024-03-30', 'last_modified_date': '2024-03-30'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='letter task from Brown et al. [4]as an example, a 4-shot input or context would be “ Please\\nunscramble the letters into a word, and write that word: asinoc = casino,\\nyfrogg = froggy, plesim = simple, iggestb = biggest, astedro = ”, for which the\\ndesired output would be “ roasted ”. ICL induces an autoregressive language model to perform\\nthis task by feeding in the context and sampling from the model. For classification tasks, each\\nlabel is associated with a string (e.g. “ positive ” and “ negative ” for sentiment analysis) and\\na label is assigned by choosing the label string that the model assigns the highest probability to.\\nFor multiple-choice tasks (e.g. choosing between Npossible answers to a question), the model’s\\nprediction is similarly determined by determining which choice is assigned the highest probability.\\nThe primary advantage of ICL is that it enables a single model to perform many tasks immediately\\nwithout fine-tuning. This also enables mixed-task batches , where different examples in a batch of data\\ncorrespond to different tasks by using different contexts in the input. ICL is also typically performed\\nwith only a limited number of labeled examples – called few-shot learning – making it data-efficient.\\nDespite these advantages, ICL comes with significant practical drawbacks: First, making a prediction\\nis dramatically more expensive because the model needs to process all of the in-context labeled\\nexamples. 
Specifically, ignoring the quadratic complexity of self-attention operations in Transformer\\nlanguage models (which are typically small compared to the costs of the rest of the model [ 20]),\\nprocessing the ktraining examples for k-shot ICL increases the computational cost by approximately\\nk+ 1times compared to processing the unlabeled example alone. Memory costs similarly scale\\napproximately linearly with k, though during inference the memory costs are typically dominated by\\nstoring the model’s parameters. Separately, there is a small amount of on-disk storage required for\\nstoring the in-context examples for a given task. For example, storing 32examples for a task where\\nthe prompted input and target for each example is 512tokens long would require about 66kilobytes\\nof storage on disk ( 32examples×512tokens×32bits).\\nBeyond the aforementioned costs, ICL also exhibits unintuitive behavior. Zhao et al. [12] showed\\nthat the ordering of examples in the context heavily influences the model’s predictions. Min et al.\\n[9]showed that ICL can still perform well even if the labels of the in-context examples are swapped\\n(i.e. made incorrect), which raises questions about whether ICL is really “learning” from the labeled\\nexamples.\\nVarious approaches have been proposed to mitigate these issues. One way to decrease computational\\ncosts is to cache the key and value vectors for in-context examples. This is possible because decoder-\\nonly Transformer language models have a causal masking pattern, so the model’s activations for the\\ncontext do not do not depend on the unlabeled example. In an extreme case, 32-shot ICL with 512\\ntokens per in-context example would result in over 144 gigabytes of cached key and value vectors for\\nthe GPT-3 model ( 32examples×512tokens×96layers×12288 d model×32bitseach for the key\\nand value vectors). Separately, Min et al. [21] proposed ensemble ICL , where instead of using the\\noutput probability from concatenating the ktraining examples, the output probabilities of the model\\non each training example (i.e. 1-shot ICL for each of the kexamples) are multiplied together. This\\nlowers the non-parameter memory cost by a factor of k/2but increases the computational cost by\\na factor of 2. In terms of task performance, Min et al. [21] find that ensemble ICL outperforms the\\nstandard concatenative variant.\\n2.2 Parameter-efficient fine-tuning\\nWhile standard fine-tuning updates all parameters of the pre-trained model, it has been demonstrated\\nthat it is possible to instead update or add a relatively small number of parameters. Early methods\\nproposed adding adapters [22–24], which are small trainable feed-forward networks inserted between\\nthe layers in the fixed pre-trained model. Since then, various sophisticated PEFT methods have been\\nproposed, including methods that choose a sparse subset of parameters to train [ 25,26], produce\\nlow-rank updates [ 13], perform optimization in a lower-dimensional subspace [ 27], add low-rank\\nadapters using hypercomplex multiplication [ 28], and more. Relatedly, prompt tuning [14] and prefix\\ntuning [29] concatenate learned continuous embeddings to the model’s input or activations to induce\\nit to perform a task; this can be seen as a PEFT method [ 30]. State-of-the-art PEFT methods can\\nmatch the performance of fine-tuning all of the model’s parameters while updating only a tiny fraction\\n(e.g. 
0.01%) of the model’s parameters.\\nPEFT drastically reduces the memory and storage requirements for training and saving the model. In\\naddition, certain PEFT methods straightforwardly allow mixed-task batches – for example, prompt\\n3', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n", " Document(id_='7cb24228-2e95-4b96-ab9a-62b691aa13a9', embedding=None, metadata={'page_label': '4', 'file_name': 'peft.pdf', 'file_path': 'e:\\\\projects\\\\AI research assistant\\\\Data\\\\peft.pdf', 'file_type': 'application/pdf', 'file_size': 562785, 'creation_date': '2024-03-30', 'last_modified_date': '2024-03-30'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='tuning enables a single model to perform many tasks simply by concatenating different prompt\\nembeddings to each example in the batch [ 14]. On the other hand, PEFT methods that re-parameterize\\nthe model (e.g. [ 27,13]) are costly or onerous for mixed-task batches. Separately, different PEFT\\nmethods increase the computation and memory required to perform inference by different amounts.\\nFor example, adapters effectively add additional (small) layers to the model, resulting in small but\\nnon-negligible increases in computational costs and memory. An additional cost incurred by PEFT\\nis the cost of fine-tuning itself, which must be performed once and is then amortized as the model\\nis used for inference. However, we will show that PEFT can be dramatically more computationally\\nefficient when considering both fine-tuning and inference while achieving better accuracy than ICL.\\n3 Designing the T-Few Recipe\\nGiven that PEFT allows a model to be adapted to a new task with relatively small storage requirements\\nand computational cost, we argue that PEFT presents a promising alternative to ICL. Our goal\\nis therefore to develop a recipe that allows a model to attain high accuracy on new tasks with\\nlimited labeled examples while allowing mixed-task batches during inference and incurring minimal\\ncomputational and storage costs. By recipe , we mean a specific model and hyperparameter setting\\nthat provides strong performance on any new task without manual tuning or per-task adjustments.\\nIn this way, we can ensure that our approach is a realistic option in few-shot settings where limited\\nlabeled data is available for evaluation [31, 32].\\n3.1 Model and Datasets\\nAs a first step, we must choose a pre-trained model. Ideally, the model should attain high performance\\non new tasks after fine-tuning on a limited number of labeled examples. In preliminary experiments\\napplying PEFT methods to different pre-trained models, we attained the best performance with T0\\n[1]. T0 is based on T5 [ 15], an encoder-decoder Transformer model [ 33] that was pre-trained via a\\nmasked language modeling objective [ 34] on a large corpus of unlabeled text data. T0 was created by\\nfine-tuning T5 on a multitask mixture of datasets in order to enable zero-shot generalization, i.e. the\\nability to perform tasks without any additional gradient-based training. 
Examples in the datasets used\\nto train T0 were prompted by applying the prompt templates from the Public Pool of Prompts (P3\\n[35]), which convert each example in each dataset to a prompted text-to-text format where each label\\ncorresponds to a different string. For brevity, we omit a detailed description of T0 and T5; interested\\nreaders can refer to Sanh et al. [1]and Raffel et al. [15]. T0 was released in three billion and eleven\\nbillion parameter variants, referred to as “T0-3B” and simply “T0” respectively. In this section (where\\nour goal is to design the T-Few recipe through extensive experimentation), we use T0-3B to reduce\\ncomputational costs. For all models and experiments, we use Hugging Face Transformers [36].\\nWhile T0 was designed for zero-shot generalization, we will demonstrate that it also attains strong\\nperformance after fine-tuning with only a few labeled examples. To test T0’s generalization, Sanh et al.\\n[1]chose a set of tasks (and corresponding datasets) to hold out from the multitask training mixture\\n– specifically, sentence completion (COPA [ 37], H-SWAG [ 38], and Story Cloze [ 39] datasets),\\nnatural language inference (ANLI [ 40], CB [ 41], and RTE [ 42]), coreference resolution (WSC [ 43]\\nand Winogrande [ 44]), and word sense disambiguation (WiC [ 45]). Evaluation of generalization\\ncapabilities can then be straightforwardly done by measuring performance on these held-out datasets.\\nWe also will later test T-Few ’s abilities in the RAFT benchmark [ 2] in section 4.3, a collection of\\nunseen “real-world” few-shot tasks with no validation set and a held-out test set. ANLI, WiC, WSC is\\nlicensed under a Creative Commons License. Winogrande is licnsed under an Apache license. COPA\\nis under a BSD-2 Clause license. We could not find the license of RTE and CB but they are part of\\nSuperGLUE which mentions the datasets are allowed for use in research context.\\nTo ease comparison, we use the same number of few-shot training examples for each dataset as Brown\\net al. [4], which varies from 20 to 70. Unfortunately, the few-shot dataset subsets used by Brown\\net al. [4]have not been publicly disclosed. To allow for a more robust comparison, we therefore\\nconstructed five few-shot datasets by sampling subsets with different seeds and report the median\\nand interquartile range. We prompt examples from each dataset using the prompt templates from P3\\nBach et al. [35], using a randomly-sampled prompt template for each example at each step. 
Unless\\notherwise stated, we train our model for 1K steps with a batch size of 8 and report performance at the\\nend of training.\\nFor evaluation, we use “rank classification”, where the model’s log-probabilities for all possible label\\nstrings are ranked and the model’s prediction is considered correct if the highest-ranked choice is the\\n4', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n", " Document(id_='1b5f1995-0803-4905-b1bf-9dbff0e1e936', embedding=None, metadata={'page_label': '5', 'file_name': 'peft.pdf', 'file_path': 'e:\\\\projects\\\\AI research assistant\\\\Data\\\\peft.pdf', 'file_type': 'application/pdf', 'file_size': 562785, 'creation_date': '2024-03-30', 'last_modified_date': '2024-03-30'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='correct answer. Rank classification evaluation is compatible with both classification and multiple-\\nchoice tasks. Since model performance can vary significantly depending on the prompt template used,\\nwe report the median accuracy across all prompt templates from P3 and across few-shot data subsets\\nfor each dataset. For all datasets, we report the accuracy on the test set or validation set when the test\\nlabels are not public (e.g. SuperGLUE datasets). In the main text, we report median accuracy across\\nthe nine datasets mentioned above. Detailed results on each dataset are provided in the appendices.\\n3.2 Unlikelihood Training and Length Normalization\\nBefore investigating PEFT methods, we first explore two additional loss terms to improve the\\nperformance of few-shot fine-tuning of language models. 
Language models are normally trained\\nwith cross-entropy loss LLM=−1\\nT∑\\ntlogp(yt|x,y str:\n", " \"\"\"\n", " Remove unwanted characters and patterns in text input.\n", "\n", " :param content: Text input.\n", " \n", " :return: Cleaned version of original text input.\n", " \"\"\"\n", "\n", " # Fix hyphenated words broken by newline\n", " content = re.sub(r'(\\w+)-\\n(\\w+)', r'\\1\\2', content)\n", "\n", " # Remove specific unwanted patterns and characters\n", " unwanted_patterns = [\n", " \"\\\\n\", \" —\", \"——————————\", \"—————————\", \"—————\",\n", " r'\\\\u[\\dA-Fa-f]{4}', r'\\uf075', r'\\uf0b7'\n", " ]\n", " for pattern in unwanted_patterns:\n", " content = re.sub(pattern, \"\", content)\n", "\n", " # Fix improperly spaced hyphenated words and normalize whitespace\n", " content = re.sub(r'(\\w)\\s*-\\s*(\\w)', r'\\1-\\2', content)\n", " content = re.sub(r'\\s+', ' ', content)\n", "\n", " return content\n", "\n", "# Call function\n", "cleaned_docs = []\n", "for d in documents: \n", " cleaned_text = clean_up_text(d.text)\n", " d.text = cleaned_text\n", " cleaned_docs.append(d)\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Few-Shot Parameter-Efficient Fine-Tuning is Betterand Cheaper than In-Context LearningHaokun Liu∗Derek Tam∗Mohammed Muqeeth∗Jay Mohta Tenghao Huang Mohit Bansal Colin RaffelDepartment of Computer ScienceUniversity of North Carolina at Chapel Hill{haokunl,dtredsox,muqeeth,craffel}@cs.unc.eduAbstractFew-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding asmall number of training examples as part of the input. ICL incurs substantialcomputational, memory, and storage costs because it involves processing all of thetraining examples every time a prediction is made. Parameter-efficient fine-tuning(PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offersan alternative paradigm where a small set of parameters are trained to enable amodel to perform the new task. In this paper, we rigorously compare few-shotICL and PEFT and demonstrate that the latter offers better accuracy as well asdramatically lower computational costs. Along the way, we introduce a new PEFTmethod called (IA)3that scales activations by learned vectors, attaining strongerperformance while only introducing a relatively tiny amount of new parameters.We also propose a simple recipe based on the T0 model [ 1] called T-Few thatcan be applied to new tasks without task-specific tuning or modifications. Wevalidate the effectiveness of T-Few on completely unseen tasks by applying it tothe RAFT benchmark [ 2], attaining super-human performance for the first timeand outperforming the state-of-the-art by 6% absolute. All of the code used in ourexperiments is publicly available.11 IntroductionPre-trained language models have become a cornerstone of natural language processing, thanksto the fact that they can dramatically improve data efficiency on tasks of interest – i.e., using apre-trained language model for initialization often produces better results with less labeled data. Ahistorically common approach has been to use the pre-trained model’s parameters for initializationbefore performing gradient-based fine-tuning on a downstream task of interest. 
While fine-tuninghas produced many state-of-the-art results [ 1], it results in a model that is specialized for a singletask with an entirely new set of parameter values, which can become impractical when fine-tuning amodel on many downstream tasks.An alternative approach popularized by [ 3,4] isin-context learning (ICL), which induces a modelto perform a downstream task by inputting prompted examples. Few-shot prompting converts asmall collection of input-target pairs into (typically) human-understandable instructions and examples[3,4], along with a single unlabeled example for which a prediction is desired. Notably, ICL requiresno gradient-based training and therefore allows a single model to immediately perform a wide varietyof tasks. Performing ICL therefore solely relies on the capabilities that a model learned duringpre-training. These characteristics have led to a great deal of recent interest in ICL methods [5–10].∗Equal contribution.1https://github.com/r-three/t-fewPreprint. Under review.arXiv:2205.05638v2 [cs.LG] 26 Aug 2022'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Inspect output\n", "cleaned_docs[0].get_content()\n" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'page_label': '1',\n", " 'file_name': 'peft.pdf',\n", " 'file_path': 'e:\\\\projects\\\\AI research assistant\\\\Data\\\\peft.pdf',\n", " 'file_type': 'application/pdf',\n", " 'file_size': 562785,\n", " 'creation_date': '2024-03-30',\n", " 'last_modified_date': '2024-03-30'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cleaned_docs[0].metadata\n" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "34" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(documents)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Configuring the Gemini model and GeminiEmbedding" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "genai.configure(api_key=gemini_api_key)" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "gemini_embed_model = GeminiEmbedding(model_name=\"models/embedding-001\")" ] },
{ "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Setting temperature to 0.3 to keep responses focused and low-risk\n", "\n", "model = Gemini(model_name=\"models/gemini-pro\", api_key=gemini_api_key, temperature=0.3)" ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from llama_index.core.node_parser import SemanticSplitterNodeParser\n", "from llama_index.core.ingestion import IngestionPipeline\n", "\n", "# This will be the model we use both for node parsing and for vectorization\n", "embed_model = gemini_embed_model\n", "\n", "# Define the initial pipeline\n", "pipeline = IngestionPipeline(\n", " transformations=[\n", " SemanticSplitterNodeParser(\n", " buffer_size=1,\n", " breakpoint_percentile_threshold=95, \n", " embed_model=embed_model,\n", " ),\n", " embed_model,\n", " ],\n", " )\n" ] },
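{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Optional sanity check (a sketch): run the pipeline on the first two cleaned documents to inspect the semantic chunks. This calls the Gemini embedding API, so it assumes `GEMINI_API_KEY` is set; nothing is written to Pinecone yet." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged preview: split and embed only the first two cleaned documents\n", "# so we can see how SemanticSplitterNodeParser chunks the text\n", "# before upserting the full set to Pinecone.\n", "preview_nodes = pipeline.run(documents=cleaned_docs[:2])\n", "print(len(preview_nodes))\n", "print(preview_nodes[0].get_content()[:300])" ] },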
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Setting up the `Settings` module with our LLM and embedding models, along with the chunk size and overlap used to split the documents" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### As LLMPredictor is deprecated, we use Settings.llm to define our base LLM" ] },
{ "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from llama_index.core import Settings\n", "from llama_index.core.node_parser import SentenceSplitter\n", "\n", "\n", "Settings.llm = model\n", "Settings.embed_model = gemini_embed_model\n", "Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)\n", "Settings.num_output = 512\n", "Settings.context_window = 3900" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Using Pinecone as our vector database to store the index" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "from llama_index.vector_stores.pinecone import PineconeVectorStore" ] },
{ "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "from pinecone import Pinecone\n", "\n", "pc = Pinecone(api_key=pinecone_api_key)\n", "pinecone_index = pc.Index(\"ai-research-assistant\") # `ai-research-assistant` is the index name" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Now indexing the documents and upserting the embeddings to Pinecone" ] },
{ "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "vector_store = PineconeVectorStore(pinecone_index=pinecone_index)\n", "storage_context = StorageContext.from_defaults(vector_store=vector_store)" ] },
{ "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "%%capture \n", "# Our pipeline with the addition of our PineconeVectorStore\n", "pipeline = IngestionPipeline(\n", " transformations=[\n", " SemanticSplitterNodeParser(\n", " buffer_size=1,\n", " breakpoint_percentile_threshold=95, \n", " embed_model=embed_model,\n", " ),\n", " embed_model,\n", " ],\n", " vector_store=vector_store # Our new addition\n", " )\n", "\n", "# Now we run our pipeline!\n", "pipeline.run(documents=cleaned_docs)\n" ] },
{ "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'dimension': 768,\n", " 'index_fullness': 0.00176,\n", " 'namespaces': {'': {'vector_count': 176}},\n", " 'total_vector_count': 176}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pinecone_index.describe_index_stats()\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Simply querying from the index" ] },
{ "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['If your candidate doesn’t know the answer to the above questions and you’re hiring for a ML intern position, then they’re obviously not a great fit.', 'correct answer. Rank classification evaluation is compatible with both classification and multiplechoice tasks. Since model performance can vary significantly depending on the prompt template used,we report the median accuracy across all prompt templates from P3 and across few-shot data subsetsfor each dataset. For all datasets, we report the accuracy on the test set or validation set when the testlabels are not public (e.g. SuperGLUE datasets). ', 'correct answer. Rank classification evaluation is compatible with both classification and multiplechoice tasks. 
Since model performance can vary significantly depending on the prompt template used,we report the median accuracy across all prompt templates from P3 and across few-shot data subsetsfor each dataset. For all datasets, we report the accuracy on the test set or validation set when the testlabels are not public (e.g. SuperGLUE datasets). ', 'Interview Questions to Ask a ML intern| Xobin [Downloaded]8 Prepared and Curated by Xobin Team', \"Interview Questions to Ask a ML intern| Xobin [Downloaded]1Interview Questions to Ask a ML intern| Xobin [Downloaded]We at Xobin reached out to over 70+ Hiring teams to curate the best interview questions. W e didn't stop there. We went ahead to understand what type of answers dif ferentiated the top candidate from the rest. \"]\n" ] } ], "source": [ "# from llama_index import VectorStoreIndex\n", "from llama_index.core.retrievers import VectorIndexRetriever\n", "\n", "# Instantiate VectorStoreIndex object from your vector_store object\n", "vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)\n", "\n", "# Grab 5 search results\n", "retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=5)\n", "\n", "# Query vector DB\n", "answer = retriever.retrieve('generate a summary based on the information you have')\n", "\n", "# Inspect results\n", "print([i.get_content() for i in answer])\n", "\n", "# >>> ['some relevant search result 1', 'some relevant search result 1'...]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding proper prompt templates for the query engine" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Empty Response'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from llama_index.core.query_engine import RetrieverQueryEngine\n", "from llama_index.core import PromptTemplate\n", "from llama_index.core.postprocessor import SimilarityPostprocessor\n", "\n", "\n", "# Pass in your retriever from above, which is configured to return the top 5 results\n", "query_engine = RetrieverQueryEngine(retriever=retriever)\n", "\n", "postprocessor=SimilarityPostprocessor(similarity_cutoff=0.70)\n", "\n", "query_engine=RetrieverQueryEngine(retriever=retriever,\n", " node_postprocessors=[postprocessor])\n", "\n", "# Now you query:\n", "llm_query = query_engine.query('generate a summary based on the information you have')\n", "# llm_query = query_engine.query('tell me about ML questions')\n", "\n", "llm_query.response" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "def get_full_prompt_template(cur_instr: str, prompt_tmpl):\n", " tmpl_str = prompt_tmpl.get_template()\n", " new_tmpl_str = cur_instr + \"\\n\" + tmpl_str\n", " new_tmpl = PromptTemplate(new_tmpl_str)\n", " return new_tmpl" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "QA_PROMPT_KEY = \"response_synthesizer:text_qa_template\"\n", "\n", "# get the base qa prompt (without any instruction prefix)\n", "base_qa_prompt = query_engine.get_prompts()[QA_PROMPT_KEY]\n", "\n", "\n", "initial_instr = \"\"\"\\\n", "You are a QA assistant specifically designed to help in RESEARCH WORK as a RESEARCH ASSISTANT.\n", "Context information is below. Given the context information and not prior knowledge, \\\n", "answer the query. 
\\\n", "\"\"\"\n", "\n", "# this is the \"initial\" prompt template\n", "# implicitly used in the first stage of the loop during prompt optimization\n", "# here we explicitly capture it so we can use it for evaluation\n", "old_qa_prompt = get_full_prompt_template(initial_instr, base_qa_prompt)\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PromptTemplate(metadata={'prompt_type': }, template_vars=['context_str', 'query_str'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, template='You are a QA assistant specifically designed to help in RESEARCH WORK as a RESEARCH ASSISTANT.\\nContext information is below. Given the context information and not prior knowledge, answer the query. \\nContext information is below.\\n---------------------\\n{context_str}\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: {query_str}\\nAnswer: ')" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "old_qa_prompt" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I apologize, but the provided context does not contain sufficient information to generate a meaningful summary.\n" ] } ], "source": [ "# Use the custom prompt when querying\n", "query_engine = vector_index.as_query_engine(text_qa_template=old_qa_prompt)\n", "response = query_engine.query(\"generate a summary based on the information you have\")\n", "# response = query_engine.query('tell me about Few-shot in-context learning')\n", "print(response)\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Final Response: I apologize, but the provided context does not contain\n", "sufficient information to generate a meaningful summary.\n", "______________________________________________________________________\n", "Source Node 1/2\n", "Node ID: 444a52bb-805b-4698-9e41-6817dcfe1fa1\n", "Similarity: 0.481024384\n", "Text: If your candidate doesn’t know the answer to the above questions\n", "and you’re hiring for a ML intern position, then they’re obviously not\n", "a great fit.\n", "______________________________________________________________________\n", "Source Node 2/2\n", "Node ID: 6c5664e4-5e90-491d-8b80-857852760395\n", "Similarity: 0.492449284\n", "Text: correct answer. Rank classification evaluation is compatible with\n", "both classification and multiplechoice tasks. Since model performance\n", "can vary significantly depending on the prompt template used,we report\n", "the median accuracy across all prompt templates from P3 and across\n", "few-shot data subsetsfor each dataset. 
For all datasets, we report the\n", "accur...\n", "I apologize, but the provided context does not contain sufficient information to generate a meaningful summary.\n" ] } ], "source": [ "from llama_index.core.response.pprint_utils import pprint_response\n", "pprint_response(response,show_source=True)\n", "print(response)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### The right prompt and the right question are important to get the desired response" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "##### \"generate a summary based on the information you have about peft\" is a better query in this case than 'tell me about the T-Few Recipe': in the latter case the source nodes fetch irrelevant data, such as table content, which is apparently more similar under vector search" ] },
{ "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The provided context does not mention anything about recipes, so I cannot answer this question from the provided context.\n" ] } ], "source": [ "# Use the custom prompt when querying\n", "query_engine = vector_index.as_query_engine(text_qa_template=old_qa_prompt)\n", "# response = query_engine.query(\"generate a summary based on the information you have about peft\")\n", "response = query_engine.query('tell me about t few recipe')\n", "print(response)\n" ] },
{ "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Final Response: The provided context does not mention anything about\n", "recipes, so I cannot answer this question from the provided context.\n", "______________________________________________________________________\n", "Source Node 1/2\n", "Node ID: 401a275e-9caa-44dd-bbab-3b46c526adc7\n", "Similarity: 0.553759813\n", "Text: Interview Questions to Ask a ML intern| Xobin [Downloaded]8\n", "Prepared and Curated by Xobin Team\n", "______________________________________________________________________\n", "Source Node 2/2\n", "Node ID: dda36b9b-cba5-4017-9baf-06326c029b8e\n", "Similarity: 0.575024486\n", "Text: Interview Questions to Ask a ML intern| Xobin\n", "[Downloaded]1Interview Questions to Ask a ML intern| Xobin\n", "[Downloaded]We at Xobin reached out to over 70+ Hiring teams to curate\n", "the best interview questions. W e didn't stop there. We went ahead to\n", "understand what type of answers dif ferentiated the top candidate from\n", "the rest.\n", "The provided context does not mention anything about recipes, so I cannot answer this question from the provided context.\n" ] } ], "source": [ "from llama_index.core.response.pprint_utils import pprint_response\n", "pprint_response(response,show_source=True)\n", "print(response)" ] },
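{ "cell_type": "markdown", "metadata": {}, "source": [ "##### A quick check of the better query suggested above (a sketch: it reuses the `query_engine` and `pprint_response` from the cells above and assumes the Pinecone index and API keys are still configured)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Re-run the query engine with the more specific query recommended above.\n", "# Naming the topic (PEFT) explicitly tends to retrieve nodes from peft.pdf\n", "# rather than the unrelated interview-questions document.\n", "response = query_engine.query(\"generate a summary based on the information you have about peft\")\n", "pprint_response(response, show_source=True)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 2 }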