{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1a542da3-c20b-4085-98d4-e2c538e92aee",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import logging\n",
    "\n",
    "logging.basicConfig()\n",
    "logging.getLogger('metaanalyser').setLevel(level=logging.DEBUG)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8706a6a5-1080-4ee7-81c7-7e1bd97674f2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "a5cb6f02-dd07-4299-8106-6fc6d0850f55",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.sr:Searching `llm agent OR llm tool integration` on Google Scholar.\n",
      "INFO:metaanalyser.paper.paper:Looking for `llm agent OR llm tool integration` on Google Scholar, offset: 0...\n",
      "INFO:metaanalyser.paper.paper:Looking for `llm agent OR llm tool integration` on Google Scholar, offset: 10...\n",
      "INFO:metaanalyser.paper.paper:Looking for `llm agent OR llm tool integration` on Google Scholar, offset: 20...\n",
      "INFO:metaanalyser.paper.paper:Looking for `llm agent OR llm tool integration` on Google Scholar, offset: 30...\n",
      "INFO:metaanalyser.paper.paper:Collecting details...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new SRChain chain...\u001b[0m\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b66c7e4eddb54cf887b9f29f8ab7ede8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "0it [00:00, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.sr:Writing an overview of the paper.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new SROverviewChain chain...\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.paper.paper:Number of papers: 5, number of tokens: 1385, text: Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to App...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "Human: Write an overview of the systematic review based on the summary of the following list of paper abstracts.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Summry: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Summry: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Summry: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Summry: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Summry: The Segment Anything Model (SAM) is a new image segmentation tool trained with the largest segmentation dataset at this time. The model has demonstrated that it can create high-quality masks for image segmentation with good promptability and generalizability. However, the performance of the model on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, we introduce Segment Any Medical Model (SAMM), an extension of SAM on 3D Slicer, a widely-used open-source image processing and visualization software that has been extensively used in the medical imaging community. This open-source extension to 3D Slicer and its demonstrations are posted on GitHub (https://github.com/bingogome/samm). SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in nearly real-time.\n",
      "-----\n",
      "\n",
      "This overview should serve as a compass for you as you construct the outline of the systematic review and write down its details.\n",
      "\n",
      "Assuming that the readers of this systematic review will not be familiar with the field. In order to make it easy for readers who are not familiar with this field to understand, list the main points briefly (approximately 30 words maximum) based on the following points.\n",
      "\n",
      "- Motivation for this field and the problem this field are trying to solve\n",
      "- Historical background of this field\n",
      "- Future development of this field\n",
      "\n",
      "Based on these main points, provide an overview of the systematic review regarding llm agent OR llm tool integration you will write.\n",
      "\n",
      "Finally, write the title of the systematic review you are going to write based on this overview.\n",
      "\n",
      "The output should be formatted as a JSON instance that conforms to the JSON schema below.\n",
      "\n",
      "As an example, for the schema {\"properties\": {\"foo\": {\"title\": \"Foo\", \"description\": \"a list of strings\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}}, \"required\": [\"foo\"]}}\n",
      "the object {\"foo\": [\"bar\", \"baz\"]} is a well-formatted instance of the schema. The object {\"properties\": {\"foo\": [\"bar\", \"baz\"]}} is not well-formatted.\n",
      "\n",
      "Here is the output schema:\n",
      "```\n",
      "{\"properties\": {\"title\": {\"title\": \"Title\", \"description\": \"title of the systematic review\", \"type\": \"string\"}, \"main_points\": {\"title\": \"Main Points\", \"description\": \"main points that make up the systematic review\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}, \"overview\": {\"title\": \"Overview\", \"description\": \"overview of the systematic review\", \"type\": \"string\"}}, \"required\": [\"title\", \"main_points\", \"overview\"]}\n",
      "```\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 1832, 'completion_tokens': 208, 'total_tokens': 2040}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Building the outline of the paper.\n",
      "INFO:metaanalyser.paper.paper:Number of papers: 5, number of tokens: 1385, text: Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to App...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SROutlintChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "Human: Build an outline of the systematic review regarding \"llm agent OR llm tool integration\" based on the following list of paper abstracts.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Summry: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Summry: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Summry: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Summry: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Summry: The Segment Anything Model (SAM) is a new image segmentation tool trained with the largest segmentation dataset at this time. The model has demonstrated that it can create high-quality masks for image segmentation with good promptability and generalizability. However, the performance of the model on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, we introduce Segment Any Medical Model (SAMM), an extension of SAM on 3D Slicer, a widely-used open-source image processing and visualization software that has been extensively used in the medical imaging community. This open-source extension to 3D Slicer and its demonstrations are posted on GitHub (https://github.com/bingogome/samm). SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in nearly real-time.\n",
      "-----\n",
      "\n",
      "The following is an overview of this systematic review. Build the outline of the systematic review according to this overview.\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "-----\n",
      "\n",
      "Device each section of this outline by citing abstracts from the papers.\n",
      "The beginning of element of the sections should by titled \"Introduction\" and last element of the sections should be titled \"Conclusion\".\n",
      "It is preferred that sections be divided into more child sections. Each section can have up to two child sections.\n",
      "\n",
      "The output should be formatted as a JSON instance that conforms to the JSON schema below.\n",
      "\n",
      "As an example, for the schema {\"properties\": {\"foo\": {\"title\": \"Foo\", \"description\": \"a list of strings\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}}, \"required\": [\"foo\"]}}\n",
      "the object {\"foo\": [\"bar\", \"baz\"]} is a well-formatted instance of the schema. The object {\"properties\": {\"foo\": [\"bar\", \"baz\"]}} is not well-formatted.\n",
      "\n",
      "Here is the output schema:\n",
      "```\n",
      "{\"properties\": {\"sections\": {\"title\": \"Sections\", \"description\": \"sections that make up this systematic review\", \"type\": \"array\", \"items\": {\"$ref\": \"#/definitions/Section\"}}, \"citations_ids\": {\"title\": \"Citations Ids\", \"description\": \"citation ids to all paper abstracts cited in this paper\", \"type\": \"array\", \"items\": {\"type\": \"integer\"}}}, \"required\": [\"sections\", \"citations_ids\"], \"definitions\": {\"Section\": {\"title\": \"Section\", \"type\": \"object\", \"properties\": {\"title\": {\"title\": \"Title\", \"description\": \"title of this section\", \"type\": \"string\"}, \"children\": {\"title\": \"Children\", \"description\": \"subsections of this section\", \"type\": \"array\", \"items\": {\"$ref\": \"#/definitions/Section\"}}, \"description\": {\"title\": \"Description\", \"description\": \"brief description of this section (approximately 30 words maximum)\", \"type\": \"string\"}, \"citation_ids\": {\"title\": \"Citation Ids\", \"description\": \"citation ids to a paper abstract that this section cites\", \"type\": \"array\", \"items\": {\"type\": \"integer\"}}}, \"required\": [\"title\", \"description\", \"citation_ids\"]}}}\n",
      "```\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2137, 'completion_tokens': 669, 'total_tokens': 2806}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Creating vector store.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.paper.vectorstore:Creating vector store, tiktoken_encoder_model_name='gpt-3.5-turbo', chunk_size=150, chunk_overlap=10\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bfb1b15e9f934e0aa9e953d8eaf4a3f2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/10 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:langchain.text_splitter:Created a chunk of size 154, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 250, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 184, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 231, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 191, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 205, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 155, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 184, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 269, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 311, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 159, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 158, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 177, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 153, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 382, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 330, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 163, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 301, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 552, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 999, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 188, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 826, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 205, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 158, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 205, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 208, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 250, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 174, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 232, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 163, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 152, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 272, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 249, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 208, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 184, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 186, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 171, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 230, which is longer than the specified 150\n",
      "WARNING:langchain.text_splitter:Created a chunk of size 165, which is longer than the specified 150\n",
      "INFO:metaanalyser.paper.vectorstore:Vector store is created from 10 papers, document size=1124\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [1 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Introduction\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: Low-code LLM: Visual Programming over LLMs\n",
      "citation_id: 8\n",
      "Text: 4 Related Works  4.1 Large Language Models  Large language models (LLMs) have emerged as a prominent area of research in recent years, thanks to advances in deep learning, natural language processing, and powerful computational infrastructure.  LLMs are deﬁned by their signiﬁcant model size, capacity to understand and generate human-like text, and ability to generalize across different tasks and applications.    Recent LLMs, such as GPT-4 and ChatGPT, have made impressive strides in generating more coherent and contextually relevant responses.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: Abstract  We are currently witnessing dramatic advances in the ca- pabilities of Large Language Models (LLMs).  They are al- ready being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines.  The functionalities of cur- rent LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable.  This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting.  Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 2 level markdown title (`##`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2241, 'completion_tokens': 201, 'total_tokens': 2442}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [2 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Background\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: In research, our study paves the way for future advancements in LLMs and their application in do- mains where precision is of paramount importance.  Further research is needed to explore the potential of integrating LLMs with other data sources, such as scientiﬁc papers, re- ports, and databases.    The remainder of this paper is structured as follows.  Sec- tion 2 gives a brief overview over recent developments in the ﬁeld of LLMs, LLM agents, NLP in climate change, and use of multiple sources in the context of LLMs.\n",
      "\n",
      "\n",
      "Title: Low-code LLM: Visual Programming over LLMs\n",
      "citation_id: 8\n",
      "Text: They have been applied in various industries and ﬁelds, including content creation, code development(Chen et al., 2021), customer support(George and George, 2023), and more.  However, while LLMs have demonstrated promising potential, they still face limitations(Bowman, 2023; Borji, 2023; Bang et al., 2023).\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: These specialized applications then allow businesses and individuals to adopt LLMs into their workﬂows.    We emphasize the signiﬁcance of these complementary technologies, partly because out-of-the-box general-purpose LLMs may continue to be unreliable for various tasks due to issues such as factual inaccuracies, inherent biases, privacy concerns, and disinformation risks (Abid et al., 2021; Schramowski et al., 2022; Goldstein et al., 2023; OpenAI, 2023a).  However, specialized workﬂows—including tooling, software, or human-in-the-loop systems—can help address these shortcomings by incorporating domain-speciﬁc expertise.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: Our research serves to measure what is technically feasible now, but necessarily will miss the evolving impact potential of the LLMs over time.    The paper is structured as follows: Section 2 reviews relevant prior work, Section 3 discusses methods and data collection, Section 4 presents summary statistics and results, Section 5 relates our measurements to earlier eﬀorts, Section 6 discusses the results, and Section 7 oﬀers concluding remarks.    2 Literature Review  2.1 The Advancement of Large Language Models\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Background: This subsection provides a brief history of LLMs and their development.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2146, 'completion_tokens': 227, 'total_tokens': 2373}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [3 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Motivation\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: In research, our study paves the way for future advancements in LLMs and their application in do- mains where precision is of paramount importance.  Further research is needed to explore the potential of integrating LLMs with other data sources, such as scientiﬁc papers, re- ports, and databases.    The remainder of this paper is structured as follows.  Sec- tion 2 gives a brief overview over recent developments in the ﬁeld of LLMs, LLM agents, NLP in climate change, and use of multiple sources in the context of LLMs.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: The intuition is that the better LLMs can follow instructions, the more they can behave as computer programs (as a result, they are susceptible to attacks).    Application-Integrated LLMs.  Recently, large tech companies started [24], or are planning [2], to integrate LLMs with tasks such as Web search or other external APIs (so-called Application-Integrated LLMs).  Such tools can now offer interactive chat and summarization of the re- trieved search results.  Retrieval can also be leveraged for code generation from natural language [36].  Similarly, tools like Github Copilot [12] use retrieval to look for similar snippets [10].\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2239, 'completion_tokens': 210, 'total_tokens': 2449}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [4 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Problem Statement\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: The versatility and adaptabil- ity of LLM agents make them an essential asset in various applications and domains, highlighting the immense poten- tial for their future development and integration into increas- ingly complex and sophisticated AI systems.    LLM Tools  In our setting, we use two tools.  Note that this only serves as an exemplary use case and we plan to extend this setting to include multiple data sources.    The ﬁrst tool employed by our LLM agent is a Python- based module that utilizes the powerful pandas library to access and manipulate data stored in dataframes.  This tool enables the agent to seamlessly interface with structured data.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: The potential of combining different data sources to enhance the accuracy and reliability of the results has signiﬁcant implications for a wide range of applications, from scientiﬁc research to business decision-making.    3 Setting  LLM Agent In this work, we build upon the LangChain package.1 In the LangChain ecosystem, the LLM agent operates by utilizing the ReAct framework.  This framework enables the agent to select the most suitable tool based on the tool’s description, given that each tool has an associated description provided.  The ReAct framework allows the LLM agent to work with any number of tools.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2228, 'completion_tokens': 158, 'total_tokens': 2386}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [5 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"LLM Agent Integration\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: LLM Agents  An LLM agent employs the capabilities of state-of-the-art language models to perform complex tasks and make in- formed decisions.  These agents can autonomously deter- mine which actions to take, including utilizing various tools and observing their outputs or providing responses to user queries (Schick et al. 2023).  By leveraging the LLM’s vast knowledge and understanding of natural language, agents can efﬁciently navigate through an array of tools and se- lect the most appropriate one based on the given context.  This enables the LLM agent to provide reliable, accurate, and contextually relevant solutions in diverse applications and domains.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: As a result, the LLM agent becomes a valuable tool for users, researchers, and organizations alike, offering in- sights and solutions that might otherwise be difﬁcult or time- consuming to obtain.    In conclusion, LLM agents represent a signiﬁcant ad- vancement in the ﬁeld of artiﬁcial intelligence, particularly in the context of LangChain and the ReAct framework.  By leveraging the extensive knowledge and comprehension abilities of LLMs, these agents can autonomously navigate a diverse array of tools and generate contextually relevant, ac- curate, and reliable solutions.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: The use of LLM agents in LangChain offers several ad- vantages, including their ability to process vast amounts of information, understand context, and adapt to new tasks quickly.  By harnessing the power of LLMs, agents can efﬁ- ciently handle complex challenges and provide precise solu- tions that cater to the speciﬁc needs of users.  This adapt- ability enables the LLM agent to continually evolve its understanding and decision-making processes, staying up- to-date with the latest developments and trends in various ﬁelds.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: The versatility and adaptabil- ity of LLM agents make them an essential asset in various applications and domains, highlighting the immense poten- tial for their future development and integration into increas- ingly complex and sophisticated AI systems.    LLM Tools  In our setting, we use two tools.  Note that this only serves as an exemplary use case and we plan to extend this setting to include multiple data sources.    The ﬁrst tool employed by our LLM agent is a Python- based module that utilizes the powerful pandas library to access and manipulate data stored in dataframes.  This tool enables the agent to seamlessly interface with structured data.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 2 level markdown title (`##`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2173, 'completion_tokens': 262, 'total_tokens': 2435}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [6 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Application-Integrated LLMs\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We demonstrate that an attacker can indirectly perform such PI attacks.  Based on this key insight, we systematically analyze the resulting threat land- scape of Application-Integrated LLMs and discuss a variety of new attack vectors.  To demonstrate the practical viabil- ity of our attacks, we implemented speciﬁc demonstrations  Integrating Large Language Models  Figure 1.  (LLMs) with other retrieval-based applications (so-called Application- Integrated LLMs) may introduce new attack vectors; adversaries can now attempt to indirectly inject the LLMs with prompts placed within publicly accessible sources.    In of the proposed attacks within synthetic applications.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: New Integration-Enabled Threats.  These recent trends indicate that LLMs are becoming—and likely will continue to be—way more ubiquitous in many systems.  They can access external resources (e.g., retrieve content from other websites) and potentially interface with other applications via API calls.  As a result, these LLMs might ingest un- trusted and possibly malicious inputs at inference time that aim to manipulate their output.  In this work, we show that PI risks may not only exist in scenarios where adversaries themselves directly and explicitly prompt LLMs.  Instead, we demonstrate that adversaries can strategically inject the prompts into documents likely to be retrieved.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: 3 Attack Surface of Application-Integrated  Large Language Models (LLMs)    This section ﬁrst provides an overview of the envisioned threat model.  We then introduce several attacks we have designed that instantiate the different possible attack sce- narios on synthetic use cases.  We have implemented proof- of-concept attacks for these scenarios to evaluate their fea- sibility in practice.    3.1 High-Level Overview  Prompt injection (PI) attacks pose a signiﬁcant threat to the security of LLMs.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: In of the proposed attacks within synthetic applications.  summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.    1  Introduction  The capabilities of Large Language Models (LLMs) in text generation and understanding are rapidly progress-  ing  [7].  Current LLMs can even be adapted to new unseen  Step 1: The adversary plants indirect prompts Application-IntegratedLLMStep 2: LLM retrieves the prompt from a web resource Requests task [X]Performs task[Y]Publicly accessibleserver\"Your newtask is:\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: LLMs’ functionalities were shown to be easily modulated via natural prompts, opening the door for possible adversarial exploitation via prompt Equipping LLMs with retrieval ca- pabilities might allow adversaries to manipulate remote Application-Integrated LLMs via Indirect Prompt Injection – prompts that are placed by the adversary within public sources such that later they would get retrieved and pro- cessed by the LLM.  As the ﬁrst step towards understanding these new threats, we systematically investigate and demon- strate the potential attack vectors.  Given the potential harm of these attacks, our work calls for a more in-depth investi- gation of the generalizability of these attacks in practice.    Figure 9.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We call this new threat indirect prompt injections and demonstrate how such injections could be used to de- liver targeted payloads.  Our technique might allow attack- ers to gain control of LLMs by crossing crucial security boundaries with a single search query.    As recent LLMs may behave like computer pro- grams [14], we draw insights from the classical computer security domain to design a new set of attack techniques.  We provide a high-level overview of the threat model in Fig- ure 2, covering the possible injection delivery methods, im- pacts, and targets.  In the following, we describe each aspect in more detail.    Injection Methods.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2129, 'completion_tokens': 271, 'total_tokens': 2400}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [7 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Chemistry Tools\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Integrating such tools tends to occur within closed environments, such as RXN for Chemistry 16,22,38–40 and AIZynthFinder 23,41,42, facilitated by corporate directives that promote integrability and internal usability.  Nevertheless, although most tools are developed by the open-source community or made accessible through application programming interfaces (API), their integration and interoperability pose considerable challenges for experimental chemists, thereby preventing the full exploitation of their potential.    Figure 1: Overview.  Using a variety of chemistry-related packages and software, a set of tools is created.  These tools and a user prompt are then given to an LLM.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: In our implementation, we integrate external tools through LangChain, as LLMs have been shown to perform better with tools 9,30,66.    6.3 Tools  Although our implementation uses a limited set of tools, it must be noted that this tool set can very easily be expanded depending on needs and availability.    The tools used can be classiﬁed into general tools, molecular tools, and chemical reaction tools.    6.3.1 General tools  Web search The web search tool is designed to provide the language model with the ability to access relevant information from the web.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: To address this issue, we collaborated with expert chemists to develop a set of tasks that test the capabilities of LLMs in using chemistry-speciﬁc tools and solving problems in the ﬁeld.  The selected tasks are executed by both ChemCrow and GPT-4 (prompted to assume the role of an expert chemist), and these results are evaluated with a combination of LLM-based and expert human assessments.  For the former, we draw inspiration from the evaluation method described in 5,50,51, where the authors use an evaluator LLM that is instructed to assume the role of a teacher assessing their students.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: The LLM is provided with a list of tool names, descriptions of their utility, and details about the expected input/output.  It is then instructed to answer a user-given prompt using the tools provided when necessary.  The model is guided to follow the Thought, Action, Action Input, Observation format 45, which requires it to reason about the current state of the task, consider its relevance to the ﬁnal goal, and plan the next steps accordingly.  Contemporaneously with this preprint, 46 describes a similar approach of augmenting an LLM with tools for accomplishing tasks in chemistry that are out of reach of GPT-4 alone.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Inaccurate or incomplete reasoning due to a lack of sufﬁcient chemistry knowledge in the LLM-powered engine poses a signiﬁcant risk, as it may lead to ﬂawed decision-making or problematic experiment results.  One of the key points of this paper is that the integration of expert-designed tools can help mitigate the hallucination issues commonly associated with these models, thus reducing the risk of inaccuracy.  However, concerns may still arise when the model is unable to adequately analyze different observations due to a limited understanding of chemistry concepts, potentially leading to suboptimal outcomes.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: The LLM then proceeds through an automatic, iterative chain-of-thought process, deciding on its path, choice of tools, and inputs before coming to a ﬁnal answer.    Inspired by successful applications in other ﬁelds 9,43,44, we propose an LLM-powered chemistry engine, ChemCrow, designed to streamline the reasoning process for various common chemical tasks across areas such as drug and materials design and synthesis.  ChemCrow harnesses the power of multiple expert- designed tools for chemistry and operates by prompting an LLM (GPT-4 in our experiments) with speciﬁc instructions about the task and the desired format, as shown in Figure 1.\n",
      "\n",
      "\n",
      "Title: ART: Automatic multi-step reasoning and tool-use for large language models\n",
      "citation_id: 7\n",
      "Text: Tool Use There is growing interest in overcom- ing LLM limitations with external tools such as search engines, web browsers, calculators, trans- lation systems, and python interpreters (Komeili et al., 2022; Thoppilan et al., 2022; Lazaridou et al., 2022; Shuster et al., 2022; Nakano et al., 2021; Thoppilan et al., 2022; Cobbe et al., 2021; Thop- pilan et al., 2022; Gao et al., 2022; Chen et al., 2022).\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: ChemCrow: Augmenting large-language models with chemistry tools  Andres M Bran12∗  Sam Cox3∗ Andrew D. White3 1 Laboratory of Artiﬁcial Chemical Intelligence (LIAC), ISIC, EPFL 2National Centre of Competence in Research (NCCR) Catalysis, EPFL 3 Department of Chemical Engineering, University of Rochester ∗Contributed equally.  {andres.marulandabran,philippe.schwaller}@epfl.ch {samantha.cox,andrew.white}@rochester.edu  Philippe Schwaller12  Abstract  Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2252, 'completion_tokens': 196, 'total_tokens': 2448}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [8 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Climate Resources\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: By integrating these resources with LLMs, our approach overcomes the limitations associated with im- precise language and delivers more reliable and accurate in- formation in the critical domain of climate change.  This work paves the way for future advancements in LLMs and their ap- plication in domains where precision is of paramount impor- tance.    1  Introduction  Motivation.  Large language models have revolutionized the ﬁeld of artiﬁcial intelligence (AI) in recent years (Brown et al. 2020; Radford et al. 2019; Ouyang et al. 2022).\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: By inte- grating these resources with LLMs, our approach mitigates the limitations associated with imprecise language, provid- ing more reliable and accurate information in the critical domain of climate change.  We present a prototype LLM agent that retrieves emission data from ClimateWatch (https: //www.climatewatchdata.org/) and leverages general Google search.  Through two exemplary experiments, we demon- strate the potential of LLMs in the area of climate change.  We anticipate that our approach will encourage further re- search on integrating LLMs with accurate data sources, ulti- mately fostering more responsible and reliable AI systems.  Findings.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: This has signiﬁcant implications for a wide range of ap- plications, from climate modeling to environmental policy- making.  Additionally, the used approach offers a solution for addressing some of the limitations associated with LLMs, such as their outdated information stored in the model it- self and their tendency to employ imprecise language.  By incorporating precise data sources and developing methods for evaluating the accuracy and reliability of the results, it is possible to overcome some of these limitations and enhance the applicability of LLMs in domains where precision and responsibility are crucial.    Implications for Research.  The described approach has implications beyond the speciﬁc application of climate change research and policy-making.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where ac- curacy is crucial, such as climate change.  In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, in- cluding databases containing recent and precise informa- tion about organizations, institutions, and companies.  We demonstrate the effectiveness of our method through a pro- totype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Climate Resources  Mathias Kraus,1 Julia Anna Bingler,2,5 Markus Leippold,3 Tobias Schimanski,3 Chiara Colesanti Senni,3,5 Dominik Stammbach,4 Saeid Ashraf Vagheﬁ,3 Nicolas Webersinke1 1FAU Erlangen-Nuremberg, Germany 2University of Oxford, United Kingdom 3University of Zurich, Switzerland 4ETH Zurich, Switzerland 5Council on Economic Policies mathias.kraus@fau.de, julia.bingler@smithschool.ox.ac.uk, markus.leippold@bf.uzh.ch, tobias.schimanski@bf.uzh.ch, chiara.colesantisenni@bf.uzh.ch, dominik.stammbach@gess.ethz.ch, saeid.vagheﬁ@geo.uzh.ch, nicolas.webersinke@fau.de  Abstract  Large language models (LLMs) have signiﬁcantly trans- formed the landscape of artiﬁcial intelligence by demon- strating their ability in generating human-like text across di- verse topics.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: In research, our study paves the way for future advancements in LLMs and their application in do- mains where precision is of paramount importance.  Further research is needed to explore the potential of integrating LLMs with other data sources, such as scientiﬁc papers, re- ports, and databases.    The remainder of this paper is structured as follows.  Sec- tion 2 gives a brief overview over recent developments in the ﬁeld of LLMs, LLM agents, NLP in climate change, and use of multiple sources in the context of LLMs.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Our experiments and analyses showcase how LLM agents can be used to access multiple datasources and com- bine this information to provide accurate and reliable re- sponses.    Implications.  The implications of our study are signiﬁ- cant for both practice and research.  In practice, this work is not limited to the ﬁeld of climate change but can be used to improve the accuracy and reliability of AI systems in other domains where precision and responsibility are crucial.  Poli- cymakers, scientists, and the general public can beneﬁt from more accurate and reliable information to make informed decisions and take necessary actions to mitigate the impact of climate change.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2184, 'completion_tokens': 170, 'total_tokens': 2354}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [9 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"LLM Tool Integration\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: The intuition is that the better LLMs can follow instructions, the more they can behave as computer programs (as a result, they are susceptible to attacks).    Application-Integrated LLMs.  Recently, large tech companies started [24], or are planning [2], to integrate LLMs with tasks such as Web search or other external APIs (so-called Application-Integrated LLMs).  Such tools can now offer interactive chat and summarization of the re- trieved search results.  Retrieval can also be leveraged for code generation from natural language [36].  Similarly, tools like Github Copilot [12] use retrieval to look for similar snippets [10].\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: These specialized applications then allow businesses and individuals to adopt LLMs into their workﬂows.    We emphasize the signiﬁcance of these complementary technologies, partly because out-of-the-box general-purpose LLMs may continue to be unreliable for various tasks due to issues such as factual inaccuracies, inherent biases, privacy concerns, and disinformation risks (Abid et al., 2021; Schramowski et al., 2022; Goldstein et al., 2023; OpenAI, 2023a).  However, specialized workﬂows—including tooling, software, or human-in-the-loop systems—can help address these shortcomings by incorporating domain-speciﬁc expertise.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: In our implementation, we integrate external tools through LangChain, as LLMs have been shown to perform better with tools 9,30,66.    6.3 Tools  Although our implementation uses a limited set of tools, it must be noted that this tool set can very easily be expanded depending on needs and availability.    The tools used can be classiﬁed into general tools, molecular tools, and chemical reaction tools.    6.3.1 General tools  Web search The web search tool is designed to provide the language model with the ability to access relevant information from the web.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: This software may include capabilities such as: - Summarizing documents longer than 2000 words and answering questions about those documents, - Retrieving up-to-date facts from the Internet and using those facts in combination with the LLM capabilities, - Searching over an organization’s existing knowledge, data, or documents and retreiving information, - Retrieving highly specialized domain knowledge, - Make recommendations given data or written input, - Analyze written information to inform decisions, - Prepare training materials based on highly specialized knowledge, - Provide counsel on issues, and - Maintain complex databases.\n",
      "\n",
      "\n",
      "Title: ART: Automatic multi-step reasoning and tool-use for large language models\n",
      "citation_id: 7\n",
      "Text: Tool Use There is growing interest in overcom- ing LLM limitations with external tools such as search engines, web browsers, calculators, trans- lation systems, and python interpreters (Komeili et al., 2022; Thoppilan et al., 2022; Lazaridou et al., 2022; Shuster et al., 2022; Nakano et al., 2021; Thoppilan et al., 2022; Cobbe et al., 2021; Thop- pilan et al., 2022; Gao et al., 2022; Chen et al., 2022).\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: The versatility and adaptabil- ity of LLM agents make them an essential asset in various applications and domains, highlighting the immense poten- tial for their future development and integration into increas- ingly complex and sophisticated AI systems.    LLM Tools  In our setting, we use two tools.  Note that this only serves as an exemplary use case and we plan to extend this setting to include multiple data sources.    The ﬁrst tool employed by our LLM agent is a Python- based module that utilizes the powerful pandas library to access and manipulate data stored in dataframes.  This tool enables the agent to seamlessly interface with structured data.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: In research, our study paves the way for future advancements in LLMs and their application in do- mains where precision is of paramount importance.  Further research is needed to explore the potential of integrating LLMs with other data sources, such as scientiﬁc papers, re- ports, and databases.    The remainder of this paper is structured as follows.  Sec- tion 2 gives a brief overview over recent developments in the ﬁeld of LLMs, LLM agents, NLP in climate change, and use of multiple sources in the context of LLMs.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 2 level markdown title (`##`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2198, 'completion_tokens': 240, 'total_tokens': 2438}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [10 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Labor Market Impact\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: Our ﬁndings conﬁrm the hypothesis that these technologies can have pervasive impacts across a wide swath of occupations in the US, and that additional advancements supported by LLMs, mainly through software and digital tools, can have signiﬁcant eﬀects on a range of economic activities.  However, while the technical capacity for LLMs to make human labor more eﬃcient appears evident, it is important to recognize that social, economic, regulatory, and other factors will inﬂuence actual labor productivity outcomes.  As capabilities continue to evolve, the impact of LLMs on the economy will likely persist and increase, posing challenges for policymakers in predicting and regulating their trajectory.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: By applying a new rubric for understanding LLM capabilities and their potential eﬀects on jobs, we have observed that most occupations exhibit some degree of exposure to LLMs, with higher-wage occupations generally presenting more tasks with high exposure.  Our analysis indicates that approximately 19% of jobs have at least 50% of their tasks exposed to LLMs when considering both current model capabilities and anticipated LLM-powered software.    Our research aims to highlight the general-purpose potential of LLMs and their possible implications for US workers.  Previous literature demonstrates the impressive improvements of LLMs to date (see 2.1).\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: Our ﬁndings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks aﬀected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted.  We do not make predictions about the development or adoption timeline of such LLMs.  The projected eﬀects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software.  Signiﬁcantly, these impacts are not restricted to industries with higher recent productivity growth.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: Our ﬁndings suggest that, based on their task-level capabilities, LLMs have the potential to signiﬁcantly aﬀect a diverse range of occupations within the U.S. economy, demonstrating a key attribute of general-purpose technologies.  In the following sections, we discuss results across various roles and wage structures.  Additional results on the relative exposure of industries within the U.S. economy can be found in Appendix D.  4.1 Summary Statistics  Summary statistics for these measures can be found in Table 3.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: The introduction of automation technologies, including LLMs, has previously been linked to heightened economic disparity and labor disruption, which may give rise to adverse downstream eﬀects (Acemoglu and Restrepo, 2022a; Acemoglu, 2002; Moll et al., 2021; Klinova and Korinek, 2021; Weidinger et al., 2021, 2022).  Our results examining worker exposure in the United States underscore the need for societal and policy preparedness to the potential economic disruption posed by LLMs and the complementary technologies that they spawn.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: Further research is necessary to explore the broader implications of LLM advancements, including their potential to augment or displace human labor, their impact on job quality, impacts on inequality, skill development, and numerous other outcomes.  By seeking to understand the capabilities and potential eﬀects of LLMs on the workforce, policymakers and stakeholders can make more informed decisions to navigate the complex landscape of AI and its role in shaping the future of work.    7.1 LLM Conclusion (GPT-4’s Version)  Generative Pre-trained Transformers (GPTs) generate profound transformations, garnering potential technolog- ical growth, permeating tasks, greatly impacting professions.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: This paper presents evidence to support the latter two criteria, ﬁnding that LLMs on their own can have pervasive impacts across the economy, and that complementary innovations enabled by LLMs – particularly via software and digital tools – can have widespread application to economic activity.    Figure 3 oﬀers one illustration of the potential economic impact of complementary software built on top of LLMs.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2201, 'completion_tokens': 307, 'total_tokens': 2508}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [11 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Medical Imaging\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: The Segment Anything Model (SAM) is a new image segmentation tool trained with the largest segmentation dataset at this time. The model has demonstrated that it can create high-quality masks for image segmentation with good promptability and generalizability. However, the performance of the model on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, we introduce Segment Any Medical Model (SAMM), an extension of SAM on 3D Slicer, a widely-used open-source image processing and visualization software that has been extensively used in the medical imaging community. This open-source extension to 3D Slicer and its demonstrations are posted on GitHub (https://github.com/bingogome/samm). SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in nearly real-time.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: A deep learning based medical image segmentation technique in internet-of-medical-things domain.  Future Generation Computer Systems, 108: 135–144, 2020.    Steve Pieper, Michael Halle, and Ron Kikinis.  3d slicer.  In 2004 2nd IEEE international symposium on biomedical  imaging: nano to macro (IEEE Cat No. 04EX821), pages 632–635.  IEEE, 2004.    3\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: Segmentation allows the separation of different structures and tissues in medical images, which are then used to reconstruct multi-dimensional models that represent the anatomy of a patient [Sinha and Dolz, 2020].  However, the existing AI-based segmentation methods may not bridge the domain gap among different imaging modalities e.g. computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound (UR)  [Wang et al., 2020].  This deﬁciency adversely impacts the generalization of segmentation methods across different imaging modalities.  Each imaging modality offers a distinct advantage in visualizing speciﬁc anatomical structures and and their abnormalities.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: This advantage, however, introduces speciﬁc challenges to medical image processing and analysis.  The challenges are further elevated by remarkable anatomical differences that may exist among people.    ∗Equal Contribution              \f",
      "SAMM (Segment Any Medical Model):  A 3D Slicer Integration to SAM  Figure 1: Overall architecture and examples of the integration.    Figure 2: The event plot of a 16-second period from the initialization.    The existing challenges imply the need for a universal tool for medical image segmentation that can handle all modalities and anatomies.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: In this study, we successfully integrate 3D Slicer with SAM, which enables researchers to conduct segmentation on medical images using the state-of-art LLM.  We have validated the capability of segmenting images in nearly real-time with a latency of 0.6 seconds.  In addition, our integration enables segmentation through prompt, and the prompts can be automatically applied to the next slice when a slice’s segmentation is complete, further streamlining the process.  This functionality could be particularly useful in large-scale studies where time is critical.  Further research is needed to optimize the latency and prompt generation.  We call for the medical imaging community to explore the potential of our integration and provide feedback on its effectiveness.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: To assist with the development, assessment, and utilization of SAM on medical images, we introduce Segment Any Medical Model (SAMM), an ex- tension of SAM on 3D Slicer, a widely-used open-source image processing and visualization software that has been extensively used in the medical imaging community.  This open-source extension to 3D Slicer and its demonstrations are posted on GitHub (https://github.com/bingogome/samm).  SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in nearly real-time.    Keywords SAM · Medical Image · 3D Slicer  1  Introduction\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: Additionally, with SAM’s support for text prompts [Kirillov et al., 2023], the potential of utilizing text input directly in the Slicer software for segmentation is also a future prospect.    Our integration clears the path for validating SAM on medical images with 3D Slicer, an open-source software with abundant medical image analysis tools.  By combining AI-based medical image models with the 3D Slicer software, SAMM provides a paradigmatic approach that enables users to directly enhance their research and work through the use of AI tools.  Our results indicate that SAMM has a small latency as a semi-automatic segmentation method.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: Despite the extensive usage of AI-based methodologies in medical image analysis, the utilization of LLM within this ﬁeld remains a largely unexplored area of research.  This paper aims to provide a uniﬁed framework that incorporates 3D Slicer  [Pieper et al., 2004] and SAM to perform medical image segmentation.    2 Methodology  2.1 Overall Architecture  Figure 1 presents the overall architecture of SAMM, which consists of a SAM Server containing a pre-trained LLM and an interactive prompt plugin (IPP) for 3D Slicer.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: This integration enables researchers to take advantage of SAM’s cutting-edge segmentation capabilities within the familiar 3D Slicer platform, expanding the range of medical imaging tasks that can be accomplished using this powerful software tool.  The use of ZMQ and Numpy memory mapping also provides the ﬂexibility to customize the communication protocol to ﬁt the speciﬁc needs of the user, further enhancing the utility and versatility of the 3D Slicer platform.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 3 level markdown title (`###`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2167, 'completion_tokens': 258, 'total_tokens': 2425}, 'model_name': 'gpt-3.5-turbo'}\n",
      "INFO:metaanalyser.chains.sr:Writing sections: [12 / 12]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new SRSectionChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a research scientist and intereseted in Computation and Language, Artificial Intelligence and Machine Learning. You are working on writing a systematic review regarding \"llm agent OR llm tool integration\".\n",
      "\n",
      "The outline of the systematic review is as follows:\n",
      "\n",
      "-----\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
      "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
      "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
      "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
      "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
      "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
      "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
      "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
      "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
      "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
      "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
      "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
      "Human: Write the \"Conclusion\" section of this systematic review based on the following list of snippets or abstracts of relative papers.\n",
      "\n",
      "-----\n",
      "Title: More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models\n",
      "citation_id: 1\n",
      "Text: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.   In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.\n",
      "\n",
      "\n",
      "Title: ChemCrow: Augmenting large-language models with chemistry tools\n",
      "citation_id: 2\n",
      "Text: Large-language models (LLMs) have recently shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.\n",
      "\n",
      "\n",
      "Title: Enhancing Large Language Models with Climate Resources\n",
      "citation_id: 3\n",
      "Text: Large language models (LLMs) have significantly transformed the landscape of artificial intelligence by demonstrating their ability in generating human-like text across diverse topics. However, despite their impressive capabilities, LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, we make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. We demonstrate the effectiveness of our method through a prototype agent that retrieves emission data from ClimateWatch (https://www.climatewatchdata.org/) and leverages general Google search. By integrating these resources with LLMs, our approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
      "\n",
      "\n",
      "Title: Gpts are gpts: An early look at the labor market impact potential of large language models\n",
      "citation_id: 4\n",
      "Text: We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.\n",
      "\n",
      "\n",
      "Title: Samm (segment any medical model): A 3d slicer integration to sam\n",
      "citation_id: 5\n",
      "Text: The Segment Anything Model (SAM) is a new image segmentation tool trained with the largest segmentation dataset at this time. The model has demonstrated that it can create high-quality masks for image segmentation with good promptability and generalizability. However, the performance of the model on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, we introduce Segment Any Medical Model (SAMM), an extension of SAM on 3D Slicer, a widely-used open-source image processing and visualization software that has been extensively used in the medical imaging community. This open-source extension to 3D Slicer and its demonstrations are posted on GitHub (https://github.com/bingogome/samm). SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in nearly real-time.\n",
      "-----\n",
      "\n",
      "This systematic review should adhere to the following overview:\n",
      "\n",
      "Title: A Systematic Review of Large Language Model Agent and Tool Integration\n",
      "Points:\n",
      "  - Large Language Models (LLMs) are being integrated into various systems\n",
      "  - LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications\n",
      "  - LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications\n",
      "  - The systematic review will analyze the current state of LLM agent and tool integration and discuss future developments\n",
      "Overview: This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
      "\n",
      "Write the \"Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\" section with respect to this overview. Write the text in markdown format. The title of this section should bu suffixed with 2 level markdown title (`##`). The text of the section should be based on a snippet or abstact and should be clearly cited. The citation should be written at the end of the sentence in the form `[^<ID>]` where `ID` refers to the citation_id.\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:metaanalyser.chains.base:LLM utilization: {'token_usage': {'prompt_tokens': 2124, 'completion_tokens': 210, 'total_tokens': 2334}, 'model_name': 'gpt-3.5-turbo'}\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "CPU times: user 15.1 s, sys: 805 ms, total: 15.9 s\n",
      "Wall time: 4min 37s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "from metaanalyser.chains import SRChain\n",
    "\n",
    "query = \"llm agent OR llm tool integration\"\n",
    "chain = SRChain(llm=llm, verbose=True)\n",
    "\n",
    "result = chain.run({\"query\": query})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "3e8147ca-2d1f-4d9b-8a8c-d62e2b850d19",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "# A Systematic Review of Large Language Model Agent and Tool Integration\n",
       "\n",
       "This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review will discuss the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
       "\n",
       "## Table of contents\n",
       "\n",
       "1. Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
       "    1. Background: This subsection provides a brief history of LLMs and their development.\n",
       "    2. Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
       "    3. Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
       "2. LLM Agent Integration: This section discusses the integration of LLMs into various systems as agents.\n",
       "    1. Application-Integrated LLMs: This subsection discusses the threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.\n",
       "    2. Chemistry Tools: This subsection discusses the integration of chemistry tools with LLMs to augment their performance in chemistry-related problems.\n",
       "    3. Climate Resources: This subsection discusses the integration of climate resources with LLMs to overcome the limitations associated with imprecise language and deliver more reliable and accurate information in the critical domain of climate change.\n",
       "3. LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
       "    1. Labor Market Impact: This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities.\n",
       "    2. Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
       "4. Conclusion: This section summarizes the findings of the systematic review and discusses future developments in the field.\n",
       "\n",
       "## Introduction: Large Language Models (LLMs) are being integrated into various systems. This section provides an overview of the field and the problem it is trying to solve.\n",
       "\n",
       "Large Language Models (LLMs) have recently shown strong performance in tasks across domains and are being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines [^1]. However, LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications [^2]. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The integration of LLMs into various systems as agents and tools is a growing field that aims to overcome these limitations and enhance the performance of LLMs in various domains. This systematic review analyzes the current state of LLM agent and tool integration, discusses the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field.\n",
       "\n",
       "### Background: This subsection provides a brief history of LLMs and their development.\n",
       "\n",
       "Large Language Models (LLMs) have recently shown strong performance in various tasks across domains [^2]. LLMs are a type of artificial neural network that can process and generate natural language text. They have been developed over the past few decades, with early models such as the Hidden Markov Model and the n-gram model [^1]. However, the recent advancements in deep learning techniques have led to the development of more powerful LLMs, such as GPT-3 and T5 [^4]. These models have been trained on massive amounts of text data and can generate human-like text with high accuracy. LLMs have been integrated into various systems, including integrated development environments (IDEs), search engines, and content creation tools [^8][^1]. Despite their impressive capabilities, LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications [^2]. The development of LLMs has significant economic, social, and policy implications, as they exhibit traits of general-purpose technologies [^4].\n",
       "\n",
       "### Motivation: This subsection discusses the motivation for integrating LLMs into various systems.\n",
       "\n",
       "Large Language Models (LLMs) have shown remarkable performance in various domains, including natural language processing, image recognition, and speech recognition. However, LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications [^3]. To overcome this limitation, LLMs are being integrated into various systems, including chemistry tools [^2], climate resources [^3], and labor market impact analysis [^4]. The integration of LLMs into these systems aims to augment their performance and overcome the limitations associated with imprecise language, delivering more reliable and accurate information in critical domains such as climate change [^3]. Additionally, LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. Therefore, the motivation for integrating LLMs into various systems is to enhance their performance and extend their applicability to scientific domains, while also exploring their potential economic, social, and policy implications.\n",
       "\n",
       "### Problem Statement: This subsection defines the problem that LLM agent and tool integration is trying to solve.\n",
       "\n",
       "Large Language Models (LLMs) are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications [^3]. Moreover, LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The problem that LLM agent and tool integration is trying to solve is to enhance the performance of LLMs by integrating them with external knowledge sources and tools, enabling them to overcome their limitations and deliver more reliable and accurate information in various domains [^2][^3]. This systematic review aims to analyze the current state of LLM agent and tool integration and discuss future developments in this field.\n",
       "\n",
       "## LLM Agent Integration\n",
       "\n",
       "LLMs are being integrated into various systems as agents to perform complex tasks and make informed decisions. These agents can autonomously determine which actions to take, including utilizing various tools and observing their outputs or providing responses to user queries [^3]. By leveraging the LLM's vast knowledge and understanding of natural language, agents can efficiently navigate through an array of tools and select the most appropriate one based on the given context. This enables the LLM agent to provide reliable, accurate, and contextually relevant solutions in diverse applications and domains [^3]. \n",
       "\n",
       "However, the integration of LLMs as agents also introduces new attack vectors. For instance, Application-Integrated LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries [^1]. This calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats [^1]. \n",
       "\n",
       "Moreover, LLM agents exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^3]. The versatility and adaptability of LLM agents make them an essential asset in various applications and domains, highlighting the immense potential for their future development and integration into increasingly complex and sophisticated AI systems [^3].\n",
       "\n",
       "### Application-Integrated LLMs\n",
       "\n",
       "LLMs are being integrated into various systems, including integrated development environments (IDEs) and search engines. However, LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications [^1]. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. These attacks assumed that the adversary is directly prompting the LLM. However, augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. Adversaries can now attempt to indirectly inject the LLMs with prompts placed within publicly accessible sources, which might allow attackers to gain control of LLMs by crossing crucial security boundaries with a single search query. [^1]\n",
       "\n",
       "The resulting threat landscape of Application-Integrated LLMs needs to be systematically analyzed, and a variety of new attack vectors need to be discussed. The potential harm of these attacks calls for a more in-depth investigation of the generalizability of these attacks in practice. [^1]\n",
       "\n",
       "### Chemistry Tools\n",
       "\n",
       "LLMs have shown strong performance in tasks across domains, but they struggle with chemistry-related problems and lack access to external knowledge sources, limiting their usefulness in scientific applications [^2]. To overcome this limitation, ChemCrow, an LLM chemistry agent, has been introduced to accomplish tasks across organic synthesis, drug discovery, and materials design by integrating 13 expert-designed tools [^2]. ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. The evaluation, including both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks [^2]. The integration of expert-designed tools can help mitigate the hallucination issues commonly associated with these models, thus reducing the risk of inaccuracy [^2]. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts but also fosters scientific advancement by bridging the gap between experimental and computational chemistry [^2].\n",
       "\n",
       "### Climate Resources\n",
       "\n",
       "LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. In this study, Kraus et al. [^3] make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. They demonstrate the effectiveness of their method through a prototype agent that retrieves emission data from ClimateWatch and leverages general Google search. By integrating these resources with LLMs, their approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.\n",
       "\n",
       "## LLM Tool Integration: This section discusses the integration of LLMs into various systems as tools.\n",
       "\n",
       "LLMs are being integrated into various systems as tools to enhance their performance in specific domains. One such domain is medical imaging, where LLMs have been used to develop an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer [^7]. Another domain where LLMs are being integrated as tools is chemistry. ChemCrow is an implementation that integrates external tools through LangChain, as LLMs have been shown to perform better with tools [^2]. The implementation uses a limited set of tools, but it can be easily expanded depending on needs and availability. The tools used can be classified into general tools, molecular tools, and chemical reaction tools. The general tools include web search, which provides the language model with the ability to access relevant information from the web. The molecular tools include tools for molecular visualization and molecular property prediction, while the chemical reaction tools include tools for reaction prediction and retrosynthesis planning. \n",
       "\n",
       "[^2]: Enhancing Large Language Models with Climate Resources\n",
       "[^7]: ART: Automatic multi-step reasoning and tool-use for large language models\n",
       "\n",
       "### Labor Market Impact\n",
       "\n",
       "LLMs, such as Generative Pre-trained Transformers (GPTs), have the potential to significantly affect a diverse range of occupations within the U.S. economy, demonstrating a key attribute of general-purpose technologies [^4]. This subsection investigates the potential implications of LLMs on the U.S. labor market and assesses occupations based on their alignment with LLM capabilities. Using a new rubric, around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted [^4]. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software [^4]. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks [^4]. However, while the technical capacity for LLMs to make human labor more efficient appears evident, it is important to recognize that social, economic, regulatory, and other factors will influence actual labor productivity outcomes [^4]. The impact of LLMs on the economy will likely persist and increase, posing challenges for policymakers in predicting and regulating their trajectory [^4].\n",
       "\n",
       "### Medical Imaging: This subsection discusses the development of an image segmentation tool trained with the largest segmentation dataset and its extension on 3D Slicer.\n",
       "\n",
       "A new image segmentation tool called Segment Anything Model (SAM) has been developed and trained with the largest segmentation dataset at this time [^5]. SAM has demonstrated high-quality masks for image segmentation with good promptability and generalizability. However, the performance of the model on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, an extension of SAM on 3D Slicer called Segment Any Medical Model (SAMM) has been introduced [^5]. SAMM achieves 0.6-second latency of a complete cycle and can infer image masks in nearly real-time. The integration of 3D Slicer with SAM enables researchers to conduct segmentation on medical images using the state-of-the-art LLM [^5]. This integration clears the path for validating SAM on medical images with 3D Slicer, an open-source software with abundant medical image analysis tools. By combining AI-based medical image models with the 3D Slicer software, SAMM provides a paradigmatic approach that enables users to directly enhance their research and work through the use of AI tools [^5].\n",
       "\n",
       "## Conclusion\n",
       "\n",
       "In conclusion, this systematic review has analyzed the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications [^2][^3]. LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The review has discussed the historical background of this field, the motivation for this field and the problem it is trying to solve, and future developments in this field. The review has identified several areas where LLMs are being integrated as agents or tools, including chemistry tools [^2], climate resources [^3], and medical imaging [^5]. The review has also highlighted the potential threats associated with Application-Integrated LLMs, including novel prompt injection threats [^1]. Future developments in this field should focus on addressing these threats and improving the integration of LLMs into various systems to enhance their performance and usefulness in scientific applications.\n",
       "\n",
       "## References\n",
       "[^1]: [Greshake, Kai, et al. \"More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models.\" arXiv preprint arXiv:2302.12173 (2023).](https://arxiv.org/abs/2302.12173)\n",
       "\n",
       "[^2]: [Bran, Andres M., et al. \"ChemCrow: Augmenting large-language models with chemistry tools.\" arXiv preprint arXiv:2304.05376 (2023).](https://arxiv.org/abs/2304.05376)\n",
       "\n",
       "[^3]: [Kraus, Mathias, et al. \"Enhancing Large Language Models with Climate Resources.\" arXiv preprint arXiv:2304.00116 (2023).](https://arxiv.org/abs/2304.00116)\n",
       "\n",
       "[^4]: [Eloundou, Tyna, et al. \"Gpts are gpts: An early look at the labor market impact potential of large language models.\" arXiv preprint arXiv:2303.10130 (2023).](https://arxiv.org/abs/2303.10130)\n",
       "\n",
       "[^5]: [Liu, Yihao, et al. \"Samm (segment any medical model): A 3d slicer integration to sam.\" arXiv preprint arXiv:2304.05622 (2023).](https://arxiv.org/abs/2304.05622)"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display, Markdown\n",
    "\n",
    "display(Markdown(result))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "65b68b18-cf4a-4844-8cc7-be089c9fbef9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "with open(f\"../examples/{query}.md\", \"wt\") as f:\n",
    "    f.write(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "405486bc-1404-4992-a1ee-6378ac50d843",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}