# A Systematic Review of Large Language Model Agent and Tool Integration

This systematic review analyzes the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into a wide range of systems, but on their own they lack access to external knowledge sources, which limits their usefulness in scientific applications. At the same time, LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications. The review covers the historical background of the field, the motivation for integrating LLMs with agents and tools, the problem this integration is trying to solve, and likely future developments.

## Table of contents

1. Introduction: An overview of the field and the problem it is trying to solve.
   1. Background: A brief history of LLMs and their development.
   2. Motivation: The motivation for integrating LLMs into various systems.
   3. Problem Statement: The problem that LLM agent and tool integration is trying to solve.
2. LLM Agent Integration: The integration of LLMs into various systems as agents.
   1. Application-Integrated LLMs: The threat landscape of Application-Integrated LLMs and the new attack vectors they introduce.
   2. Chemistry Tools: The integration of chemistry tools with LLMs to augment their performance on chemistry-related problems.
   3. Climate Resources: The integration of climate resources with LLMs to overcome the limitations of imprecise language and deliver more reliable, accurate information in the critical domain of climate change.
3. LLM Tool Integration: The integration of LLMs into various systems as tools.
   1. Labor Market Impact: The potential implications of LLMs for the U.S. labor market, assessing occupations based on their alignment with LLM capabilities.
   2. Medical Imaging: An image segmentation tool trained with the largest segmentation dataset to date and its extension for 3D Slicer.
4. Conclusion: A summary of the findings and a discussion of future developments in the field.

## Introduction

Large Language Models (LLMs) have recently shown strong performance in tasks across domains and are being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines [^1]. However, LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications [^2]. They also exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The integration of LLMs into various systems as agents and tools is a growing field that aims to overcome these limitations and enhance LLM performance across domains. This systematic review analyzes the current state of LLM agent and tool integration, discusses the historical background of the field, the motivation for it and the problem it is trying to solve, and outlines future developments.

### Background
Large Language Models (LLMs) have recently shown strong performance in a variety of tasks across domains [^2]. LLMs are artificial neural networks that process and generate natural language text. Language models have been developed over several decades, with early approaches such as Hidden Markov Models and n-gram models [^1]. Recent advances in deep learning have led to far more powerful LLMs, such as GPT-3 and T5, trained on massive amounts of text data and capable of generating fluent, human-like text [^4]. LLMs have been integrated into various systems, including integrated development environments (IDEs), search engines, and content creation tools [^1]. Despite their impressive capabilities, LLMs lack access to external knowledge sources, limiting their usefulness in scientific applications [^2]. Because they exhibit traits of general-purpose technologies, their development also has significant economic, social, and policy implications [^4].

### Motivation

LLMs have shown remarkable performance across a wide range of natural language tasks. However, they lack access to external knowledge sources, limiting their usefulness in scientific applications [^3]. To overcome this limitation, LLMs are being integrated with external systems, including chemistry tools [^2] and climate resources [^3], while their broader effects, such as labor market impact, are being analyzed [^4]. These integrations aim to augment LLM performance and overcome the limitations associated with imprecise language, delivering more reliable and accurate information in critical domains such as climate change [^3]. Additionally, LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The motivation for integrating LLMs into various systems is therefore to enhance their performance and extend their applicability to scientific domains, while also examining their economic, social, and policy implications.

### Problem Statement

LLMs are being integrated into various systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications [^3]. Moreover, they exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The problem that LLM agent and tool integration addresses is how to enhance LLM performance by connecting models to external knowledge sources and tools, enabling them to overcome these limitations and deliver more reliable and accurate information across domains [^2][^3]. This systematic review analyzes the current state of LLM agent and tool integration and discusses future developments in the field.

## LLM Agent Integration

LLMs are being integrated into various systems as agents that perform complex tasks and make informed decisions. Such agents can autonomously determine which actions to take, including invoking external tools, observing their outputs, and responding to user queries [^3].
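To make this pattern concrete, the following minimal sketch outlines such an agent loop. It is an illustration only: the text-based `TOOL:`/`ANSWER:` protocol, the scripted model, and the calculator tool are hypothetical placeholders rather than the interface of any framework surveyed here.

```python
from typing import Callable, Dict

def run_agent(question: str,
              llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Minimal agent loop: the model either calls a tool ('TOOL: name | input')
    or answers ('ANSWER: text'); tool outputs are appended as observations."""
    transcript = f"Question: {question}\nAvailable tools: {', '.join(tools)}\n"
    for _ in range(max_steps):
        decision = llm(transcript)  # the model picks a tool or answers directly
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        if decision.startswith("TOOL:"):
            name, _, tool_input = decision.removeprefix("TOOL:").partition("|")
            observation = tools.get(name.strip(), lambda x: "Unknown tool")(tool_input.strip())
            transcript += f"{decision}\nObservation: {observation}\n"
    return "No answer produced within the step budget."

# Toy usage: a scripted "model" that calls a calculator tool once, then answers.
script = iter(["TOOL: calculator | 21 * 2", "ANSWER: The result is 42."])
answer = run_agent(
    "What is 21 * 2?",
    llm=lambda prompt: next(script),
    tools={"calculator": lambda expr: str(eval(expr))},  # demo only; eval is unsafe
)
print(answer)  # -> The result is 42.
```

Each iteration appends the tool's observation to the transcript, which is how the agent "observes outputs" before deciding on its next action.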
By leveraging the LLM's broad knowledge and understanding of natural language, an agent can navigate an array of tools and select the most appropriate one for the given context. This enables the agent to provide reliable, accurate, and contextually relevant solutions across diverse applications and domains [^3]. However, the integration of LLMs as agents also introduces new attack vectors. For instance, Application-Integrated LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries [^1]. This calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats [^1]. Moreover, LLMs exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The versatility and adaptability of LLM agents make them an essential asset in many applications and domains, and point to their future integration into increasingly complex and sophisticated AI systems [^3].

### Application-Integrated LLMs

LLMs are being integrated into various systems, including integrated development environments (IDEs) and search engines. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been demonstrated: an adversary prompts the LLM to produce malicious content or to override the original instructions and the employed filtering schemes. These attacks assume that the adversary prompts the LLM directly. However, augmenting LLMs with retrieval and API-calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. Such LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. Adversaries can thus attempt to inject prompts indirectly, by placing them in publicly accessible sources, which might allow attackers to gain control of an LLM by crossing crucial security boundaries with a single search query [^1]. The resulting threat landscape of Application-Integrated LLMs needs to be systematically analyzed and the variety of new attack vectors examined; the potential harm of these attacks calls for a more in-depth investigation of how well they generalize in practice [^1].

### Chemistry Tools

LLMs have shown strong performance in tasks across domains, but they struggle with chemistry-related problems and lack access to external knowledge sources, limiting their usefulness in scientific applications [^2]. To overcome this limitation, ChemCrow, an LLM chemistry agent, has been introduced to accomplish tasks across organic synthesis, drug discovery, and materials design by integrating 13 expert-designed tools [^2]. ChemCrow augments LLM performance in chemistry, and new capabilities emerge from the combination. An evaluation including both LLM-based and expert human assessments demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks [^2]. The integration of expert-designed tools also helps mitigate the hallucination issues commonly associated with these models, reducing the risk of inaccurate output [^2].
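To illustrate the general integration pattern (not ChemCrow's actual implementation), the sketch below wraps a hypothetical expert tool so that an agent framework can expose it to an LLM. It assumes the 2023-era LangChain `Tool`/`initialize_agent` interface, on which ChemCrow is built; the solubility predictor is an invented placeholder, not one of ChemCrow's 13 tools.

```python
# Illustrative sketch: exposing an expert-designed tool to an LLM agent.
# Assumes the 2023-era LangChain interface (Tool / initialize_agent).
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.llms import OpenAI

def predict_solubility(smiles: str) -> str:
    # A real tool would call an expert model or cheminformatics package here.
    return f"Predicted aqueous solubility for {smiles}: (placeholder value)"

tools = [
    Tool(
        name="SolubilityPredictor",
        func=predict_solubility,
        description="Estimates aqueous solubility for a molecule given as a SMILES string.",
    ),
]

llm = OpenAI(temperature=0)  # any LangChain-compatible LLM would work
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("Estimate the aqueous solubility of aspirin, SMILES CC(=O)OC1=CC=CC=C1C(=O)O.")
```

The natural-language `description` attached to each tool is what allows the agent to decide when to invoke it, which is broadly the mechanism that lets a tool-augmented agent such as ChemCrow route chemistry questions to the appropriate expert tool.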
Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry [^2].

### Climate Resources

LLMs lack recent information and often employ imprecise language, which can be detrimental in domains where accuracy is crucial, such as climate change. Kraus et al. [^3] make use of recent ideas to harness the potential of LLMs by viewing them as agents that access multiple sources, including databases containing recent and precise information about organizations, institutions, and companies. They demonstrate the effectiveness of their method through a prototype agent that retrieves emission data from ClimateWatch and leverages general Google search. By integrating these resources with LLMs, their approach overcomes the limitations associated with imprecise language and delivers more reliable and accurate information in the critical domain of climate change. This work paves the way for future advancements in LLMs and their application in domains where precision is of paramount importance.

## LLM Tool Integration

LLMs and related foundation models are being integrated into various systems as tools to enhance performance in specific domains. One such domain is medical imaging, where the Segment Anything Model, a promptable segmentation foundation model trained on the largest segmentation dataset to date, has been extended into 3D Slicer [^5]. Another domain is chemistry: ChemCrow integrates external tools through LangChain, since LLMs have been shown to perform better when given access to tools [^2]. The implementation uses a limited set of tools, but it can easily be expanded depending on needs and availability. The tools can be classified into general tools, molecular tools, and chemical reaction tools. The general tools include web search, which gives the language model access to relevant information from the web; the molecular tools include molecular visualization and molecular property prediction; and the chemical reaction tools include reaction prediction and retrosynthesis planning [^2].

### Labor Market Impact

LLMs such as Generative Pre-trained Transformers (GPTs) have the potential to significantly affect a diverse range of occupations within the U.S. economy, demonstrating a key attribute of general-purpose technologies [^4]. This subsection examines the potential implications of LLMs for the U.S. labor market, assessing occupations based on their alignment with LLM capabilities. Using a new exposure rubric, Eloundou et al. estimate that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted [^4]. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software [^4]. The analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality; when incorporating software and tooling built on top of LLMs, this share increases to between 47% and 56% of all tasks [^4].
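As a toy illustration of how such task-level exposure ratings can aggregate into workforce-level shares, the sketch below uses entirely hypothetical occupations and numbers; it mirrors only the general idea of the rubric, not the study's actual data, categories, or weighting.

```python
# Toy illustration (hypothetical numbers): aggregating task-level exposure
# into workforce-level shares, in the spirit of the rubric described above.
from dataclasses import dataclass

@dataclass
class Occupation:
    name: str
    workers: int        # employment count (hypothetical)
    exposed_tasks: int  # tasks an LLM could speed up substantially (hypothetical)
    total_tasks: int

occupations = [
    Occupation("Technical writer", 50_000, 18, 20),
    Occupation("Line cook", 250_000, 1, 25),
    Occupation("Paralegal", 120_000, 8, 22),
]

def share_of_workers_with_exposure(occs, threshold):
    """Share of workers whose occupation has at least `threshold`
    fraction of its tasks exposed."""
    total = sum(o.workers for o in occs)
    hit = sum(o.workers for o in occs if o.exposed_tasks / o.total_tasks >= threshold)
    return hit / total

print(share_of_workers_with_exposure(occupations, 0.10))  # "at least 10% of tasks"
print(share_of_workers_with_exposure(occupations, 0.50))  # "at least 50% of tasks"
```

This is only meant to clarify what a statement such as "80% of workers have at least 10% of their tasks exposed" quantifies; the study's reported figures come from a far richer task-level dataset and rubric.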
However, while the technical capacity for LLMs to make human labor more efficient appears evident, it is important to recognize that social, economic, regulatory, and other factors will influence actual labor productivity outcomes [^4]. The impact of LLMs on the economy is likely to persist and increase, posing challenges for policymakers in predicting and regulating their trajectory [^4].

### Medical Imaging

The Segment Anything Model (SAM) is a new image segmentation tool trained on the largest segmentation dataset available to date [^5]. SAM produces high-quality masks for image segmentation with good promptability and generalizability; however, its performance on medical images requires further validation. To assist with the development, assessment, and utilization of SAM on medical images, an extension of SAM for 3D Slicer called Segment Any Medical Model (SAMM) has been introduced [^5]. SAMM achieves a complete-cycle latency of 0.6 seconds and can infer image masks in near real time. The integration with 3D Slicer enables researchers to segment medical images using this state-of-the-art foundation model [^5], and it clears the path for validating SAM on medical images within 3D Slicer, an open-source software package with abundant medical image analysis tools. By combining AI-based medical image models with the 3D Slicer software, SAMM provides a paradigmatic approach that enables users to enhance their research and work directly through AI tools [^5].

## Conclusion

This systematic review has analyzed the current state of Large Language Model (LLM) agent and tool integration. LLMs are being integrated into a wide range of systems, but they lack access to external knowledge sources, limiting their usefulness in scientific applications [^2][^3]. They also exhibit traits of general-purpose technologies, indicating considerable economic, social, and policy implications [^4]. The review has discussed the historical background of the field, the motivation for it and the problem it aims to solve, and future developments. It has identified several areas where LLMs are being integrated with agents or tools, including chemistry tools [^2], climate resources [^3], and medical imaging [^5], and it has highlighted the threats associated with Application-Integrated LLMs, including novel prompt injection attacks [^1]. Future work should focus on addressing these threats and on improving the integration of LLMs into various systems to enhance their performance and usefulness in scientific applications.

## References

[^1]: [Greshake, Kai, et al. "More than You've Asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models." arXiv preprint arXiv:2302.12173 (2023).](https://arxiv.org/abs/2302.12173)
[^2]: [Bran, Andres M., et al. "ChemCrow: Augmenting Large-Language Models with Chemistry Tools." arXiv preprint arXiv:2304.05376 (2023).](https://arxiv.org/abs/2304.05376)
[^3]: [Kraus, Mathias, et al. "Enhancing Large Language Models with Climate Resources." arXiv preprint arXiv:2304.00116 (2023).](https://arxiv.org/abs/2304.00116)
[^4]: [Eloundou, Tyna, et al. "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models." arXiv preprint arXiv:2303.10130 (2023).](https://arxiv.org/abs/2303.10130)
[^5]: [Liu, Yihao, et al. "SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM." arXiv preprint arXiv:2304.05622 (2023).](https://arxiv.org/abs/2304.05622)