Spaces: langchain-HuggingGPT (Runtime error)

camille-vanhoffelen committed
Commit b3d3593 • 1 Parent(s): 1ce354a

First working gradio app for langchain-HuggingGPT
Browse files:
- .gitignore +181 -0
- LICENSE +21 -0
- app.py +222 -0
- hugginggpt/__init__.py +4 -0
- hugginggpt/exceptions.py +55 -0
- hugginggpt/history.py +29 -0
- hugginggpt/huggingface_api.py +13 -0
- hugginggpt/llm_factory.py +82 -0
- hugginggpt/log.py +10 -0
- hugginggpt/model_inference.py +410 -0
- hugginggpt/model_scraper.py +90 -0
- hugginggpt/model_selection.py +97 -0
- hugginggpt/resources.py +104 -0
- hugginggpt/response_generation.py +43 -0
- hugginggpt/task_parsing.py +149 -0
- hugginggpt/task_planning.py +61 -0
- logging-config.toml +26 -0
- logs/.gitkeep +0 -0
- main.py +138 -0
- output/.gitkeep +0 -0
- output/audios/.gitkeep +0 -0
- output/images/.gitkeep +0 -0
- output/videos/.gitkeep +0 -0
- pdm.lock +0 -0
- pyproject.toml +50 -0
- requirements.txt +0 -0
- resources/banner.txt +4 -0
- resources/huggingface-models-metadata.jsonl +0 -0
- resources/prompt-templates/model-selection-prompt.json +9 -0
- resources/prompt-templates/openai-model-inference-prompt.json +9 -0
- resources/prompt-templates/response-generation-prompt.json +8 -0
- resources/prompt-templates/task-planning-example-prompt.json +8 -0
- resources/prompt-templates/task-planning-examples.json +42 -0
- resources/prompt-templates/task-planning-few-shot-prompt.json +11 -0
.gitignore
ADDED
@@ -0,0 +1,181 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Logs
*.log
*.log*

# Outputs
output/images/*
!output/images/.gitkeep
output/videos/*
!output/videos/.gitkeep
output/audios/*
!output/audios/.gitkeep

# PDM
.pdm-python

# macos
*.DS_Store

# Examples
!.env.example
LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Camille Van Hoffelen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
app.py
ADDED
@@ -0,0 +1,222 @@
import logging
import os
import re

import gradio as gr
from dotenv import load_dotenv

from hugginggpt.history import ConversationHistory
from hugginggpt.llm_factory import create_llms
from hugginggpt.log import setup_logging
from hugginggpt.resources import (
    GENERATED_RESOURCES_DIR,
    get_resource_url,
    init_resource_dirs,
    load_audio,
    load_image,
    save_audio,
    save_image,
)
from main import compute

load_dotenv()
setup_logging()
logger = logging.getLogger(__name__)
init_resource_dirs()

OPENAI_KEY = os.environ.get("OPENAI_API_KEY")
HUGGINGFACE_TOKEN = os.environ.get("HUGGINGFACEHUB_API_TOKEN")


class Client:
    def __init__(self) -> None:
        self.llms = None
        self.llm_history = ConversationHistory()
        self.last_user_input = ""

    @property
    def is_init(self) -> bool:
        return (
            os.environ.get("OPENAI_API_KEY")
            and os.environ.get("OPENAI_API_KEY").startswith("sk-")
            and os.environ.get("HUGGINGFACEHUB_API_TOKEN")
            and os.environ.get("HUGGINGFACEHUB_API_TOKEN").startswith("hf_")
        )

    def add_text(self, user_input, messages):
        if not self.is_init:
            return (
                "Please set your OpenAI API key and Hugging Face token first!!!",
                messages,
            )
        if not self.llms:
            self.llms = create_llms()

        messages = display_message(
            role="user", message=user_input, messages=messages, save_media=True
        )
        self.last_user_input = user_input
        return "", messages

    def bot(self, messages):
        if not self.is_init:
            return {}, messages
        user_input = self.last_user_input
        response, task_summaries = compute(
            user_input=user_input,
            history=self.llm_history,
            llms=self.llms,
        )
        messages = display_message(
            role="assistant", message=response, messages=messages, save_media=False
        )
        self.llm_history.add(role="user", content=user_input)
        self.llm_history.add(role="assistant", content="")
        return task_summaries, messages


css = ".json {height: 527px; overflow: scroll;} .json-holder {height: 527px; overflow: scroll;}"
with gr.Blocks(css=css) as demo:
    gr.Markdown("<h1><center>langchain-HuggingGPT</center></h1>")
    gr.Markdown(
        "<p align='center'><img src='https://i.ibb.co/qNH3Jym/logo.png' height='25' width='95'></p>"
    )
    gr.Markdown(
        "<p align='center' style='font-size: 20px;'>A lightweight implementation of <a href='https://arxiv.org/abs/2303.17580'>HuggingGPT</a> with <a href='https://docs.langchain.com/docs/'>langchain</a>. No local inference, only models available on the <a href='https://huggingface.co/inference-api'>Hugging Face Inference API</a> are used.</p>"
    )
    gr.HTML(
        """<center><a href="https://huggingface.co/spaces/camillevanhoffelen/langchain-HuggingGPT?duplicate=true"><img src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a>Duplicate the Space and run securely with your OpenAI API Key and Hugging Face Token</center>"""
    )
    if not OPENAI_KEY:
        with gr.Row().style():
            with gr.Column(scale=0.85):
                openai_api_key = gr.Textbox(
                    show_label=False,
                    placeholder="Set your OpenAI API key here and press Enter",
                    lines=1,
                    type="password",
                ).style(container=False)
            with gr.Column(scale=0.15, min_width=0):
                btn1 = gr.Button("Submit").style(full_height=True)

    if not HUGGINGFACE_TOKEN:
        with gr.Row().style():
            with gr.Column(scale=0.85):
                hugging_face_token = gr.Textbox(
                    show_label=False,
                    placeholder="Set your Hugging Face Token here and press Enter",
                    lines=1,
                    type="password",
                ).style(container=False)
            with gr.Column(scale=0.15, min_width=0):
                btn3 = gr.Button("Submit").style(full_height=True)

    with gr.Row().style():
        with gr.Column(scale=0.6):
            chatbot = gr.Chatbot([], elem_id="chatbot").style(height=500)
        with gr.Column(scale=0.4):
            results = gr.JSON(elem_classes="json")

    with gr.Row().style():
        with gr.Column(scale=0.85):
            txt = gr.Textbox(
                show_label=False,
                placeholder="Enter text and press enter. The url must contain the media type. e.g, https://example.com/example.jpg",
                lines=1,
            ).style(container=False)
        with gr.Column(scale=0.15, min_width=0):
            btn2 = gr.Button("Send").style(full_height=True)

    def set_key(openai_api_key):
        os.environ["OPENAI_API_KEY"] = openai_api_key
        return openai_api_key

    def set_token(hugging_face_token):
        os.environ["HUGGINGFACEHUB_API_TOKEN"] = hugging_face_token
        return hugging_face_token

    def add_text(state, user_input, messages):
        return state["client"].add_text(user_input, messages)

    def bot(state, messages):
        return state["client"].bot(messages)

    if not OPENAI_KEY or not HUGGINGFACE_TOKEN:
        openai_api_key.submit(set_key, [openai_api_key], [openai_api_key])
        btn1.click(set_key, [openai_api_key], [openai_api_key])
        hugging_face_token.submit(set_token, [hugging_face_token], [hugging_face_token])
        btn3.click(set_token, [hugging_face_token], [hugging_face_token])

    state = gr.State(value={"client": Client()})

    txt.submit(add_text, [state, txt, chatbot], [txt, chatbot]).then(
        bot, [state, chatbot], [results, chatbot]
    )
    btn2.click(add_text, [state, txt, chatbot], [txt, chatbot]).then(
        bot, [state, chatbot], [results, chatbot]
    )

    gr.Examples(
        examples=[
            "Draw me a sheep",
            "Write a poem about sheep, then read it to me",
            "Transcribe the audio file found at /audios/499e.flac. Then tell me how similar the transcription is to the following sentence: Sheep are nice.",
            "Show me a joke and an image of sheep",
        ],
        inputs=txt,
    )


def display_message(role: str, message: str, messages: list, save_media: bool):
    # Text
    messages.append(format_message(role=role, message=message))

    # Media
    image_urls, audio_urls = extract_medias(message)
    for image_url in image_urls:
        image_url = get_resource_url(image_url)
        if save_media:
            image = load_image(image_url)
            image_url = save_image(image)
            image_url = GENERATED_RESOURCES_DIR + image_url
        messages.append(format_message(role=role, message=(image_url,)))

    for audio_url in audio_urls:
        audio_url = get_resource_url(audio_url)
        if save_media:
            audio = load_audio(audio_url)
            audio_url = save_audio(audio)
            audio_url = GENERATED_RESOURCES_DIR + audio_url
        messages.append(format_message(role=role, message=(audio_url,)))

    return messages


def format_message(role, message):
    if role == "user":
        return message, None
    if role == "assistant":
        return None, message
    else:
        raise ValueError("role must be either user or assistant")


def extract_medias(message: str):
    image_pattern = re.compile(
        r"(http(s?):|\/)?([\.\/_\w:-])*?\.(jpg|jpeg|tiff|gif|png)"
    )
    image_urls = []
    for match in image_pattern.finditer(message):
        if match.group(0) not in image_urls:
            image_urls.append(match.group(0))

    audio_pattern = re.compile(r"(http(s?):|\/)?([\.\/_\w:-])*?\.(flac|wav)")
    audio_urls = []
    for match in audio_pattern.finditer(message):
        if match.group(0) not in audio_urls:
            audio_urls.append(match.group(0))

    return image_urls, audio_urls


demo.launch()
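Usage note (illustrative, not part of this commit): extract_medias is what lets the chat UI render images and audio referenced in a message. A minimal sketch of its behaviour, assuming the regexes above; the message text below is made up:

# Illustrative only: exercises extract_medias on a made-up chat message.
message = "Here is your sheep: /images/1a2b.png and the narration https://example.com/sheep.wav"
image_urls, audio_urls = extract_medias(message)
# image_urls -> ["/images/1a2b.png"]
# audio_urls -> ["https://example.com/sheep.wav"]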
hugginggpt/__init__.py
ADDED
@@ -0,0 +1,4 @@
from .model_inference import infer
from .model_selection import select_model
from .response_generation import generate_response
from .task_planning import plan_tasks
hugginggpt/exceptions.py
ADDED
@@ -0,0 +1,55 @@
import functools


def wrap_exceptions(exception_cls, message=None):
    """Wrap exceptions raised by a function with a custom exception class."""
    def decorated(f):
        @functools.wraps(f)
        def wrapped(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except Exception as e:
                raise exception_cls(message) from e

        return wrapped

    return decorated


def async_wrap_exceptions(exception_cls, message=None):
    """Wrap exceptions raised by an async function with a custom exception class."""
    def decorated(f):
        @functools.wraps(f)
        async def wrapped(*args, **kwargs):
            try:
                return await f(*args, **kwargs)
            except Exception as e:
                raise exception_cls(message) from e

        return wrapped

    return decorated


class TaskPlanningException(Exception):
    pass


class TaskParsingException(Exception):
    pass


class ModelScrapingException(Exception):
    pass


class ModelSelectionException(Exception):
    pass


class ModelInferenceException(Exception):
    pass


class ResponseGenerationException(Exception):
    pass
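Usage note (illustrative, not part of this commit): wrap_exceptions is applied as a decorator so any failure inside the wrapped function resurfaces as the stage-specific exception, with the original error chained as its cause. A minimal sketch; parse_plan below is a hypothetical function, not one defined in this repo:

# Illustrative decorator usage; parse_plan is hypothetical.
import json

from hugginggpt.exceptions import TaskPlanningException, wrap_exceptions


@wrap_exceptions(TaskPlanningException, "Failed to plan tasks")
def parse_plan(raw: str) -> dict:
    return json.loads(raw)

# parse_plan("not json") raises TaskPlanningException("Failed to plan tasks"),
# with the underlying json.JSONDecodeError available as __cause__.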
hugginggpt/history.py
ADDED
@@ -0,0 +1,29 @@
import json


class ConversationHistory:
    """Stores previous user and assistant messages. Used as additional context for task planning."""
    def __init__(self):
        self.history = []

    def add(self, role: str, content: str):
        self.history.append({"role": role, "content": content})

    def __str__(self):
        return json.dumps(self.history)

    def __repr__(self):
        return str(self)

    def __len__(self):
        return len(self.history)

    def __getitem__(self, item):
        return self.history[item]

    def __setitem__(self, key, value):
        self.history[key] = value

    def __delitem__(self, key):
        del self.history[key]
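For illustration (not part of the diff), the history serializes to the JSON string that later feeds the task-planning prompt:

from hugginggpt.history import ConversationHistory

history = ConversationHistory()
history.add(role="user", content="Draw me a sheep")
history.add(role="assistant", content="")
print(str(history))
# -> [{"role": "user", "content": "Draw me a sheep"}, {"role": "assistant", "content": ""}]
print(len(history), history[0])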
hugginggpt/huggingface_api.py
ADDED
@@ -0,0 +1,13 @@
import os

from dotenv import load_dotenv

load_dotenv()

HUGGINGFACE_INFERENCE_API_URL = "https://api-inference.huggingface.co/models/"
HUGGINGFACE_INFERENCE_API_STATUS_URL = f"https://api-inference.huggingface.co/status/"


def get_hf_headers():
    HUGGINGFACEHUB_API_TOKEN = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
    return {"Authorization": f"Bearer {HUGGINGFACEHUB_API_TOKEN}"}
hugginggpt/llm_factory.py
ADDED
@@ -0,0 +1,82 @@
import logging
from collections import namedtuple

import tiktoken
from langchain import OpenAI

LLM_NAME = "text-davinci-003"
# Encoding for text-davinci-003
ENCODING_NAME = "p50k_base"
ENCODING = tiktoken.get_encoding(ENCODING_NAME)
# Max input tokens for text-davinci-003
LLM_MAX_TOKENS = 4096

# As specified in huggingGPT paper
TASK_PLANNING_LOGIT_BIAS = 0.1
MODEL_SELECTION_LOGIT_BIAS = 5

logger = logging.getLogger(__name__)

LLMs = namedtuple(
    "LLMs",
    [
        "task_planning_llm",
        "model_selection_llm",
        "model_inference_llm",
        "response_generation_llm",
        "output_fixing_llm",
    ],
)


def create_llms() -> LLMs:
    """Create various LLM agents according to the huggingGPT paper's specifications."""
    logger.info(f"Creating {LLM_NAME} LLMs")

    task_parsing_highlight_ids = get_token_ids_for_task_parsing()
    choose_model_highlight_ids = get_token_ids_for_choose_model()

    task_planning_llm = OpenAI(
        model_name=LLM_NAME,
        temperature=0,
        logit_bias={
            token_id: TASK_PLANNING_LOGIT_BIAS
            for token_id in task_parsing_highlight_ids
        },
    )
    model_selection_llm = OpenAI(
        model_name=LLM_NAME,
        temperature=0,
        logit_bias={
            token_id: MODEL_SELECTION_LOGIT_BIAS
            for token_id in choose_model_highlight_ids
        },
    )
    model_inference_llm = OpenAI(model_name=LLM_NAME, temperature=0)
    response_generation_llm = OpenAI(model_name=LLM_NAME, temperature=0)
    output_fixing_llm = OpenAI(model_name=LLM_NAME, temperature=0)
    return LLMs(
        task_planning_llm=task_planning_llm,
        model_selection_llm=model_selection_llm,
        model_inference_llm=model_inference_llm,
        response_generation_llm=response_generation_llm,
        output_fixing_llm=output_fixing_llm,
    )


def get_token_ids_for_task_parsing() -> list[int]:
    text = """{"task": "text-classification", "token-classification", "text2text-generation", "summarization", "translation", "question-answering", "conversational", "text-generation", "sentence-similarity", "tabular-classification", "object-detection", "image-classification", "image-to-image", "image-to-text", "text-to-image", "visual-question-answering", "document-question-answering", "image-segmentation", "text-to-speech", "automatic-speech-recognition", "audio-to-audio", "audio-classification", "args", "text", "path", "dep", "id", "<GENERATED>-"}"""
    res = ENCODING.encode(text)
    res = list(set(res))
    return res


def get_token_ids_for_choose_model() -> list[int]:
    text = """{"id": "reason"}"""
    res = ENCODING.encode(text)
    res = list(set(res))
    return res


def count_tokens(text: str) -> int:
    return len(ENCODING.encode(text))
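Usage note (illustrative, not part of this commit): the LLMs namedtuple maps one field to each HuggingGPT stage, and count_tokens counts tokens under the same p50k_base encoding used for the davinci-003 context-length check. A minimal sketch, assuming OPENAI_API_KEY is set in the environment:

from hugginggpt.llm_factory import count_tokens, create_llms

llms = create_llms()  # constructs the five stage-specific OpenAI LLM wrappers
planner = llms.task_planning_llm  # biased toward the task-parsing vocabulary via logit_bias
print(count_tokens("Draw me a sheep"))  # token count under the p50k_base encoding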
hugginggpt/log.py
ADDED
@@ -0,0 +1,10 @@
import logging.config
import tomllib

LOGGING_CONFIG_FILE = "logging-config.toml"


def setup_logging():
    with open("logging-config.toml", "rb") as f:
        config = tomllib.load(f)
    logging.config.dictConfig(config)
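The logging-config.toml added in this commit is not shown on this page; as an assumption-labelled sketch, setup_logging only requires that the TOML parse into the standard logging dictConfig schema, for example the equivalent of:

# Illustrative only: the shape of dict that setup_logging expects logging-config.toml to
# parse into (the real file in this commit may differ).
import logging.config

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}},
    "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "simple"}},
    "root": {"level": "INFO", "handlers": ["console"]},
}
logging.config.dictConfig(config)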
hugginggpt/model_inference.py
ADDED
@@ -0,0 +1,410 @@
import base64
import json
import logging
import random
from io import BytesIO
from typing import Any

import requests
from PIL import Image, ImageDraw
from langchain import LLMChain
from langchain.llms.base import BaseLLM
from langchain.prompts import load_prompt
from pydantic import BaseModel, Json

from hugginggpt.exceptions import ModelInferenceException, wrap_exceptions
from hugginggpt.huggingface_api import (HUGGINGFACE_INFERENCE_API_URL, get_hf_headers)
from hugginggpt.model_selection import Model
from hugginggpt.resources import (
    audio_from_bytes,
    encode_audio,
    encode_image,
    get_prompt_resource,
    get_resource_url,
    image_from_bytes,
    load_image,
    save_audio,
    save_image,
)
from hugginggpt.task_parsing import Task

logger = logging.getLogger(__name__)


@wrap_exceptions(ModelInferenceException, "Error during model inference")
def infer(task: Task, model_id: str, llm: BaseLLM, session: requests.Session):
    """Execute a task either with LLM or huggingface inference API."""
    if model_id == "openai":
        return infer_openai(task=task, llm=llm)
    else:
        return infer_huggingface(task=task, model_id=model_id, session=session)


def infer_openai(task: Task, llm: BaseLLM):
    logger.info("Starting OpenAI inference")
    prompt_template = load_prompt(
        get_prompt_resource("openai-model-inference-prompt.json")
    )
    llm_chain = LLMChain(prompt=prompt_template, llm=llm)
    # Need to replace double quotes with single quotes for correct response generation
    output = llm_chain.predict(
        task=task.json(), task_name=task.task, args=task.args, stop=["<im_end>"]
    )
    result = {"generated text": output}
    logger.debug(f"Inference result: {result}")
    return result


def infer_huggingface(task: Task, model_id: str, session: requests.Session):
    logger.info("Starting huggingface inference")
    url = HUGGINGFACE_INFERENCE_API_URL + model_id
    huggingface_task = create_huggingface_task(task=task)
    data = huggingface_task.inference_inputs
    headers = get_hf_headers()
    response = session.post(url, headers=headers, data=data)
    response.raise_for_status()
    result = huggingface_task.parse_response(response)
    logger.debug(f"Inference result: {result}")
    return result


# NLP Tasks


# deepset/roberta-base-squad2 was removed from huggingface_models-metadata.jsonl because it is currently broken
# Example added to task-planning-examples.json compared to original paper
class QuestionAnswering:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        data = {
            "inputs": {
                "question": self.task.args["question"],
                "context": self.task.args["context"]
                if "context" in self.task.args
                else "",
            }
        }
        return json.dumps(data)

    def parse_response(self, response):
        return response.json()


# Example added to task-planning-examples.json compared to original paper
class SentenceSimilarity:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        data = {
            "inputs": {
                "source_sentence": self.task.args["text1"],
                "sentences": [self.task.args["text2"]],
            }
        }
        # Using string to bypass requests' form encoding
        return json.dumps(data)

    def parse_response(self, response):
        return response.json()


# Example added to task-planning-examples.json compared to original paper
class TextClassification:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return self.task.args["text"]
        # return {"inputs": self.task.args["text"]}

    def parse_response(self, response):
        return response.json()


class TokenClassification:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return self.task.args["text"]

    def parse_response(self, response):
        return response.json()


# CV Tasks
class VisualQuestionAnswering:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        img_data = encode_image(self.task.args["image"])
        img_base64 = base64.b64encode(img_data).decode("utf-8")
        data = {
            "inputs": {
                "question": self.task.args["text"],
                "image": img_base64,
            }
        }
        return json.dumps(data)

    def parse_response(self, response):
        return response.json()


class DocumentQuestionAnswering:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        img_data = encode_image(self.task.args["image"])
        img_base64 = base64.b64encode(img_data).decode("utf-8")
        data = {
            "inputs": {
                "question": self.task.args["text"],
                "image": img_base64,
            }
        }
        return json.dumps(data)

    def parse_response(self, response):
        return response.json()


class TextToImage:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return self.task.args["text"]

    def parse_response(self, response):
        image = image_from_bytes(response.content)
        path = save_image(image)
        return {"generated image": path}


class ImageSegmentation:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return encode_image(self.task.args["image"])

    def parse_response(self, response):
        image_url = get_resource_url(self.task.args["image"])
        image = load_image(image_url)
        colors = []
        for i in range(len(response.json())):
            colors.append(
                (
                    random.randint(100, 255),
                    random.randint(100, 255),
                    random.randint(100, 255),
                    155,
                )
            )
        predicted_results = []
        for i, pred in enumerate(response.json()):
            mask = pred.pop("mask").encode("utf-8")
            mask = base64.b64decode(mask)
            mask = Image.open(BytesIO(mask), mode="r")
            mask = mask.convert("L")

            layer = Image.new("RGBA", mask.size, colors[i])
            image.paste(layer, (0, 0), mask)
            predicted_results.append(pred)
        path = save_image(image)
        return {
            "generated image with segmentation mask": path,
            "predicted": predicted_results,
        }


# Not yet implemented in huggingface inference API
class ImageToImage:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        img_data = encode_image(self.task.args["image"])
        img_base64 = base64.b64encode(img_data).decode("utf-8")
        data = {
            "inputs": {
                "image": img_base64,
            }
        }
        if "text" in self.task.args:
            data["inputs"]["prompt"] = self.task.args["text"]
        return json.dumps(data)

    def parse_response(self, response):
        image = image_from_bytes(response.content)
        path = save_image(image)
        return {"generated image": path}


class ObjectDetection:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return encode_image(self.task.args["image"])

    def parse_response(self, response):
        image_url = get_resource_url(self.task.args["image"])
        image = load_image(image_url)
        draw = ImageDraw.Draw(image)
        labels = list(item["label"] for item in response.json())
        color_map = {}
        for label in labels:
            if label not in color_map:
                color_map[label] = (
                    random.randint(0, 255),
                    random.randint(0, 100),
                    random.randint(0, 255),
                )
        for item in response.json():
            box = item["box"]
            draw.rectangle(
                ((box["xmin"], box["ymin"]), (box["xmax"], box["ymax"])),
                outline=color_map[item["label"]],
                width=2,
            )
            draw.text(
                (box["xmin"] + 5, box["ymin"] - 15),
                item["label"],
                fill=color_map[item["label"]],
            )
        path = save_image(image)
        return {
            "generated image with predicted box": path,
            "predicted": response.json(),
        }


# Example added to task-planning-examples.json compared to original paper
class ImageClassification:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return encode_image(self.task.args["image"])

    def parse_response(self, response):
        return response.json()


class ImageToText:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return encode_image(self.task.args["image"])

    def parse_response(self, response):
        return {"generated text": response.json()[0].get("generated_text", "")}


# Audio Tasks
class TextToSpeech:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return self.task.args["text"]

    def parse_response(self, response):
        audio = audio_from_bytes(response.content)
        path = save_audio(audio)
        return {"generated audio": path}


class AudioToAudio:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return encode_audio(self.task.args["audio"])

    def parse_response(self, response):
        result = response.json()
        # The API returns a list of results, each with a base64-encoded audio blob
        blob = result[0]["blob"]
        content = base64.b64decode(blob.encode("utf-8"))
        audio = audio_from_bytes(content)
        path = save_audio(audio)
        return {"generated audio": path}


class AutomaticSpeechRecognition:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return encode_audio(self.task.args["audio"])

    def parse_response(self, response):
        return response.json()


class AudioClassification:
    def __init__(self, task: Task):
        self.task = task

    @property
    def inference_inputs(self):
        return encode_audio(self.task.args["audio"])

    def parse_response(self, response):
        return response.json()


HUGGINGFACE_TASKS = {
    "question-answering": QuestionAnswering,
    "sentence-similarity": SentenceSimilarity,
    "text-classification": TextClassification,
    "token-classification": TokenClassification,
    "visual-question-answering": VisualQuestionAnswering,
    "document-question-answering": DocumentQuestionAnswering,
    "text-to-image": TextToImage,
    "image-segmentation": ImageSegmentation,
    "image-to-image": ImageToImage,
    "object-detection": ObjectDetection,
    "image-classification": ImageClassification,
    "image-to-text": ImageToText,
    "text-to-speech": TextToSpeech,
    "automatic-speech-recognition": AutomaticSpeechRecognition,
    "audio-to-audio": AudioToAudio,
    "audio-classification": AudioClassification,
}


def create_huggingface_task(task: Task):
    if task.task in HUGGINGFACE_TASKS:
        return HUGGINGFACE_TASKS[task.task](task)
    else:
        raise NotImplementedError(f"Task {task.task} not supported")


class TaskSummary(BaseModel):
    task: Task
    inference_result: Json[Any]
    model: Model
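For illustration (not part of the diff), building a Task by hand shows the request payload infer_huggingface would POST for a question-answering task; the question and context strings below are made up:

from hugginggpt.model_inference import create_huggingface_task
from hugginggpt.task_parsing import Task

task = Task(
    task="question-answering",
    id=0,
    dep=[-1],
    args={"question": "What color are most sheep?", "context": "Most sheep are white."},
)
hf_task = create_huggingface_task(task=task)
print(hf_task.inference_inputs)
# -> {"inputs": {"question": "What color are most sheep?", "context": "Most sheep are white."}}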
hugginggpt/model_scraper.py
ADDED
@@ -0,0 +1,90 @@
import asyncio
import json
import logging
from collections import defaultdict

from aiohttp import ClientSession

from hugginggpt.exceptions import ModelScrapingException, async_wrap_exceptions
from hugginggpt.huggingface_api import (HUGGINGFACE_INFERENCE_API_STATUS_URL, get_hf_headers)

logger = logging.getLogger(__name__)


def read_huggingface_models_metadata():
    """Reads the metadata of all huggingface models from the local models cache file."""
    with open("resources/huggingface-models-metadata.jsonl") as f:
        models = [json.loads(line) for line in f]
    models_map = defaultdict(list)
    for model in models:
        models_map[model["task"]].append(model)
    return models_map


HUGGINGFACE_MODELS_MAP = read_huggingface_models_metadata()


@async_wrap_exceptions(
    ModelScrapingException,
    "Failed to find compatible models already loaded in the huggingface inference API.",
)
async def get_top_k_models(
    task: str, top_k: int, max_description_length: int, session: ClientSession
):
    """Returns the best k available huggingface models for a given task, sorted by number of likes."""
    # Number of potential candidates changed from top 10 to top_k*2
    candidates = HUGGINGFACE_MODELS_MAP[task][: top_k * 2]
    logger.debug(f"Task: {task}; All candidate models: {[c['id'] for c in candidates]}")
    available_models = await filter_available_models(
        candidates=candidates, session=session
    )
    logger.debug(
        f"Task: {task}; Available models: {[c['id'] for c in available_models]}"
    )
    top_k_available_models = available_models[:top_k]
    if not top_k_available_models:
        raise Exception(f"No available models for task: {task}")
    logger.debug(
        f"Task: {task}; Top {top_k} available models: {[c['id'] for c in top_k_available_models]}"
    )
    top_k_models_info = [
        {
            "id": model["id"],
            "likes": model.get("likes"),
            "description": model.get("description", "")[:max_description_length],
            "tags": model.get("meta").get("tags") if model.get("meta") else None,
        }
        for model in top_k_available_models
    ]
    return top_k_models_info


async def filter_available_models(candidates, session: ClientSession):
    """Filters out models that are not available or loaded in the huggingface inference API.
    Runs concurrently."""
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(model_status(model_id=c["id"], session=session))
            for c in candidates
        ]
    results = await asyncio.gather(*tasks)
    available_model_ids = [model_id for model_id, status in results if status]
    return [c for c in candidates if c["id"] in available_model_ids]


async def model_status(model_id: str, session: ClientSession) -> tuple[str, bool]:
    url = HUGGINGFACE_INFERENCE_API_STATUS_URL + model_id
    headers = get_hf_headers()
    r = await session.get(url, headers=headers)
    status = r.status
    json_response = await r.json()
    logger.debug(f"Model {model_id} status: {status}, response: {json_response}")
    return (
        (model_id, True)
        if model_is_available(status=status, json_response=json_response)
        else (model_id, False)
    )


def model_is_available(status: int, json_response: dict[str, any]):
    return status == 200 and "loaded" in json_response and json_response["loaded"]
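Usage sketch (illustrative, not part of this commit; assumes a valid HUGGINGFACEHUB_API_TOKEN and the metadata file shipped in resources/):

import asyncio

import aiohttp

from hugginggpt.model_scraper import get_top_k_models


async def main():
    async with aiohttp.ClientSession() as session:
        # Returns up to 5 currently-loaded models for the task, each with id, likes, description, tags
        models = await get_top_k_models(
            task="image-classification", top_k=5, max_description_length=100, session=session
        )
        print(models)


asyncio.run(main())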
hugginggpt/model_selection.py
ADDED
@@ -0,0 +1,97 @@
import asyncio
import json
import logging

import aiohttp
from langchain import LLMChain
from langchain.llms.base import BaseLLM
from langchain.output_parsers import OutputFixingParser, PydanticOutputParser
from langchain.prompts import load_prompt
from pydantic import BaseModel, Field

from hugginggpt.exceptions import ModelSelectionException, async_wrap_exceptions
from hugginggpt.model_scraper import get_top_k_models
from hugginggpt.resources import get_prompt_resource
from hugginggpt.task_parsing import Task

logger = logging.getLogger(__name__)


class Model(BaseModel):
    id: str = Field(description="ID of the model")
    reason: str = Field(description="Reason for selecting this model")


async def select_hf_models(
    user_input: str,
    tasks: list[Task],
    model_selection_llm: BaseLLM,
    output_fixing_llm: BaseLLM,
) -> dict[int, Model]:
    """Use LLM agent to select the best available HuggingFace model for each task, given model metadata.
    Runs concurrently."""
    async with aiohttp.ClientSession() as session:
        async with asyncio.TaskGroup() as tg:
            aio_tasks = []
            for task in tasks:
                aio_tasks.append(
                    tg.create_task(
                        select_model(
                            user_input=user_input,
                            task=task,
                            model_selection_llm=model_selection_llm,
                            output_fixing_llm=output_fixing_llm,
                            session=session,
                        )
                    )
                )
        results = await asyncio.gather(*aio_tasks)
        return {task_id: model for task_id, model in results}


@async_wrap_exceptions(ModelSelectionException, "Failed to select model")
async def select_model(
    user_input: str,
    task: Task,
    model_selection_llm: BaseLLM,
    output_fixing_llm: BaseLLM,
    session: aiohttp.ClientSession,
) -> (int, Model):
    logger.info(f"Starting model selection for task: {task.task}")

    top_k_models = await get_top_k_models(
        task=task.task, top_k=5, max_description_length=100, session=session
    )

    if task.task in [
        "summarization",
        "translation",
        "conversational",
        "text-generation",
        "text2text-generation",
    ]:
        model = Model(
            id="openai",
            reason="Text generation tasks are best handled by OpenAI models",
        )
    else:
        prompt_template = load_prompt(
            get_prompt_resource("model-selection-prompt.json")
        )
        llm_chain = LLMChain(prompt=prompt_template, llm=model_selection_llm)
        # Need to replace double quotes with single quotes for correct response generation
        task_str = task.json().replace('"', "'")
        models_str = json.dumps(top_k_models).replace('"', "'")
        output = await llm_chain.apredict(
            user_input=user_input, task=task_str, models=models_str, stop=["<im_end>"]
        )
        logger.debug(f"Model selection raw output: {output}")

        parser = PydanticOutputParser(pydantic_object=Model)
        fixing_parser = OutputFixingParser.from_llm(
            parser=parser, llm=output_fixing_llm
        )
        model = fixing_parser.parse(output)

    logger.info(f"For task: {task.task}, selected model: {model}")
    return task.id, model
hugginggpt/resources.py
ADDED
@@ -0,0 +1,104 @@
import os
import uuid
from io import BytesIO

import requests
from PIL import Image
from diffusers.utils.testing_utils import load_image
from pydub import AudioSegment

RESOURCES_DIR = "resources"
PROMPT_TEMPLATES_DIR = "prompt-templates"
GENERATED_RESOURCES_DIR = "output"


def get_prompt_resource(prompt_name: str) -> str:
    return os.path.join(RESOURCES_DIR, PROMPT_TEMPLATES_DIR, prompt_name)


def get_resource_url(resource_arg: str) -> str:
    if resource_arg.startswith("http"):
        return resource_arg
    else:
        return GENERATED_RESOURCES_DIR + resource_arg


# Images
def image_to_bytes(image: Image) -> bytes:
    image_byte = BytesIO()
    image.save(image_byte, format="png")
    image_data = image_byte.getvalue()
    return image_data


def image_from_bytes(img_data: bytes) -> Image:
    return Image.open(BytesIO(img_data))


def encode_image(image_arg: str) -> bytes:
    image_url = get_resource_url(image_arg)
    image = load_image(image_url)
    img_data = image_to_bytes(image)
    return img_data


def save_image(img: Image) -> str:
    name = str(uuid.uuid4())[:4]
    path = f"/images/{name}.png"
    img.save(GENERATED_RESOURCES_DIR + path)
    return path


# Audios
def load_audio(audio_path: str) -> AudioSegment:
    if audio_path.startswith("http://") or audio_path.startswith("https://"):
        audio_data = requests.get(audio_path).content
        audio = AudioSegment.from_file(BytesIO(audio_data))
    elif os.path.isfile(audio_path):
        audio = AudioSegment.from_file(audio_path)
    else:
        raise ValueError(
            f"Incorrect path or url, URLs must start with `http://` or `https://`, and {audio_path} is not a valid path"
        )
    return audio


def audio_to_bytes(audio: AudioSegment) -> bytes:
    audio_byte = BytesIO()
    audio.export(audio_byte, format="flac")
    audio_data = audio_byte.getvalue()
    return audio_data


def audio_from_bytes(audio_data: bytes) -> AudioSegment:
    return AudioSegment.from_file(BytesIO(audio_data))


def encode_audio(audio_arg: str) -> bytes:
    audio_url = get_resource_url(audio_arg)
    audio = load_audio(audio_url)
    audio_data = audio_to_bytes(audio)
    return audio_data


def save_audio(audio: AudioSegment) -> str:
    name = str(uuid.uuid4())[:4]
    path = f"/audios/{name}.flac"
    with open(GENERATED_RESOURCES_DIR + path, "wb") as f:
        audio.export(f, format="flac")
    return path


def prepend_resource_dir(s: str) -> str:
    """Prepend the resource dir to all resource paths in the string"""
    for resource_type in ["images", "audios", "videos"]:
        s = s.replace(
            f" /{resource_type}/", f" {GENERATED_RESOURCES_DIR}/{resource_type}/"
        )
    return s


def init_resource_dirs():
    os.makedirs(GENERATED_RESOURCES_DIR + "/images", exist_ok=True)
    os.makedirs(GENERATED_RESOURCES_DIR + "/audios", exist_ok=True)
    os.makedirs(GENERATED_RESOURCES_DIR + "/videos", exist_ok=True)
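For illustration (the file names below are made up): get_resource_url passes URLs through unchanged and prefixes local resource paths with the generated-output directory, which is how task args like "/images/xxxx.png" get resolved on disk:

from hugginggpt.resources import get_resource_url, init_resource_dirs

init_resource_dirs()  # ensures output/images, output/audios, output/videos exist
print(get_resource_url("https://example.com/cat.png"))  # -> https://example.com/cat.png
print(get_resource_url("/images/1a2b.png"))             # -> output/images/1a2b.png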
hugginggpt/response_generation.py
ADDED
@@ -0,0 +1,43 @@
import json
import logging

from langchain import LLMChain
from langchain.llms.base import BaseLLM
from langchain.prompts import load_prompt

from hugginggpt.exceptions import ResponseGenerationException, wrap_exceptions
from hugginggpt.model_inference import TaskSummary
from hugginggpt.resources import get_prompt_resource, prepend_resource_dir

logger = logging.getLogger(__name__)


@wrap_exceptions(ResponseGenerationException, "Failed to generate assistant response")
def generate_response(
    user_input: str, task_summaries: list[TaskSummary], llm: BaseLLM
) -> str:
    """Use LLM agent to generate a response to the user's input, given task results."""
    logger.info("Starting response generation")
    sorted_task_summaries = sorted(task_summaries, key=lambda ts: ts.task.id)
    task_results_str = task_summaries_to_json(sorted_task_summaries)
    prompt_template = load_prompt(
        get_prompt_resource("response-generation-prompt.json")
    )
    llm_chain = LLMChain(prompt=prompt_template, llm=llm)
    response = llm_chain.predict(
        user_input=user_input, task_results=task_results_str, stop=["<im_end>"]
    )
    logger.info(f"Generated response: {response}")
    return response


def format_response(response: str) -> str:
    """Format the response to be more readable for user."""
    response = response.strip()
    response = prepend_resource_dir(response)
    return response


def task_summaries_to_json(task_summaries: list[TaskSummary]) -> str:
    dicts = [ts.dict() for ts in task_summaries]
    return json.dumps(dicts)
hugginggpt/task_parsing.py
ADDED
@@ -0,0 +1,149 @@
import copy
import logging

from pydantic import BaseModel, Field

from hugginggpt.exceptions import TaskParsingException, wrap_exceptions

logger = logging.getLogger(__name__)

GENERATED_TOKEN = "<GENERATED>"


class Task(BaseModel):
    # This field is called 'task' and not 'name' to help with prompt engineering
    task: str = Field(description="Name of the Machine Learning task")
    id: int = Field(description="ID of the task")
    dep: list[int] = Field(
        description="List of IDs of the tasks that this task depends on"
    )
    args: dict[str, str] = Field(description="Arguments for the task")

    def depends_on_generated_resources(self) -> bool:
        """Returns True if the task args contains <GENERATED> placeholder tokens, False otherwise"""
        return self.dep != [-1] and any(
            GENERATED_TOKEN in v for v in self.args.values()
        )

    @wrap_exceptions(TaskParsingException, "Failed to replace generated resources")
    def replace_generated_resources(self, task_summaries: list):
        """Replaces <GENERATED> placeholder tokens in args with the generated resources from the task summaries"""
        logger.info("Replacing generated resources")
        generated_resources = {
            k: parse_task_id(v) for k, v in self.args.items() if GENERATED_TOKEN in v
        }
        logger.info(
            f"Resources to replace, resource type -> task id: {generated_resources}"
        )
        for resource_type, task_id in generated_resources.items():
            matches = [
                v
                for k, v in task_summaries[task_id].inference_result.items()
                if self.is_matching_generated_resource(k, resource_type)
            ]
            if len(matches) == 1:
                logger.info(
                    f"Match for generated {resource_type} in inference result of task {task_id}"
                )
                generated_resource = matches[0]
                logger.info(f"Replacing {resource_type} with {generated_resource}")
                self.args[resource_type] = generated_resource
                return self
            else:
                raise Exception(
                    f"Cannot find unique required generated {resource_type} in inference result of task {task_id}"
                )

    def is_matching_generated_resource(self, arg_key: str, resource_type: str) -> bool:
        """Returns True if arg_key contains generated resource of the correct type"""
        # If text, then match all arg keys that contain "text"
        if resource_type.startswith("text"):
            return "text" in arg_key
        # If not text, then arg key must start with "generated" and the correct resource type
        else:
            return arg_key.startswith("generated " + resource_type)


class Tasks(BaseModel):
    __root__: list[Task] = Field(description="List of Machine Learning tasks")

    def __iter__(self):
        return iter(self.__root__)

    def __getitem__(self, item):
        return self.__root__[item]

    def __len__(self):
        return len(self.__root__)


@wrap_exceptions(TaskParsingException, "Failed to parse tasks")
def parse_tasks(tasks_str: str) -> list[Task]:
    """Parses tasks from task planning json string"""
    if tasks_str == "[]":
        raise ValueError("Task string empty, cannot parse")
    logger.info(f"Parsing tasks string: {tasks_str}")
    tasks_str = tasks_str.strip()
    # Cannot use PydanticOutputParser because it fails when parsing top level list JSON string
    tasks = Tasks.parse_raw(tasks_str)
    # __root__ extracts list[Task] from Tasks object
    tasks = unfold(tasks.__root__)
    tasks = fix_dependencies(tasks)
    logger.info(f"Parsed tasks: {tasks}")
    return tasks


def parse_task_id(resource_str: str) -> int:
    """Parse task id from generated resource string, e.g. <GENERATED>-4 -> 4"""
    return int(resource_str.split("-")[1])


def fix_dependencies(tasks: list[Task]) -> list[Task]:
    """Ignores parsed tasks dependencies, and instead infers from task arguments"""
    for task in tasks:
        task.dep = infer_deps_from_args(task)
    return tasks


def infer_deps_from_args(task: Task) -> list[int]:
    """If GENERATED arg value, add to list of unique deps. If none, deps = [-1]"""
    deps = [parse_task_id(v) for v in task.args.values() if GENERATED_TOKEN in v]
    if not deps:
        deps = [-1]
    # deduplicate
    return list(set(deps))


def unfold(tasks: list[Task]) -> list[Task]:
    """A folded task has several generated resources folded into a single argument"""
    unfolded_tasks = []
    for task in tasks:
        folded_args = find_folded_args(task)
        if folded_args:
            unfolded_tasks.extend(split(task, folded_args))
        else:
            unfolded_tasks.append(task)
    return unfolded_tasks


def split(task: Task, folded_args: tuple[str, str]) -> list[Task]:
    """Split folded task into two same tasks, but separated generated resource arguments"""
    key, value = folded_args
    generated_items = value.split(",")
    split_tasks = []
    for item in generated_items:
        new_task = copy.deepcopy(task)
        dep_task_id = parse_task_id(item)
        new_task.dep = [dep_task_id]
        new_task.args[key] = item.strip()
        split_tasks.append(new_task)
    return split_tasks


def find_folded_args(task: Task) -> tuple[str, str] | None:
    """Finds folded args, e.g: 'image': '<GENERATED>-1,<GENERATED>-2'"""
    for key, value in task.args.items():
        if value.count(GENERATED_TOKEN) > 1:
            logger.debug(f"Task {task.id} is folded")
            return key, value
    return None
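
A hedged example of what parse_tasks does with a folded task; the JSON below is made up for illustration. The single text argument holding two <GENERATED> references is unfolded into two tasks, and each task's dep list is re-inferred from its arguments by fix_dependencies:

from hugginggpt.task_parsing import parse_tasks

tasks_str = (
    '[{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "a.jpg"}},'
    ' {"task": "image-to-text", "id": 1, "dep": [-1], "args": {"image": "b.jpg"}},'
    ' {"task": "text-to-speech", "id": 2, "dep": [0, 1],'
    ' "args": {"text": "<GENERATED>-0,<GENERATED>-1"}}]'
)
for task in parse_tasks(tasks_str):
    # The folded task with id 2 is split into two text-to-speech tasks,
    # one per generated text, with deps [0] and [1] respectively.
    print(task.id, task.dep, task.args)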
hugginggpt/task_planning.py
ADDED
@@ -0,0 +1,61 @@
import logging

from langchain import LLMChain
from langchain.llms.base import BaseLLM
from langchain.prompts import load_prompt

from hugginggpt.exceptions import TaskPlanningException, wrap_exceptions
from hugginggpt.history import ConversationHistory
from hugginggpt.llm_factory import LLM_MAX_TOKENS, count_tokens
from hugginggpt.resources import get_prompt_resource
from hugginggpt.task_parsing import Task, parse_tasks

logger = logging.getLogger(__name__)

MAIN_PROMPT_TOKENS = 800
MAX_HISTORY_TOKENS = LLM_MAX_TOKENS - MAIN_PROMPT_TOKENS


@wrap_exceptions(TaskPlanningException, "Failed to plan tasks")
def plan_tasks(
    user_input: str, history: ConversationHistory, llm: BaseLLM
) -> list[Task]:
    """Use LLM agent to plan tasks in order to solve the user request."""
    logger.info("Starting task planning")
    task_planning_prompt_template = load_prompt(
        get_prompt_resource("task-planning-few-shot-prompt.json")
    )
    llm_chain = LLMChain(prompt=task_planning_prompt_template, llm=llm)
    history_truncated = truncate_history(history)
    output = llm_chain.predict(
        user_input=user_input, history=history_truncated, stop=["<im_end>"]
    )
    logger.info(f"Task planning raw output: {output}")
    tasks = parse_tasks(output)
    return tasks


def truncate_history(history: ConversationHistory) -> ConversationHistory:
    """Truncate history to fit within the max token limit for the task planning LLM"""
    example_prompt_template = load_prompt(
        get_prompt_resource("task-planning-example-prompt.json")
    )
    token_counter = 0
    n_messages = 0
    # Iterate through history backwards in pairs, to ensure most recent messages are kept
    for i in range(0, len(history), 2):
        user_message = history[-(i + 2)]
        assistant_message = history[-(i + 1)]
        # Turn messages into LLM prompt string
        history_text = example_prompt_template.format(
            example_input=user_message["content"],
            example_output=assistant_message["content"],
        )
        n_tokens = count_tokens(history_text)
        if token_counter + n_tokens <= MAX_HISTORY_TOKENS:
            n_messages += 2
            token_counter += n_tokens
        else:
            break
    start = len(history) - n_messages
    return history[start:]
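
A hedged sketch of calling plan_tasks on its own; the OpenAI model settings here are illustrative, and in this commit the LLMs are normally built by the llm_factory module instead:

from langchain.llms import OpenAI

from hugginggpt.history import ConversationHistory
from hugginggpt.task_planning import plan_tasks

llm = OpenAI(model_name="text-davinci-003", temperature=0)  # illustrative settings
tasks = plan_tasks(
    user_input="Draw a picture of a surfing dog, then describe it out loud",
    history=ConversationHistory(),
    llm=llm,
)
for task in tasks:
    print(task.task, task.dep, task.args)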
logging-config.toml
ADDED
@@ -0,0 +1,26 @@
version = 1

disable_existing_loggers = false

[root]
level = "DEBUG"
handlers = ["debug_file", "errors_file"]

[formatters.simple]
format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

[handlers.debug_file]
class = "logging.handlers.TimedRotatingFileHandler"
level = "DEBUG"
formatter = "simple"
filename = "logs/debug.log"
when = "midnight"
encoding = "utf8"

[handlers.errors_file]
class = "logging.handlers.TimedRotatingFileHandler"
level = "ERROR"
formatter = "simple"
filename = "logs/errors.log"
when = "midnight"
encoding = "utf8"
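
For reference, a TOML file with this shape maps directly onto the standard library's dictConfig schema; the snippet below is an illustrative way to consume it and is not necessarily identical to what hugginggpt/log.py does in this commit:

import logging.config
import tomllib  # stdlib in Python 3.11+, matching this project's requires-python

with open("logging-config.toml", "rb") as f:
    config = tomllib.load(f)
logging.config.dictConfig(config)  # expects the logs/ directory to exist
logging.getLogger(__name__).debug("logging configured")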
logs/.gitkeep
ADDED
File without changes
main.py
ADDED
@@ -0,0 +1,138 @@
import asyncio
import json
import logging

import click
import requests
from dotenv import load_dotenv

from hugginggpt import generate_response, infer, plan_tasks
from hugginggpt.history import ConversationHistory
from hugginggpt.llm_factory import LLMs, create_llms
from hugginggpt.log import setup_logging
from hugginggpt.model_inference import TaskSummary
from hugginggpt.model_selection import select_hf_models
from hugginggpt.response_generation import format_response

load_dotenv()
setup_logging()
logger = logging.getLogger(__name__)


@click.command()
@click.option("-p", "--prompt", type=str, help="Prompt for huggingGPT")
def main(prompt):
    _print_banner()
    llms = create_llms()
    if prompt:
        standalone_mode(user_input=prompt, llms=llms)

    else:
        interactive_mode(llms=llms)


def standalone_mode(user_input: str, llms: LLMs) -> str:
    try:
        response, task_summaries = compute(
            user_input=user_input,
            history=ConversationHistory(),
            llms=llms,
        )
        pretty_response = format_response(response)
        print(pretty_response)
        return pretty_response
    except Exception as e:
        logger.exception("")
        print(
            f"Sorry, encountered error: {e}. Please try again. Check logs if problem persists."
        )


def interactive_mode(llms: LLMs):
    print("Please enter your request. End the conversation with 'exit'")
    history = ConversationHistory()
    while True:
        try:
            user_input = click.prompt("User")
            if user_input.lower() == "exit":
                break

            logger.info(f"User input: {user_input}")
            response, task_summaries = compute(
                user_input=user_input,
                history=history,
                llms=llms,
            )
            pretty_response = format_response(response)
            print(f"Assistant:{pretty_response}")

            history.add(role="user", content=user_input)
            history.add(role="assistant", content=response)
        except Exception as e:
            logger.exception("")
            print(
                f"Sorry, encountered error: {e}. Please try again. Check logs if problem persists."
            )


def compute(
    user_input: str,
    history: ConversationHistory,
    llms: LLMs,
) -> (str, list[TaskSummary]):
    tasks = plan_tasks(
        user_input=user_input, history=history, llm=llms.task_planning_llm
    )

    tasks = sorted(tasks, key=lambda t: max(t.dep))
    logger.info(f"Sorted tasks: {tasks}")

    hf_models = asyncio.run(
        select_hf_models(
            user_input=user_input,
            tasks=tasks,
            model_selection_llm=llms.model_selection_llm,
            output_fixing_llm=llms.output_fixing_llm,
        )
    )

    task_summaries = []
    with requests.Session() as session:
        for task in tasks:
            logger.info(f"Starting task: {task}")
            if task.depends_on_generated_resources():
                task = task.replace_generated_resources(task_summaries=task_summaries)
            model = hf_models[task.id]
            inference_result = infer(
                task=task,
                model_id=model.id,
                llm=llms.model_inference_llm,
                session=session,
            )
            task_summaries.append(
                TaskSummary(
                    task=task,
                    model=model,
                    inference_result=json.dumps(inference_result),
                )
            )
            logger.info(f"Finished task: {task}")
    logger.info("Finished all tasks")
    logger.debug(f"Task summaries: {task_summaries}")

    response = generate_response(
        user_input=user_input,
        task_summaries=task_summaries,
        llm=llms.response_generation_llm,
    )
    return response, task_summaries


def _print_banner():
    with open("resources/banner.txt", "r") as f:
        banner = f.read()
    logger.info("\n" + banner)


if __name__ == "__main__":
    main()
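
Besides the CLI entry points (python main.py -p "..." for a one-shot request, or no flag for interactive mode), the pipeline can be driven programmatically; a minimal sketch, assuming the required API keys are present in .env:

from hugginggpt.llm_factory import create_llms
from main import standalone_mode  # importing main also loads .env and sets up logging

llms = create_llms()
standalone_mode(
    user_input="Generate an image of a lighthouse at sunset and describe it",
    llms=llms,
)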
output/.gitkeep
ADDED
File without changes
output/audios/.gitkeep
ADDED
File without changes
output/images/.gitkeep
ADDED
File without changes
output/videos/.gitkeep
ADDED
File without changes
pdm.lock
ADDED
The diff for this file is too large to render.
See raw diff
pyproject.toml
ADDED
@@ -0,0 +1,50 @@
[tool.pdm]
[tool.pdm.dev-dependencies]
dev = []
test = [
    "pytest>=7.3.0",
    "pytest-cov>=4.0.0",
    "pytest-asyncio>=0.21.0",
    "aioresponses>=0.7.4",
    "responses>=0.23.1",
]
ide = [
    "setuptools>=67.6.1",
]

[tool.pdm.scripts]
hugginggpt = "python main.py"

[tool.pytest]
[tool.pytest.ini_options]
asyncio_mode = "auto"
norecursedirs = "tests/helpers"

[project]
name = "langchain-huggingGPT"
version = "0.1.0"
description = ""
authors = [
    {name = "camille-vanhoffelen", email = "camille-vanhoffelen@users.noreply.github.com"},
]
dependencies = [
    "click>=8.1.3",
    "python-dotenv>=1.0.0",
    "langchain>=0.0.137",
    "openai>=0.27.4",
    "huggingface-hub>=0.13.4",
    "tiktoken>=0.3.3",
    "diffusers>=0.15.1",
    "Pillow>=9.5.0",
    "pydub>=0.25.1",
    "aiohttp>=3.8.4",
    "aiodns>=3.0.0",
    "gradio>=3.32.0",
]
requires-python = ">=3.11"
readme = "README.md"
license = {text = "MIT"}

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"
requirements.txt
ADDED
The diff for this file is too large to render.
See raw diff
resources/banner.txt
ADDED
@@ -0,0 +1,4 @@
__ _ ___
| _. ._ _ _ |_ _. o ._ |_| _ _ o ._ _ /__ |_) |
| (_| | | (_| (_ | | (_| | | | | | |_| (_| (_| | | | (_| \_| | |
_| _| _| _|
resources/huggingface-models-metadata.jsonl
ADDED
The diff for this file is too large to render.
See raw diff
resources/prompt-templates/model-selection-prompt.json
ADDED
@@ -0,0 +1,9 @@
{
  "_type": "prompt",
  "input_variables": [
    "user_input",
    "models",
    "task"
  ],
  "template": "#2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.\n<im_start>user\n{user_input}<im_end>\n<im_start>assistant\n{task}<im_end>\n<im_start>user\nPlease choose the most suitable model from {models} for the task {task}. The output must be in a strict JSON format: {{\"id\": \"id\", \"reason\": \"your detail reasons for the choice\"}}.<im_end>\n<im_start>assistant\n"
}
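
As a sanity check, the template can be loaded and rendered with langchain's load_prompt; the argument values below are placeholders:

from langchain.prompts import load_prompt

prompt = load_prompt("resources/prompt-templates/model-selection-prompt.json")
print(
    prompt.format(
        user_input="Describe /images/dog.jpg",
        task='{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "/images/dog.jpg"}}',
        models='[{"id": "nlpconnect/vit-gpt2-image-captioning", "description": "image captioning model"}]',
    )
)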
resources/prompt-templates/openai-model-inference-prompt.json
ADDED
@@ -0,0 +1,9 @@
{
  "_type": "prompt",
  "input_variables": [
    "task",
    "task_name",
    "args"
  ],
  "template": "Model Inference Stage: the AI assistant needs to execute a task for the user.\n<im_start>user\nHere is the task in JSON format {task}. Now you are a {task_name} system, the arguments are {args}. Just help me do {task_name} and give me the result. The result must be in text form without any urls.<im_end>\n<im_start>assistant"
}
resources/prompt-templates/response-generation-prompt.json
ADDED
@@ -0,0 +1,8 @@
{
  "_type": "prompt",
  "input_variables": [
    "user_input",
    "task_results"
  ],
  "template": "#4 Response Generation Stage: With the task execution logs, the AI assistant needs to describe the process and inference results.\n<im_start>user\n{user_input}<im_end>\n<im_start>assistant\nBefore give you a response, I want to introduce my workflow for your request, which is shown in the following JSON data: {task_results}. Do you have any demands regarding my response?<im_end>\n<im_start>user\nYes. Please first think carefully and directly answer my request based on the inference results. Some of the inferences may not always turn out to be correct and require you to make careful consideration in making decisions. Then please detail your workflow including the used models and inference results for my request in your friendly tone. Please filter out information that is not relevant to my request. Tell me the complete path or urls of files in inference results. If there is nothing in the results, please tell me you can't make it.<im_end>\n<im_start>assistant"
}
resources/prompt-templates/task-planning-example-prompt.json
ADDED
@@ -0,0 +1,8 @@
{
  "_type": "prompt",
  "input_variables": [
    "example_input",
    "example_output"
  ],
  "template": "<im_start>user\n{example_input}<im_end>\n<im_start>assistant\n{example_output}<im_end>"
}
resources/prompt-templates/task-planning-examples.json
ADDED
@@ -0,0 +1,42 @@
[
  {
    "example_input": "Give you some pictures e1.jpg, e2.png, e3.jpg, help me count the number of sheep?",
    "example_output": "[{{\"task\": \"image-to-text\", \"id\": 0, \"dep\": [-1], \"args\": {{\"image\": \"e1.jpg\" }}}}, {{\"task\": \"object-detection\", \"id\": 1, \"dep\": [-1], \"args\": {{\"image\": \"e1.jpg\" }}}}, {{\"task\": \"visual-question-answering\", \"id\": 2, \"dep\": [1], \"args\": {{\"image\": \"<GENERATED>-1\", \"text\": \"How many sheep in the picture\"}}}}, {{\"task\": \"image-to-text\", \"id\": 3, \"dep\": [-1], \"args\": {{\"image\": \"e2.png\" }}}}, {{\"task\": \"object-detection\", \"id\": 4, \"dep\": [-1], \"args\": {{\"image\": \"e2.png\" }}}}, {{\"task\": \"visual-question-answering\", \"id\": 5, \"dep\": [4], \"args\": {{\"image\": \"<GENERATED>-4\", \"text\": \"How many sheep in the picture\"}}}}, {{\"task\": \"image-to-text\", \"id\": 6, \"dep\": [-1], \"args\": {{\"image\": \"e3.jpg\" }}}}, {{\"task\": \"object-detection\", \"id\": 7, \"dep\": [-1], \"args\": {{\"image\": \"e3.jpg\" }}}}, {{\"task\": \"visual-question-answering\", \"id\": 8, \"dep\": [7], \"args\": {{\"image\": \"<GENERATED>-7\", \"text\": \"How many sheep in the picture\"}}}}]"
  },
  {
    "example_input":"Look at /e.jpg, can you tell me how many objects in the picture? Give me a picture similar to this one.",
    "example_output":"[{{\"task\": \"image-to-text\", \"id\": 0, \"dep\": [-1], \"args\": {{\"image\": \"/e.jpg\" }}}}, {{\"task\": \"object-detection\", \"id\": 1, \"dep\": [-1], \"args\": {{\"image\": \"/e.jpg\" }}}}, {{\"task\": \"visual-question-answering\", \"id\": 2, \"dep\": [1], \"args\": {{\"image\": \"<GENERATED>-1\", \"text\": \"how many objects in the picture?\" }}}}, {{\"task\": \"text-to-image\", \"id\": 3, \"dep\": [0], \"args\": {{\"text\": \"<GENERATED-0>\" }}}}, {{\"task\": \"image-to-image\", \"id\": 4, \"dep\": [-1], \"args\": {{\"image\": \"/e.jpg\" }}}}]"
  },
  {
    "example_input":"given a document /images/e.jpeg, answer me what is the student amount? And describe the image with your voice",
    "example_output":"{{\"task\": \"document-question-answering\", \"id\": 0, \"dep\": [-1], \"args\": {{\"image\": \"/images/e.jpeg\", \"text\": \"what is the student amount?\" }}}}, {{\"task\": \"visual-question-answering\", \"id\": 1, \"dep\": [-1], \"args\": {{\"image\": \"/images/e.jpeg\", \"text\": \"what is the student amount?\" }}}}, {{\"task\": \"image-to-text\", \"id\": 2, \"dep\": [-1], \"args\": {{\"image\": \"/images/e.jpg\" }}}}, {{\"task\": \"text-to-speech\", \"id\": 3, \"dep\": [2], \"args\": {{\"text\": \"<GENERATED>-2\" }}}}]"
  },
  {
    "example_input": "Given an image /example.jpg, generate a new image where instead of reading a book, the girl is using her phone",
    "example_output": "[{{\"task\": \"image-to-image\", \"id\": 0, \"dep\": [-1], \"args\": {{\"image\": \"/example.jpg\", \"text\": \"instead of reading a book, the girl is using her phone\"}}}}]"
  },
  {
    "example_input": "please show me an image of (based on the text) 'a boy is running' and dub it",
    "example_output": "[{{\"task\": \"text-to-image\", \"id\": 0, \\\"dep\\\": [-1], \"args\": {{\"text\": \"a boy is running\" }}}}, {{\"task\": \"text-to-speech\", \"id\": 1, \"dep\": [-1], \"args\": {{\"text\": \"a boy is running\" }}}}]"
  },
  {
    "example_input": "please show me a joke and an image of cat",
    "example_output": "[{{\"task\": \"conversational\", \"id\": 0, \"dep\": [-1], \"args\": {{\"text\": \"please show me a joke of cat\" }}}}, {{\"task\": \"text-to-image\", \"id\": 1, \"dep\": [-1], \"args\": {{\"text\": \"a photo of cat\" }}}}]"
  },
  {
    "example_input": "Please tell me how similar the two following sentences are: I like to surf. Surfing is my passion.",
    "example_output": "[{{\"task\": \"sentence-similarity\", \"id\": 0, \"dep\": [-1], \"args\": {{\"text1\": \"I like to surf.\", \"text2\": \"Surfing is my passion.\"}}}}]"
  },
  {
    "example_input": "Please tell me if the following sentence is positive or negative: Surfing is the best!",
    "example_output": "[{{\"task\": \"text-classification\", \"id\": 0, \"dep\": [-1], \"args\": {{\"text\": \"Surfing is the best!\"}}}}]"
  },
  {
    "example_input": "Classify the image found at /images/de78.png.",
    "example_output": "[{{\"task\": \"image-classification\", \"id\": 0, \"dep\": [-1], \"args\": {{\"image\": \"/images/de78.png\"}}}}]"
  },
  {
    "example_input": "Paris is the capital of France. Berlin is the capital of Germany. Based on the previous facts, answer the following question: What is the capital of France?",
    "example_output": "[{{\"task\": \"question-answering\", \"id\": 0, \"dep\": [-1], \"args\": {{\"question\": \"What is the capital of France?\", \"context\": \"Paris is the capital of France. Berlin is the capital of Germany.\"}}}}]"
  }
]
resources/prompt-templates/task-planning-few-shot-prompt.json
ADDED
@@ -0,0 +1,11 @@
{
  "_type": "few_shot",
  "input_variables": [
    "user_input",
    "history"
  ],
  "prefix": "#1 Task Planning Stage: The AI assistant can parse user input to several tasks: [{{\"task\": task, \"id\": task_id, \"dep\": dependency_task_id, \"args\": {{\"text\": text or <GENERATED>-dep_id, \"image\": image_url or <GENERATED>-dep_id, \"audio\": audio_url or <GENERATED>-dep_id}}}}]. The special tag \"<GENERATED>-dep_id\" refer to the one generated text/image/audio in the dependency task (Please consider whether the dependency task generates resources of this type.) and \"dep_id\" must be in \"dep\" list. The \"dep\" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The \"args\" field must in [\"text\", \"image\", \"audio\"], nothing else. The task MUST be selected from the following options: \"token-classification\", \"text2text-generation\", \"summarization\", \"translation\", \"question-answering\", \"conversational\", \"text-generation\", \"sentence-similarity\", \"tabular-classification\", \"object-detection\", \"image-classification\", \"image-to-image\", \"image-to-text\", \"text-to-image\", \"visual-question-answering\", \"document-question-answering\", \"image-segmentation\", \"depth-estimation\", \"text-to-speech\", \"automatic-speech-recognition\", \"audio-to-audio\", \"audio-classification\". There may be multiple tasks of the same type. Think step by step about all the tasks needed to resolve the user's request. Parse out as few tasks as possible while ensuring that the user request can be resolved. Pay attention to the dependencies and order among tasks. If the user input can't be parsed, you need to reply empty JSON [].",
  "example_prompt_path": "resources/prompt-templates/task-planning-example-prompt.json",
  "examples": "resources/prompt-templates/task-planning-examples.json",
  "suffix": "<im_start>user\nThe chat log [ {history} ] may contain the resources I mentioned. Now I input {{ {user_input} }}. Pay attention to the input and output types of tasks and the dependencies between tasks.<im_end>\n<im_start>assistant\n"
}