Extensions

Extensions are defined by files named script.py inside subfolders of text-generation-webui/extensions. They are loaded at startup if the folder name is specified after the --extensions flag.

For instance, extensions/silero_tts/script.py gets loaded with python server.py --extensions silero_tts.

text-generation-webui-extensions

The repository above contains a directory of user extensions.

If you create an extension, you are welcome to host it in a GitHub repository and submit a PR adding it to the list.

Built-in extensions

Extension	Description
api	Creates an API with two endpoints: a streaming endpoint at `/api/v1/stream` on port 5005 and a blocking endpoint at `/api/v1/generate` on port 5000. This is the main API for the webui.
openai	Creates an API that mimics the OpenAI API and can be used as a drop-in replacement.
multimodal	Adds multimodality support (text+images). For a detailed description see README.md in the extension directory.
google_translate	Automatically translates inputs and outputs using Google Translate.
silero_tts	Text-to-speech extension using Silero. When used in chat mode, responses are replaced with an audio widget.
elevenlabs_tts	Text-to-speech extension using the ElevenLabs API. You need an API key to use it.
whisper_stt	Allows you to enter your inputs in chat mode using your microphone.
sd_api_pictures	Allows you to request pictures from the bot in chat mode, which will be generated using the AUTOMATIC1111 Stable Diffusion API. See examples here.
character_bias	A very simple example that adds a hidden string at the beginning of the bot's reply in chat mode.
send_pictures	Creates an image upload field that can be used to send images to the bot in chat mode. Captions are automatically generated using BLIP.
gallery	Creates a gallery with the chat characters and their pictures.
superbooga	An extension that uses ChromaDB to create an arbitrarily large pseudocontext, taking as input text files, URLs, or pasted text. Based on https://github.com/kaiokendev/superbig.
ngrok	Allows you to access the web UI remotely using the ngrok reverse tunnel service (free). It's an alternative to the built-in Gradio `--share` feature.
perplexity_colors	Colors each token in the output text by its associated probability, as derived from the model logits.

How to write an extension

The extensions framework is based on special functions and variables that you can define in script.py. The functions are the following:

Function	Description
`def setup()`	Executed only once, when the extension is imported.
`def ui()`	Creates custom gradio elements when the UI is launched.
`def custom_css()`	Returns custom CSS as a string. It is applied whenever the web UI is loaded.
`def custom_js()`	Returns custom JavaScript as a string. It is applied whenever the web UI is loaded.
`def input_modifier(string, state)`	Modifies the input string before it enters the model. In chat mode, it is applied to the user message. Otherwise, it is applied to the entire prompt.
`def output_modifier(string, state)`	Modifies the output string before it is presented in the UI. In chat mode, it is applied to the bot's reply. Otherwise, it is applied to the entire output.
`def chat_input_modifier(text, visible_text, state)`	Modifies both the visible and internal inputs in chat mode. Can be used to hijack the chat input with custom content.
`def bot_prefix_modifier(string, state)`	Applied in chat mode to the prefix for the bot's reply.
`def state_modifier(state)`	Modifies the dictionary containing the UI input parameters before it is used by the text generation functions.
`def history_modifier(history)`	Modifies the chat history before the text generation in chat mode begins.
`def custom_generate_reply(...)`	Overrides the main text generation function.
`def custom_generate_chat_prompt(...)`	Overrides the prompt generator in chat mode.
`def tokenizer_modifier(state, prompt, input_ids, input_embeds)`	Modifies the `input_ids`/`input_embeds` fed to the model. Should return `prompt`, `input_ids`, `input_embeds`. See the `multimodal` extension for an example.
`def custom_tokenized_length(prompt)`	Used in conjunction with `tokenizer_modifier`, returns the length in tokens of `prompt`. See the `multimodal` extension for an example.
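
As a minimal sketch of how a few of these hooks fit together, the hypothetical script.py below normalizes the user input, adds a hidden nudge to the bot prefix, and tidies the output. Only the function names and signatures come from the table above; the folder name and the specific transformations are assumptions made for illustration, not code from any bundled extension.

```python
# Hypothetical extensions/my_extension/script.py (the folder name is an assumption).


def input_modifier(string, state):
    # Collapse runs of whitespace before the text reaches the model.
    return " ".join(string.split())


def bot_prefix_modifier(string, state):
    # Append a hidden nudge to the prefix of the bot's reply (chat mode only).
    return string + " *speaks cheerfully*"


def output_modifier(string, state):
    # Strip trailing whitespace from the model output before it is displayed.
    return string.rstrip()


if __name__ == "__main__":
    # Stand-in for the UI state dictionary; the real one is supplied by the webui.
    state = {}
    print(input_modifier("  hello    world ", state))
    print(bot_prefix_modifier("Bot:", state))
    print(output_modifier("Hi there!  \n", state))
```

Functions that are not defined are simply skipped, so an extension only needs to declare the hooks it actually uses.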

Additionally, you can define a special params dictionary. In it, the display_name key is used to define the displayed name of the extension in the UI, and the is_tab key is used to define whether the extension should appear in a new tab. By default, extensions appear at the bottom of the "Text generation" tab.

Example:

params = {
    "display_name": "Google Translate",
    "is_tab": True,
}

The params dict may also contain variables that you want to be customizable through a settings.yaml file. For instance, assuming the extension is in extensions/google_translate, the variable `language string` in

params = {
    "display_name": "Google Translate",
    "is_tab": True,
    "language string": "jp"
}

can be customized by adding a key called `google_translate-language string` to settings.yaml:

google_translate-language string: 'fr'

That is, the syntax for the key is extension_name-variable_name.
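
The naming rule can be illustrated with a short sketch. The helper `apply_settings` and the `settings` dictionary below are hypothetical stand-ins written for this example. They mimic the extension_name-variable_name lookup described above and should not be read as the webui's actual loader code.

```python
def apply_settings(params, extension_name, settings):
    """Overwrite entries of params with matching settings values.

    A setting matches when its key is "<extension_name>-<variable_name>".
    """
    for variable_name in params:
        key = f"{extension_name}-{variable_name}"
        if key in settings:
            params[variable_name] = settings[key]
    return params


params = {
    "display_name": "Google Translate",
    "is_tab": True,
    "language string": "jp",
}

# Stand-in for the parsed contents of settings.yaml.
settings = {"google_translate-language string": "fr"}

apply_settings(params, "google_translate", settings)
print(params["language string"])  # fr
```

Keys in params that have no matching entry in the settings keep their default values.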

Using multiple extensions at the same time

You can activate more than one extension at a time by providing their names separated by spaces after --extensions. The input, output, and bot prefix modifiers will be applied in the specified order.

Example:

python server.py --extensions enthusiasm translate # First apply enthusiasm, then translate
python server.py --extensions translate enthusiasm # First apply translate, then enthusiasm

Note that for:

custom_generate_chat_prompt
custom_generate_reply
custom_tokenized_length

only the first declaration encountered will be used and the rest will be ignored.
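
The two rules above (modifiers are chained in order, overrides are taken from the first extension that declares them) can be sketched with toy objects. Everything in this block is an assumption for illustration: `enthusiasm` and `translate` are stand-ins rather than real extensions, and `apply_modifiers` and `first_override` are hypothetical helpers, not functions from the webui.

```python
from types import SimpleNamespace

# Toy stand-ins for two extension modules; real extensions are script.py files.
enthusiasm = SimpleNamespace(
    output_modifier=lambda string, state: string + " yay",
    custom_generate_reply=lambda *args, **kwargs: "reply from enthusiasm",
)
translate = SimpleNamespace(
    output_modifier=lambda string, state: string.upper(),  # pretend translation
    custom_generate_reply=lambda *args, **kwargs: "reply from translate",
)


def apply_modifiers(extensions, name, string, state):
    """Chain every extension's modifier called `name`, in the order given."""
    for ext in extensions:
        func = getattr(ext, name, None)
        if func is not None:
            string = func(string, state)
    return string


def first_override(extensions, name):
    """Return the first extension function called `name`, or None."""
    for ext in extensions:
        func = getattr(ext, name, None)
        if func is not None:
            return func
    return None


# Order matters for chained modifiers.
print(apply_modifiers([enthusiasm, translate], "output_modifier", "hello", {}))
print(apply_modifiers([translate, enthusiasm], "output_modifier", "hello", {}))

# Only the first declaration of an override is used.
print(first_override([translate, enthusiasm], "custom_generate_reply")())
```

With these toy definitions, the first call yields "HELLO YAY", the second "HELLO yay", and the override comes from `translate` because it is listed first.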

A full example

The source code below can be found at extensions/example/script.py.

"""
An example of extension. It does nothing, but you can add transformations
before the return statements to customize the webui behavior.

Starting from history_modifier and ending in output_modifier, the
functions are declared in the same order that they are called at
generation time.
"""

import gradio as gr
import torch
from transformers import LogitsProcessor

from modules import chat, shared
from modules.text_generation import (
    decode,
    encode,
    generate_reply,
)

params = {
    "display_name": "Example Extension",
    "is_tab": False,
}

class MyLogits(LogitsProcessor):
    """
    Manipulates the probabilities for the next token before it gets sampled.
    Used in the logits_processor_modifier function below.
    """
    def __init__(self):
        pass

    def __call__(self, input_ids, scores):
        # probs = torch.softmax(scores, dim=-1, dtype=torch.float)
        # probs[0] /= probs[0].sum()
        # scores = torch.log(probs / (1 - probs))
        return scores

def history_modifier(history):
    """
    Modifies the chat history.
    Only used in chat mode.
    """
    return history

def state_modifier(state):
    """
    Modifies the state variable, which is a dictionary containing the input
    values in the UI like sliders and checkboxes.
    """
    return state

def chat_input_modifier(text, visible_text, state):
    """
    Modifies the user input string in chat mode (visible_text).
    You can also modify the internal representation of the user
    input (text) to change how it will appear in the prompt.
    """
    return text, visible_text

def input_modifier(string, state):
    """
    In default/notebook modes, modifies the whole prompt.

    In chat mode, it is the same as chat_input_modifier but only applied
    to "text", here called "string", and not to "visible_text".
    """
    return string

def bot_prefix_modifier(string, state):
    """
    Modifies the prefix for the next bot reply in chat mode.
    By default, the prefix will be something like "Bot Name:".
    """
    return string

def tokenizer_modifier(state, prompt, input_ids, input_embeds):
    """
    Modifies the input ids and embeds.
    Used by the multimodal extension to put image embeddings in the prompt.
    Only used by loaders that use the transformers library for sampling.
    """
    return prompt, input_ids, input_embeds

def logits_processor_modifier(processor_list, input_ids):
    """
    Adds logits processors to the list, allowing you to access and modify
    the next token probabilities.
    Only used by loaders that use the transformers library for sampling.
    """
    processor_list.append(MyLogits())
    return processor_list

def output_modifier(string, state):
    """
    Modifies the LLM output before it gets presented.

    In chat mode, the modified version goes into history['visible'],
    and the original version goes into history['internal'].
    """
    return string

def custom_generate_chat_prompt(user_input, state, **kwargs):
    """
    Replaces the function that generates the prompt from the chat history.
    Only used in chat mode.
    """
    result = chat.generate_chat_prompt(user_input, state, **kwargs)
    return result

def custom_css():
    """
    Returns a CSS string that gets appended to the CSS for the webui.
    """
    return ''

def custom_js():
    """
    Returns a javascript string that gets appended to the javascript
    for the webui.
    """
    return ''

def setup():
    """
    Gets executed only once, when the extension is imported.
    """
    pass

def ui():
    """
    Gets executed when the UI is drawn. Custom gradio elements and
    their corresponding event handlers should be defined here.

    To learn about gradio components, check out the docs:
    https://gradio.app/docs/
    """
    pass
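
To make one of the no-op hooks above concrete, here is a minimal sketch of a `chat_input_modifier` that changes only the internal text used in the prompt and leaves the visible message untouched. The constant `HIDDEN_INSTRUCTION` and its wording are hypothetical and exist only for this example.

```python
# Hypothetical instruction appended to the internal copy of each user message.
HIDDEN_INSTRUCTION = " (Answer in one short sentence.)"


def chat_input_modifier(text, visible_text, state):
    """
    Appends a hidden instruction to the internal text that goes into the
    prompt, while the message shown in the chat stays as the user typed it.
    """
    return text + HIDDEN_INSTRUCTION, visible_text


if __name__ == "__main__":
    # Stand-in call; in the webui the arguments are supplied by the chat code.
    internal, visible = chat_input_modifier("Hi", "Hi", {})
    print(internal)
    print(visible)
```

Returning the two strings in the same order as the arguments (internal text first, visible text second) matches the return statement in the template above.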