Model Support

This document describes how to support a new model in FastChat.

Content

  • Local Models
  • API-Based Models

Local Models

To support a new local model in FastChat, you need to correctly handle its prompt template and model loading. The goal is to make the following command run with the correct prompts.

python3 -m fastchat.serve.cli --model [YOUR_MODEL_PATH]

You can run this example command to learn the code logic.

python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5

You can add --debug to see the actual prompt sent to the model.

Steps

FastChat uses the Conversation class to handle prompt templates and the BaseModelAdapter class to handle model loading.

  1. Implement a conversation template for the new model at fastchat/conversation.py. You can follow existing examples and use register_conv_template to add a new one (see the first sketch after this list). Please also add a link to the official reference code if possible.
  2. Implement a model adapter for the new model at fastchat/model/model_adapter.py. You can follow existing examples and use register_model_adapter to add a new one (see the second sketch after this list).
  3. (Optional) Add the model name to the "Supported models" section below and add more information in fastchat/model/model_registry.py.
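
As a concrete starting point for step 1, here is a minimal sketch of registering a conversation template. The template name "my-model", the system message, the roles, and the separator choice are placeholders for illustration, not values from FastChat; adapt them to the model's actual prompt format.

# In fastchat/conversation.py -- a minimal, hypothetical template.
# All of the values below ("my-model", roles, separators) are placeholders.
register_conv_template(
    Conversation(
        name="my-model",
        system_message="A chat between a curious user and an AI assistant.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
        stop_str="USER:",
    )
)

For step 2, a matching adapter sketch. MyModelAdapter and the "my-model" substring match are hypothetical; the default load_model inherited from BaseModelAdapter is often enough for standard Hugging Face checkpoints, so only match and the conversation-template lookup are overridden here.

# In fastchat/model/model_adapter.py -- a minimal, hypothetical adapter.
class MyModelAdapter(BaseModelAdapter):
    """Match checkpoints whose path contains "my-model" (a placeholder)."""

    def match(self, model_path: str):
        return "my-model" in model_path.lower()

    def get_default_conv_template(self, model_path: str):
        return get_conv_template("my-model")

register_model_adapter(MyModelAdapter)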

After these steps, the new model should be compatible with most FastChat features, such as CLI, web UI, model worker, and OpenAI-compatible API server. Please do some testing with these features as well.
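
For example, the standard FastChat serving commands below exercise the model worker and the OpenAI-compatible API server (run each in a separate terminal):

python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path [YOUR_MODEL_PATH]
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000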

Supported models

API-Based Models

To support an API-based model, start from the existing OpenAI example. If the model is compatible with the OpenAI API, a configuration file is all you need; no additional code is required. For custom protocols, you need to implement a streaming generator in fastchat/serve/api_provider.py, following the provided examples. Currently, FastChat is compatible with OpenAI, Anthropic, Google Vertex AI, Mistral, Nvidia NGC, YandexGPT, and Reka.
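To make the custom-protocol case concrete, here is a rough sketch of the shape such a streaming generator takes, modeled on the existing generators in fastchat/serve/api_provider.py. The function name, endpoint path, payload fields, and streaming framing below describe a hypothetical OpenAI-like API and are assumptions; only the yielded dict with "text" and "error_code" keys mirrors the convention of the existing examples.

# A hypothetical streaming generator for a custom API protocol.
# Everything about the remote API (URL path, payload, framing) is assumed.
import json
import requests

def custom_api_stream_iter(
    model_name, messages, temperature, top_p, max_new_tokens, api_base, api_key
):
    payload = {
        "model": model_name,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_new_tokens,
        "stream": True,
    }
    headers = {"Authorization": f"Bearer {api_key}"}
    text = ""
    with requests.post(
        f"{api_base}/chat/completions", json=payload, headers=headers, stream=True
    ) as resp:
        for line in resp.iter_lines():
            if not line:
                continue
            data = line.decode("utf-8")
            if data.startswith("data:"):
                data = data[len("data:"):].strip()
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            # Accumulate each delta and yield the full text so far, matching
            # the {"text": ..., "error_code": 0} convention of the examples.
            text += chunk["choices"][0]["delta"].get("content", "")
            yield {"text": text, "error_code": 0}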

Steps to Launch a WebUI with an API Model

  1. Specify the endpoint information in a JSON configuration file. For instance, create a file named api_endpoints.json:
{
  "gpt-3.5-turbo": {
    "model_name": "gpt-3.5-turbo",
    "api_type": "openai",
    "api_base": "https://api.openai.com/v1",
    "api_key": "sk-******",
    "anony_only": false,
    "recommended_config": {
      "temperature": 0.7,
      "top_p": 1.0
    },
    "text-arena": true,
    "vision-arena": false,
  }
}
  • "api_type" can be one of the following: openai, anthropic, gemini, mistral, yandexgpt or reka. For custom APIs, add a new type and implement it accordingly.
  • "anony_only" indicates whether to display this model in anonymous mode only.
  • "recommended_config" indicates the recommended generation parameters for temperature and top_p.
  • "text-arena" indicates whether the model should be displayed in the Text Arena.
  • "vision-arena" indicates whether the model should be displayed in the Vision Arena.
  2. Launch the Gradio web server with the argument --register api_endpoints.json:
python3 -m fastchat.serve.gradio_web_server --controller "" --share --register api_endpoints.json

Now, you can open a browser and interact with the model.