metadata

title: GPT4V-Image-Captioner
app_file: gpt-caption.py
sdk: gradio
sdk_version: 4.21.0

GPT4V-Image-Captioner / GPT4V图像打标器

中文版说明

We now have sd-webui-GPT4V-Image-Captioner for SD WebUI

This is a multifunctional image processing toolbox built with Gradio, capable of tagging images using the GPT-4-vision or Claude 3 API, the cogVLM model, Qwen-VL(Alibaba Cloud), the Moondream model.

Key features include:

One-click installation and use
Single image and multi-image batch tagging
Choice of online GPT4V or Claude 3 or Qwen-VL(Alibaba Cloud) & local CogVLM and Moondream models
Visual tag analysis and processing
Image pre-compression
Keyword filtering and watermark image recognition

Developers: Jiaye, LEOSAM是只兔狲, SleeeepyZhou, Fok, GPT4. Welcome everyone to add more new features to this project.

Please note that the Claude 3 feature is not finished yet.

To use Claude 3, simply replace the API key and URL with the Claude 3 API key and URL (/v1/messages), and changing the model name to "claude-3-opus" (or sonnet).

Installation and Startup Guide

Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)

Open Command Prompt as administrator and navigate to the directory where you want to clone the repository.

Clone the repository using the following command:

git clone https://github.com/jiayev/GPT4V-Image-Captioner

Double-click install_windows.bat to run and install all necessary dependencies.
After the installation is complete, you can launch the GPT4V-Image-Captioner by double-clicking start_windows.bat.
Hold down Ctrl and click on the URL in the terminal (or copy the URL to your browser), which will open the Gradio app interface in your default browser.
Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.

Linux / macOS

Open a terminal and navigate to the directory where you want to clone the repository.

Clone the repository using the following command:

git clone https://github.com/jiayev/GPT4V-Image-Captioner

Navigate to the cloned directory:
```
cd GPT4V-Image-Captioner
```
Make the install and start scripts executable with the following command:
```
chmod +x install_linux_mac.sh; chmod +x start_linux_mac.sh
```
Execute the install script:
```
./install_linux_mac.sh
```
Launch the GPT4V-Image-Captioner in the terminal by executing the launch script:
```
./start_linux_mac.sh
```
Copy the URL displayed in the terminal and open it in your browser to access the Gradio app interface.
Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.

Windows Manual Installation Instructions

Open the Command Prompt by pressing Win + R, typing cmd, and then pressing Enter.
Clone the repository to your local machine using the following command:
```
git clone https://github.com/jiayev/GPT4V-Image-Captioner
```
Once cloning is complete, navigate to the cloned directory:
```
cd GPT4V-Image-Captioner
```
Before installing any dependencies, make sure that Python is installed on your system. Check for Python's presence by typing the following command and pressing Enter in the Command Prompt:
```
python --version
```
If Python is not installed, you will get an error message. In that case, please visit the Python official download page and follow the instructions to install it.
Create a virtual environment named myenv to avoid contaminating the global Python environment:
```
python -m venv myenv
```
Activate the virtual environment you just created:
```
myenv\Scripts\activate
```
Update pip to date:
```
python -m pip install --upgrade pip
```

Install libraries within the virtual environment:

pip install scipy networkx wordcloud matplotlib Pillow tqdm gradio requests

After completing the steps above, you can start GPT4V-Image-Captioner by double-clicking the start_windows.bat file.