Spaces:
Runtime error
title: GPT4V-Image-Captioner
app_file: gpt-caption.py
sdk: gradio
sdk_version: 4.21.0
GPT4V-Image-Captioner / GPT4V图像打标器
We now have sd-webui-GPT4V-Image-Captioner for SD WebUI
This is a multifunctional image processing toolbox built with Gradio, capable of tagging images using the GPT-4-vision or Claude 3 API, the cogVLM model, Qwen-VL(Alibaba Cloud), the Moondream model.
Key features include:
- One-click installation and use
- Single image and multi-image batch tagging
- Choice of online GPT4V or Claude 3 or Qwen-VL(Alibaba Cloud) & local CogVLM and Moondream models
- Visual tag analysis and processing
- Image pre-compression
- Keyword filtering and watermark image recognition
Developers: Jiaye, LEOSAM是只兔狲, SleeeepyZhou, Fok, GPT4. Welcome everyone to add more new features to this project.
Please note that the Claude 3 feature is not finished yet.
To use Claude 3, simply replace the API key and URL with the Claude 3 API key and URL (/v1/messages), and changing the model name to "claude-3-opus" (or sonnet).
Installation and Startup Guide
Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)
- Open Command Prompt as administrator and navigate to the directory where you want to clone the repository.
- Clone the repository using the following command:
git clone https://github.com/jiayev/GPT4V-Image-Captioner
- Double-click
install_windows.bat
to run and install all necessary dependencies. - After the installation is complete, you can launch the GPT4V-Image-Captioner by double-clicking
start_windows.bat
. - Hold down Ctrl and click on the URL in the terminal (or copy the URL to your browser), which will open the Gradio app interface in your default browser.
- Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.
Linux / macOS
- Open a terminal and navigate to the directory where you want to clone the repository.
- Clone the repository using the following command:
git clone https://github.com/jiayev/GPT4V-Image-Captioner
- Navigate to the cloned directory:
cd GPT4V-Image-Captioner
- Make the install and start scripts executable with the following command:
chmod +x install_linux_mac.sh; chmod +x start_linux_mac.sh
- Execute the install script:
./install_linux_mac.sh
- Launch the GPT4V-Image-Captioner in the terminal by executing the launch script:
./start_linux_mac.sh
- Copy the URL displayed in the terminal and open it in your browser to access the Gradio app interface.
- Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.
Windows Manual Installation Instructions
Open the Command Prompt by pressing
Win + R
, typingcmd
, and then pressingEnter
.Clone the repository to your local machine using the following command:
git clone https://github.com/jiayev/GPT4V-Image-Captioner
Once cloning is complete, navigate to the cloned directory:
cd GPT4V-Image-Captioner
Before installing any dependencies, make sure that Python is installed on your system. Check for Python's presence by typing the following command and pressing
Enter
in the Command Prompt:python --version
If Python is not installed, you will get an error message. In that case, please visit the Python official download page and follow the instructions to install it.
Create a virtual environment named
myenv
to avoid contaminating the global Python environment:python -m venv myenv
Activate the virtual environment you just created:
myenv\Scripts\activate
Update
pip
to date:python -m pip install --upgrade pip
Install libraries within the virtual environment:
pip install scipy networkx wordcloud matplotlib Pillow tqdm gradio requests
After completing the steps above, you can start GPT4V-Image-Captioner by double-clicking the
start_windows.bat
file.