---
datasets:
- reasonseg
language: en
license: other
pipeline_tag: image-segmentation
library_name: transformers
tags:
- vision
- segmentation
---
# Seg-Zero-7B
This model is based on the paper [Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement](https://huggingface.co/papers/2503.06520). It uses a decoupled architecture with a reasoning model and a segmentation model. It is trained via reinforcement learning with GRPO, without explicit reasoning data, leading to robust zero-shot generalization and emergent test-time reasoning.
Code: https://github.com/dvlab-research/Seg-Zero
## Description
This is the Seg-Zero-7B model. It introduces a decoupled architecture consisting of a reasoning model and a segmentation model. The reasoning model interprets user intentions, generates explicit reasoning chains, and produces positional prompts, which are subsequently used by the segmentation model to generate pixel-level masks.
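To make the two-stage flow concrete, the sketch below parses positional prompts (a bounding box and click points) out of a reasoning-model response and shows how they would condition a promptable segmenter such as SAM 2. The `<answer>` tag and JSON keys here are illustrative assumptions, not necessarily the exact format emitted by Seg-Zero; see the repository's inference scripts for the real parsing logic.
```python
import json
import re

# Illustrative only: the exact tag/JSON schema may differ from what Seg-Zero emits.
# Assume the reasoning model produces a chain of thought followed by positional
# prompts (a bounding box and click points) for the segmentation model.
reasoning_output = (
    "<think>The unusual object is the red umbrella on the left.</think>"
    '<answer>{"bbox": [12, 40, 180, 220], "points": [[96, 130], [60, 200]]}</answer>'
)

# Extract the positional prompts from the <answer> block.
answer = re.search(r"<answer>(.*?)</answer>", reasoning_output, re.S).group(1)
prompts = json.loads(answer)

box = prompts["bbox"]       # [x1, y1, x2, y2] box prompt
points = prompts["points"]  # foreground click points

# These prompts would then be passed to a SAM2-style promptable segmenter,
# e.g. predictor.predict(box=box, point_coords=points, point_labels=[1, 1]),
# to obtain the pixel-level mask.
print("box prompt:", box)
print("point prompts:", points)
```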
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the Seg-Zero-7B reasoning model and its tokenizer from the Hugging Face Hub.
# bfloat16 keeps the 7B checkpoint within a single modern GPU's memory.
model = AutoModelForCausalLM.from_pretrained(
    "Ricky06662/Seg-Zero-7B", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("Ricky06662/Seg-Zero-7B")
```
## Installation
```bash
git clone https://github.com/dvlab-research/Seg-Zero.git
cd Seg-Zero
conda create -n seg_zero python=3.11
conda activate seg_zero
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1
pip install -e .
pip install sam2
pip install matplotlib
```
## Inference
```bash
python inference_scripts/infer.py
```
The default question is:
> "the unusual object in the image."
The thinking process is printed to the command line, and the resulting mask is saved in the **inference_scripts** folder. You can also provide your own image path and question text:
```bash
python inference_scripts/infer.py --image_path "your_image_path" --text "your question text"
``` |
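If you want to inspect the result programmatically, a minimal sketch using matplotlib (installed above) is shown below. The mask filename is a placeholder, not the actual name written by `infer.py`, so check the **inference_scripts** folder for the real output path.
```python
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Placeholder paths: replace with your input image and the mask file
# actually written by infer.py into the inference_scripts folder.
image = mpimg.imread("your_image_path")
mask = mpimg.imread("inference_scripts/mask.png")

# Show the input image and the predicted mask side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(image)
axes[0].set_title("input image")
axes[1].imshow(mask, cmap="gray")
axes[1].set_title("predicted mask")
for ax in axes:
    ax.axis("off")
plt.show()
```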