---
license: mit
---
<div align="center">

<h1>SegGPT: Segmenting Everything In Context</h1>

[Xinlong Wang](https://www.xloong.wang/)<sup>1*</sup>, [Xiaosong Zhang](https://scholar.google.com/citations?user=98exn6wAAAAJ&hl=en)<sup>1*</sup>, [Yue Cao](http://yue-cao.me/)<sup>1*</sup>, [Wen Wang](https://scholar.google.com/citations?user=1ks0R04AAAAJ&hl)<sup>2</sup>, [Chunhua Shen](https://cshen.github.io/)<sup>2</sup>, [Tiejun Huang](https://scholar.google.com/citations?user=knvEK4AAAAAJ&hl=en)<sup>1,3</sup>

<sup>1</sup>[BAAI](https://www.baai.ac.cn/english.html), <sup>2</sup>[ZJU](https://www.zju.edu.cn/english/), <sup>3</sup>[PKU](https://english.pku.edu.cn/)

Enjoy the [Demo](https://huggingface.co/spaces/BAAI/SegGPT) and [Code](https://github.com/baaivision/Painter/tree/main/SegGPT)

<br>

![teaser](./seggpt_teaser.png)

</div>

We present SegGPT, a generalist model for segmenting everything in context. With a single model, SegGPT can perform arbitrary segmentation tasks in images or videos via in-context inference, such as object instance, stuff, part, contour, and text segmentation.

SegGPT is evaluated on a broad range of tasks, including few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation.

Our results show strong capabilities in segmenting in-domain and out-of-domain targets, both qualitatively and quantitatively.
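
To make the in-context setup concrete, here is a toy, self-contained sketch of the input/output contract: a single annotated example (a prompt image plus its mask) steers segmentation of a new target image. The color-matching rule below is a hypothetical stand-in, not how SegGPT works internally; the real inference code lives in the repository linked below.

```python
# Toy illustration of the in-context contract: one (image, mask) prompt
# pair in, a predicted mask for a new image out. The color-matching
# "model" is purely illustrative and unrelated to SegGPT's ViT painter.
import numpy as np

def toy_in_context_segment(prompt_img, prompt_mask, target_img, tol=30.0):
    """prompt_img/target_img: HxWx3 uint8 arrays; prompt_mask: HxW bool."""
    fg_color = prompt_img[prompt_mask].mean(axis=0)         # mean foreground color
    dist = np.linalg.norm(target_img - fg_color, axis=-1)   # per-pixel color distance
    return dist < tol                                       # predicted boolean mask

# Synthetic demo: the same bright object appears in both images.
prompt = np.zeros((64, 64, 3), np.uint8); prompt[10:30, 10:30] = 200
mask = np.zeros((64, 64), bool);          mask[10:30, 10:30] = True
target = np.zeros((64, 64, 3), np.uint8); target[40:60, 5:25] = 200
print(toy_in_context_segment(prompt, mask, target).sum())   # 400 foreground pixels
```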
[[Paper]](https://arxiv.org/abs/2304.03284)

[[Code]](https://github.com/baaivision/Painter/tree/main/SegGPT)

[[Demo]](https://huggingface.co/spaces/BAAI/SegGPT)
## Model
A pre-trained SegGPT model is available at [🤗 HF link](https://huggingface.co/BAAI/SegGPT/blob/main/seggpt_vit_large.pth).
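
As a minimal, unofficial sketch of getting started with the checkpoint (assuming `huggingface_hub` and `torch` are installed; the network definition itself ships with the repository code linked above):

```python
# Minimal sketch: fetch and inspect the released checkpoint.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="BAAI/SegGPT", filename="seggpt_vit_large.pth")
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Assumption: the weights sit under a "model" key, a common layout for
# checkpoints from this codebase; fall back to the raw dict otherwise.
state_dict = checkpoint.get("model", checkpoint)
print(f"{len(state_dict)} tensors, e.g. {list(state_dict)[:3]}")
```

From here, the state dict can be loaded into the model classes from the SegGPT repository via `load_state_dict`.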
## Citation
```
@article{SegGPT,
  title={SegGPT: Segmenting Everything In Context},
  author={Wang, Xinlong and Zhang, Xiaosong and Cao, Yue and Wang, Wen and Shen, Chunhua and Huang, Tiejun},
  journal={arXiv preprint arXiv:2304.03284},
  year={2023}
}
```
## Contact
**We are hiring** at all levels in the BAAI Vision Team, including full-time researchers, engineers, and interns.

If you are interested in working with us on **foundation models, visual perception, and multimodal learning**, please contact [Xinlong Wang](https://www.xloong.wang/) (`wangxinlong@baai.ac.cn`) and [Yue Cao](http://yue-cao.me/) (`caoyue@baai.ac.cn`).