---
license: apache-2.0
---

# PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

[Project](https://polyformer.github.io/) | [GitHub](https://github.com/amazon-science/polygon-transformer) | [Demo](https://huggingface.co/spaces/koajoel/PolyFormer)

## Model description

PolyFormer is a unified framework for referring image segmentation (RIS) and referring expression comprehension (REC) that formulates both tasks as a sequence-to-sequence (seq2seq) prediction problem. For more details, please refer to our paper:

[PolyFormer: Referring Image Segmentation as Sequential Polygon Generation](https://arxiv.org/abs/2302.07387)
Jiang Liu\*, Hui Ding\*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha, [CVPR 2023](https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers)
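
In this formulation, a segmentation mask is predicted as an ordered sequence of polygon vertex coordinates. The sketch below is a rough, hypothetical illustration of that idea only, not the exact coordinate scheme or target construction used by PolyFormer: the `SEP`/`EOS` marker values and the [0, 1] normalization are assumptions made for this example.

```python
# Illustrative sketch of the "mask as vertex sequence" idea, assuming
# normalized (x, y) pairs with made-up SEP/EOS markers; PolyFormer's
# actual target construction may differ.
from typing import List, Tuple

SEP = (-1.0, -1.0)  # hypothetical separator between disconnected regions
EOS = (-2.0, -2.0)  # hypothetical end-of-sequence marker

def polygons_to_sequence(
    polygons: List[List[Tuple[float, float]]],
    width: int,
    height: int,
) -> List[Tuple[float, float]]:
    """Flatten polygons into one vertex sequence with coordinates in [0, 1]."""
    seq: List[Tuple[float, float]] = []
    for i, poly in enumerate(polygons):
        if i > 0:
            seq.append(SEP)  # separate the parts of a multi-part mask
        seq.extend((x / width, y / height) for x, y in poly)
    seq.append(EOS)
    return seq

# Example: one triangular region in a 640x480 image.
print(polygons_to_sequence([[(32, 48), (320, 48), (176, 400)]], 640, 480))
```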

## Training data

We pre-train PolyFormer on the REC task using Visual Genome, RefCOCO, RefCOCO+, RefCOCOg, and Flickr30k-entities, and then finetune it on the joint REC + RIS task using RefCOCO, RefCOCO+, and RefCOCOg.

Two model variants are available (summarized in the sketch after this list):

* PolyFormer-B: Swin-B as the visual encoder, BERT-base as the text encoder, 6 transformer encoder layers and 6 decoder layers.
* PolyFormer-L: Swin-L as the visual encoder, BERT-base as the text encoder, 12 transformer encoder layers and 12 decoder layers.
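
For quick reference, the variant hyperparameters above can be captured as plain data. This snippet is illustrative only; the class and field names are invented here and do not come from the PolyFormer repository.

```python
# Illustrative only: a hypothetical container for the variant
# hyperparameters listed above; not code from the PolyFormer repo.
from dataclasses import dataclass

@dataclass(frozen=True)
class PolyFormerVariant:
    visual_encoder: str   # Swin backbone
    text_encoder: str     # text encoder
    encoder_layers: int   # transformer encoder depth
    decoder_layers: int   # transformer decoder depth

POLYFORMER_B = PolyFormerVariant("Swin-B", "BERT-base", 6, 6)
POLYFORMER_L = PolyFormerVariant("Swin-L", "BERT-base", 12, 12)
```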
## Citation

If you find PolyFormer useful in your research, please cite the following paper:

```bibtex
@article{liu2023polyformer,
  title={PolyFormer: Referring Image Segmentation as Sequential Polygon Generation},
  author={Liu, Jiang and Ding, Hui and Cai, Zhaowei and Zhang, Yuting and Satzoda, Ravi Kumar and Mahadevan, Vijay and Manmatha, R},
  journal={arXiv preprint arXiv:2302.07387},
  year={2023}
}
```