import streamlit as st
from streamlit_extras.switch_page_button import switch_page
st.title("SegGPT")
st.success("""[Original tweet](https://x.com/mervenoyann/status/1773056450790666568) (March 27, 2024)""", icon="ℹ️")
st.markdown(""" """)
st.markdown("""SegGPT is a vision generalist for image segmentation, quite like GPT for computer vision ✨
It comes with the latest release of 🤗 Transformers 🎁
Technical details, demo and how-tos below!
""")
st.markdown(""" """)
st.image("pages/SegGPT/image_1.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""SegGPT is an extension of <a href='Painter' target='_self'>Painter</a>, where you speak to images with images: the model takes in an image prompt, a transformed version of that image prompt, and the actual image you want the same transform applied to, and it is expected to output the transformed image.
SegGPT consists of a vanilla ViT with a decoder on top (linear, conv, linear). The model is trained on diverse segmentation examples: it is given example image-mask pairs plus the actual input to be segmented, and the decoder head learns to reconstruct the mask output. 👇🏻
""", unsafe_allow_html=True)
st.markdown(""" """)
st.image("pages/SegGPT/image_2.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
This generalizes pretty well!
The authors do not claim state-of-the-art results, as the model is mainly used for zero-shot and few-shot inference. They also do prompt tuning, where they freeze the parameters of the model and only optimize the image tensor (the input context).
""")
st.markdown(""" """)
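st.markdown("""
To make the prompt tuning idea concrete, here is a minimal sketch (not from the paper or the original thread). It assumes a tiny hypothetical stand-in model, `ToyInContextModel`, random toy data, and a smooth L1 loss picked for illustration. The point is only the mechanics: every model weight is frozen and the optimizer updates nothing but the context tensors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


# Hypothetical stand-in for a pretrained in-context segmentation model.
# It maps (prompt image, prompt mask, target image) to a predicted mask.
class ToyInContextModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(9, 3, kernel_size=3, padding=1)

    def forward(self, prompt_image, prompt_mask, target_image):
        x = torch.cat([prompt_image, prompt_mask, target_image], dim=1)
        return self.net(x)


model = ToyInContextModel()
for param in model.parameters():
    param.requires_grad_(False)  # freeze every model weight

# The learnable context: the prompt tensors themselves.
prompt_image = nn.Parameter(torch.rand(1, 3, 16, 16))
prompt_mask = nn.Parameter(torch.rand(1, 3, 16, 16))

# Toy training pair (random data, for illustration only).
target_image = torch.rand(1, 3, 16, 16)
target_mask = torch.rand(1, 3, 16, 16)

# Only the context tensors are handed to the optimizer.
optimizer = torch.optim.Adam([prompt_image, prompt_mask], lr=0.05)
weights_before = model.net.weight.clone()

losses = []
for step in range(100):
    optimizer.zero_grad()
    pred = model(prompt_image, prompt_mask, target_image)
    loss = nn.functional.smooth_l1_loss(pred, target_mask)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# The model weights are untouched; only the context has moved.
print(torch.equal(weights_before, model.net.weight))
print(losses[0], losses[-1])
```

With a real checkpoint the same pattern would apply: load the model, freeze it, and optimize a context tensor on your own image-mask pairs.
""")
st.markdown(""" """)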
st.image("pages/SegGPT/image_3.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""Thanks to 🤗 Transformers you can use this model easily! See [here](https://t.co/U5pVpBhkfK).
""")
st.markdown(""" """)
st.image("pages/SegGPT/image_4.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
I have built an app for you to try it out. I combined SegGPT with Depth Anything Model, so you don't have to upload image mask prompts in your prompt pair 🤗
Try it [here](https://t.co/uJIwqJeYUy). Also check out the [collection](https://t.co/HvfjWkAEzP).
""")
st.markdown(""" """)
st.image("pages/SegGPT/image_5.jpeg", use_column_width=True)
st.markdown(""" """)
st.info("""
Resources:
[SegGPT: Segmenting Everything In Context](https://arxiv.org/abs/2304.03284)
by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang (2023)
[GitHub](https://github.com/baaivision/Painter)""", icon="📚")
st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("Painter")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("Grounding DINO")