How do <seg[value]> tokens generate the masks in segmentation tasks?

#10

by cmgzy - opened May 30

May 30

In https://huggingface.co/blog/paligemma#referring-expression-segmentation,
the authors said "The segmentation tokens can be further processed to generate segmentation masks."

I understand what the <loc[value]> tokens mean from "Each detection is represented by four location coordinates in the order y_min, x_min, y_max, x_max, followed by the label that was detected in that box", but I cannot figure out how the <seg[value]> tokens generate the masks. Could anyone clarify? Thanks!
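For reference, the <loc[value]> part described in that quote can be sketched roughly as below. This is only an illustration: the function name `parse_box` is hypothetical, and the four-digit token format and the 1024-bin normalization are assumptions that should be checked against the blog post and the model's tokenizer.

```python
import re

# Assumption: location tokens look like <loc0123> (four digits).
LOC_RE = re.compile(r"<loc(\d{4})>")


def parse_box(text, width, height, bins=1024):
    """Convert the first four <locNNNN> tokens to pixel coordinates.

    Returns (y_min, x_min, y_max, x_max), the order quoted from the blog post.
    `bins` is an assumed normalization constant, not a confirmed value.
    """
    values = [int(v) for v in LOC_RE.findall(text)[:4]]
    if len(values) != 4:
        raise ValueError("expected four <loc...> tokens")
    y_min, x_min, y_max, x_max = values
    return (
        y_min / bins * height,
        x_min / bins * width,
        y_max / bins * height,
        x_max / bins * width,
    )
```

For example, `parse_box("<loc0000><loc0256><loc0512><loc0768> cat", width=200, height=100)` scales each value by the image size after dividing by `bins`.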

merve

Google org May 30

@cmgzy Hello, the <seg[value]> tokens are decoded by a VAE to generate the mask. The code can be found in the Space. LMK if anything's unclear.
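As a rough illustration of the first half of that pipeline (this is not the code from the Space), the sketch below turns <seg[value]> tokens into codebook indices and looks up their embedding vectors, which a trained VAE decoder would then turn into a mask. The names `seg_indices`, `to_grid`, and `lookup` are hypothetical, and the three-digit token format, the count of 16 tokens per mask, and the 4x4 grid layout are assumptions. The decoder itself is omitted because it needs the trained weights.

```python
import re

# Assumption: segmentation tokens look like <seg042> (three digits).
SEG_RE = re.compile(r"<seg(\d{3})>")


def seg_indices(text, count=16):
    """Extract codebook indices from the <segNNN> tokens in `text`.

    `count` (tokens per mask) is an assumed value.
    """
    indices = [int(v) for v in SEG_RE.findall(text)]
    if len(indices) != count:
        raise ValueError(f"expected {count} <seg...> tokens, got {len(indices)}")
    return indices


def to_grid(indices, side=4):
    """Arrange the flat index list row by row into a side x side grid (assumed layout)."""
    return [indices[r * side:(r + 1) * side] for r in range(side)]


def lookup(grid, codebook):
    """Replace each index with its codebook vector.

    `codebook` is a list of vectors. In a real setup it would come from the
    trained VAE checkpoint, and the resulting grid of vectors would be passed
    to the VAE decoder to produce the mask.
    """
    return [[codebook[i] for i in row] for row in grid]
```

The mask produced by the decoder would then still need to be resized and placed inside the box given by the <loc[value]> tokens. The Space's code is the place to check how that step is actually done.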

merve changed discussion status to closed Jun 4
