Remember last April, when @Meta released the Segment Anything Model (SAM) paper and it seemed too good to be true? 🤯
They have now released Segment Anything Model 2 (SAM 2), and it's mind-blowingly great!
SAM 2 is the first unified model for segmenting objects across images and videos. You can use a click, a box, or a mask as the prompt to select an object in any image or on any frame of a video. 🖼️📹
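Want to try it on an image? Here's a minimal sketch based on the predictor interface in Meta's sam2 GitHub repo (the checkpoint ID, image path, and click coordinates are illustrative, not from the post):

```python
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load a pretrained SAM 2 image predictor (checkpoint name as shown in the repo README)
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("photo.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)  # compute the image embedding once
    # Prompt with a single foreground click at pixel (x=500, y=300)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 300]]),
        point_labels=np.array([1]),  # 1 = foreground click, 0 = background
    )

best_mask = masks[scores.argmax()]  # keep the highest-confidence mask
```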
SAM consists of an image encoder to embed the image and a prompt encoder to embed the prompts; the outputs of both are fed to a mask decoder, which generates the masks.
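To make that data flow concrete, here's a toy sketch of the three-piece layout. These are stand-in modules I made up for illustration, not the real SAM components:

```python
import torch
import torch.nn as nn

class ToySAM(nn.Module):
    """Illustrative only: shows how the three pieces connect, nothing more."""

    def __init__(self, dim=256):
        super().__init__()
        self.image_encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # image -> dense embeddings
        self.prompt_encoder = nn.Linear(2, dim)                            # (x, y) click -> prompt embedding
        self.mask_decoder = nn.Conv2d(dim, 1, kernel_size=1)               # fused features -> mask logits

    def forward(self, image, click_xy):
        img_emb = self.image_encoder(image)         # (B, dim, H/16, W/16)
        prompt = self.prompt_encoder(click_xy)      # (B, dim)
        fused = img_emb + prompt[:, :, None, None]  # condition image features on the prompt
        return self.mask_decoder(fused)             # (B, 1, H/16, W/16) mask logits

model = ToySAM()
mask_logits = model(torch.randn(1, 3, 256, 256), torch.rand(1, 2))
```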
The biggest jump from SAM to SAM 2 is the use of memory to keep masks consistent across video frames: they call this masklet prediction! 🧠
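For video, the repo exposes a separate predictor that handles the memory and propagation for you. A sketch following its README (the config/checkpoint paths, frame directory, and click coordinates are assumptions on my part):

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Build the video predictor (see the repo for the exact config/checkpoint files)
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "sam2_hiera_large.pt")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state indexes the video (a directory of JPEG frames in the reference repo)
    state = predictor.init_state(video_path="./video_frames")

    # One foreground click on frame 0 is enough to define the object...
    _, object_ids, mask_logits = predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[500, 300]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # ...then the memory mechanism propagates the masklet through the whole video
    video_masks = {}
    for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
        video_masks[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```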
They have also released the dataset, SA-V. It is truly huge: 190.9K manual annotations and 451.7K automatic ones!