matlok's Collections

Papers - Video - Encoders - C-ViViT - MaskGiT

MaskGiT is trained to reconstruct masked tokens z that were predicted by a frozen C-ViViT encoder, conditioned on the T5X tokens of a given prompt p0.
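The masked-token objective described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: `MASK_ID`, `mask_tokens`, `masked_cross_entropy`, and `toy_model` are hypothetical names, the token values are made up, the fixed mask ratio is an assumption, and the "model" simply returns uniform logits in place of a real bidirectional transformer conditioned on the prompt tokens.

```python
import math
import random

MASK_ID = -1  # hypothetical id standing in for the special [MASK] token


def mask_tokens(z, mask_ratio, rng):
    """Replace a random subset of video tokens with MASK_ID.

    Returns the corrupted sequence and the sorted list of masked positions.
    """
    n_mask = max(1, round(mask_ratio * len(z)))
    positions = sorted(rng.sample(range(len(z)), n_mask))
    masked = list(z)
    for i in positions:
        masked[i] = MASK_ID
    return masked, positions


def masked_cross_entropy(logits, targets, positions):
    """Mean cross-entropy computed over the masked positions only."""
    total = 0.0
    for i in positions:
        row = logits[i]
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_norm - row[targets[i]]
    return total / len(positions)


def toy_model(masked_tokens, text_tokens, vocab_size):
    """Placeholder for the transformer: ignores its inputs, returns uniform logits."""
    return [[0.0] * vocab_size for _ in masked_tokens]


rng = random.Random(0)
vocab_size = 8
z = [3, 1, 4, 1, 5, 2, 6, 5]  # tokens from the frozen encoder (made-up values)
p0 = [10, 11, 12]             # text tokens for the prompt (made-up values)

masked, positions = mask_tokens(z, mask_ratio=0.5, rng=rng)
logits = toy_model(masked, p0, vocab_size)
loss = masked_cross_entropy(logits, z, positions)
```

Because the encoder is frozen, only the component that maps `(masked, p0)` to logits would receive gradients in a real training loop; the loss ignores unmasked positions, so the model is scored purely on reconstructing the hidden tokens.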