Amazing model and very promising

#5
by plo33 - opened

Hi guys, I'm here just to say: amazing model. It covers a lot of multimodal tasks.

I'm getting 0.07 to 0.14 ms inference time in CAPTION_TO_PHRASE_GROUNDING mode on an RTX 3080 10GB. I think edge devices could benefit from this model as well.
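
A minimal sketch of how per-call latency like this could be measured. This is not the code behind the numbers above; `measure_latency_ms` and `run_once` are hypothetical names, and the workload in the example is only a stand-in for a single model call.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=3, iters=20):
    """Return (min, median, max) wall-clock latency of run_once() in milliseconds.

    run_once is a hypothetical zero-argument callable that performs one inference.
    """
    # Warm-up calls are not timed, so one-time costs (lazy init, caches) are excluded.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    return min(samples), statistics.median(samples), max(samples)


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with a call that runs the model once.
    fastest, median, slowest = measure_latency_ms(lambda: sum(range(10_000)))
    print(f"min {fastest:.3f} ms, median {median:.3f} ms, max {slowest:.3f} ms")
```

One caveat when timing on a GPU: kernels are typically launched asynchronously, so `run_once` should block until the result is actually ready (with PyTorch, for example, by calling `torch.cuda.synchronize()` before returning). Otherwise the timer may capture only the launch overhead rather than the full inference.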
