Object detection capabilities?

#38
by syrineM - opened

Are Gemma 3 multimodal models trained for object detection? If so, what bounding box format are they trained to output? This is specified for Gemini models: bounding boxes use the [y_min, x_min, y_max, x_max] format, the origin is the top-left corner, the x axis runs horizontally and the y axis vertically, and coordinate values are normalized to 0-1000 for every image. Is it the same for Gemma 3?
Thank you.

Google org

Hi @syrineM ,

While Gemma 3 can identify objects in images and can be used for computer vision tasks, the available documentation does not explicitly confirm that the models are trained to output bounding box coordinates as a standard feature, as the Gemini models are.

However, Gemma 3 may follow the same format as the Gemini models, since Gemma is built from the same research and technology used to create Gemini. Gemma 3 is not a base version of Gemini, though, so this should be checked against the model's actual outputs.
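If a Gemma 3 response does turn out to use the Gemini convention described in the question (which the documentation does not confirm), a minimal sketch of converting such a box to pixel coordinates might look like this. The function name `gemini_box_to_pixels` is hypothetical and not part of any Gemma or Gemini API; the input order and 0-1000 scale are taken from the Gemini description above.

```python
from typing import Sequence, Tuple


def gemini_box_to_pixels(
    box: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Hypothetical helper: convert a [y_min, x_min, y_max, x_max] box
    normalized to 0-1000 (origin at top-left) into pixel coordinates,
    returned as (x_min, y_min, x_max, y_max)."""
    if len(box) != 4:
        raise ValueError("expected 4 values: [y_min, x_min, y_max, x_max]")
    y_min, x_min, y_max, x_max = box
    # x values scale with the image width, y values with the image height.
    return (
        round(x_min * width / 1000),
        round(y_min * height / 1000),
        round(x_max * width / 1000),
        round(y_max * height / 1000),
    )


# A box covering the central region of a 640x480 image.
print(gemini_box_to_pixels([250, 250, 750, 750], width=640, height=480))
# → (160, 120, 480, 360)
```

Note that the output is reordered to (x, y) pairs, which is what most drawing libraries expect; keep the original [y, x] order if your downstream code assumes it.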

For more information, please refer to this link.

Thank you.
