Object detection capabilities?

#38

by syrineM - opened 11 days ago


syrineM

11 days ago

•

edited 11 days ago

Are Gemma 3 multimodal models trained on object detection? If so, what bounding box format are they trained to output? This information is specified for Gemini models: bounding boxes are in [y_min, x_min, y_max, x_max] format, the top-left corner is the origin, the x and y axes run horizontally and vertically, respectively, and coordinate values are normalized to 0-1000 for every image. Is it the same for Gemma 3?
Thank you.

GopiUppari

Google org 10 days ago

Hi @syrineM,

Gemma 3 can identify objects in images and is used for computer vision tasks, but the available documentation does not explicitly confirm that the models are trained to output bounding box coordinates directly, as the Gemini models are.

Gemma 3 may follow the same format as the Gemini models, because Gemma models are built from the same research and technology used to create Gemini. Since this is not documented, please verify the output format on your own images before relying on it.
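If the model does return boxes in the Gemini convention described in the question ([y_min, x_min, y_max, x_max], origin at the top-left corner, values normalized to 0-1000), converting them to pixel coordinates could look like the sketch below. The function name `to_pixel_box` is hypothetical and chosen only for illustration, and whether Gemma 3 actually emits this format is an unconfirmed assumption.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a Gemini-style box to pixel coordinates.

    Assumes `box` is [y_min, x_min, y_max, x_max] on a 0-1000 scale with the
    origin at the top-left corner. That Gemma 3 uses this convention is an
    assumption, not something the documentation confirms.
    Returns (left, top, right, bottom) in pixels.
    """
    y_min, x_min, y_max, x_max = box
    left = round(x_min * image_width / 1000)
    top = round(y_min * image_height / 1000)
    right = round(x_max * image_width / 1000)
    bottom = round(y_max * image_height / 1000)
    return (left, top, right, bottom)


# Example: a box on a 640x480 image
print(to_pixel_box([250, 100, 750, 500], 640, 480))
```

Note that the y coordinates come first in this convention, so they scale with the image height and the x coordinates with the width. Drawing the converted boxes on a few test images is a quick way to check whether the model's output really follows this format.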

For more information, please refer to this link.

Thank you.
