Object detection capabilities?
Are the Gemma 3 multimodal models trained on object detection? If so, what bounding box format are they trained to output? For Gemini models this is documented: bounding boxes use the [y_min, x_min, y_max, x_max] format, the top-left corner is the origin, the x and y axes run horizontally and vertically, respectively, and coordinate values are normalized to 0-1000 for every image. Is it the same for Gemma 3?
Thank you.
Hi @syrineM ,
While Gemma 3 can identify objects in images and is used for computer vision tasks, the available documentation does not explicitly confirm that the models are trained to directly output bounding box coordinates as a standard feature like the Gemini models.
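However, Gemma 3 may follow the same format as the Gemini models, because Gemma models are built from the same research and technology used to create the Gemini models. Note that Gemma 3 models are separate open models, not base versions of Gemini, so this is not guaranteed and is worth verifying on your own images.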
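If the model does return boxes in the Gemini convention quoted in the question ([y_min, x_min, y_max, x_max], normalized to 0-1000, origin at the top-left), converting them to pixel coordinates could look like the minimal sketch below. The function name `to_pixel_box` is hypothetical, and it is an assumption, not a documented fact, that Gemma 3 emits this format.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a [y_min, x_min, y_max, x_max] box normalized to 0-1000
    into pixel coordinates (left, top, right, bottom).

    Assumes the Gemini-style convention; not confirmed for Gemma 3.
    """
    y_min, x_min, y_max, x_max = box
    # Multiply before dividing to scale from the 0-1000 range to pixels.
    left = x_min * image_width / 1000
    top = y_min * image_height / 1000
    right = x_max * image_width / 1000
    bottom = y_max * image_height / 1000
    return (left, top, right, bottom)


# Example: a box on a 640x480 image.
print(to_pixel_box([100, 200, 500, 600], 640, 480))
```

Drawing the converted box on the original image is a quick way to check whether the model's output actually follows this convention.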
For more information, please refer to this link.
Thank you.