Different input resolutions?

#4
by TaylorBrown2686 - opened

Hi,

I have tested my ONNX file in Jupyter and noticed that it works with any input resolution I give it. I tried many images with varying objects and sizes, and the results all look great.

When I bring this over to Unity and pass in resolutions other than 640x640, it breaks and draws chaotic boxes in random spots. How can I use a resolution other than 640x640 and get the correct boxes out, as I do in the Jupyter runtime with the same model?

Unity Technologies org

Can you share the Jupyter code? Perhaps it is resizing the image before passing it into the model?

It's quite simple. I trained it like this (and exported):

%pip install ultralytics
import ultralytics
ultralytics.checks()
!yolo train model=yolov8n.pt data=coco.yaml epochs=3 imgsz=640
!yolo export model=yolov8n.pt format=onnx

Then proceeded to run the onnx file specifically via:

!yolo predict model=yolov8n.onnx source='street_test.jpg'

I haven't worked with similar AI models before and my development experience isn't in AI or CV, so this may be oversimplified and I could be misinterpreting what is going on behind the scenes. This setup processes any image perfectly, though. Issues only appear once the model is imported into Unity with Sentis as the inference engine.

EDIT: I should note that this was trained on the COCO dataset with imgsz=640, which resizes all the images to 640x640.

Unity Technologies org

The input to the model must always be the same size as what it was trained on. However, the image you display on screen can be any resolution. For example, you can display a high-definition video on the screen, but when you pass it to the model using ToTensor it gets converted to the correct size behind the scenes. This is the common way of doing things, so the model isn't overloaded with too much data.
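To make the preprocessing step concrete, here is a minimal Python sketch of what "converted to the correct size behind the scenes" amounts to, assuming the YOLOv8 export default of a 640x640 NCHW input. The function name and the nearest-neighbour resize are my own illustration, not the actual Sentis ToTensor implementation:

```python
import numpy as np

MODEL_SIZE = 640  # YOLOv8 export default; must match the size the model was trained with

def to_model_input(frame: np.ndarray) -> np.ndarray:
    """Squish an HxWx3 uint8 frame to the model's fixed input size and
    return a 1x3x640x640 float tensor in [0, 1] (NCHW layout)."""
    h, w, _ = frame.shape
    # Nearest-neighbour resize via index mapping (stand-in for the real resize)
    ys = np.arange(MODEL_SIZE) * h // MODEL_SIZE
    xs = np.arange(MODEL_SIZE) * w // MODEL_SIZE
    resized = frame[ys[:, None], xs[None, :]]
    tensor = resized.astype(np.float32) / 255.0
    return tensor.transpose(2, 0, 1)[None]  # HWC -> CHW, add batch dim

# Any source resolution works; the model always receives 1x3x640x640.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # e.g. an HD video frame
print(to_model_input(frame).shape)  # (1, 3, 640, 640)
```

The key point is that the display resolution and the model's input resolution are decoupled: the tensor handed to the network is always the fixed training size, regardless of what the camera or screen produces.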

Great, that makes sense. Since I see good results from running the model in Jupyter, this shouldn't cause any issues even when squishing the image down, despite the distortion that introduces.
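If the distortion from squishing ever does become a problem, the usual alternative is letterboxing: scale the image while preserving its aspect ratio and pad the leftover area, which is what the Ultralytics predict pipeline does. A rough numpy sketch, where the 640 size and the grey pad value 114 follow Ultralytics' defaults and the function name is hypothetical:

```python
import numpy as np

def letterbox(frame: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Scale an HxWx3 image to fit inside size x size without distortion,
    padding the remainder with a constant grey (as Ultralytics does)."""
    h, w, _ = frame.shape
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index mapping
    ys = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
    resized = frame[ys[:, None], xs[None, :]]
    # Centre the resized image on a padded square canvas
    out = np.full((size, size, 3), pad_value, dtype=frame.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out

square = letterbox(np.zeros((1080, 1920, 3), dtype=np.uint8))
print(square.shape)  # (640, 640, 3)
```

Boxes predicted on the letterboxed image then need the padding offset and scale undone before being drawn on the original frame.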

I will report back if I encounter any more issues with implementing this. Thanks for the help!
