If the text model is supposed to have vision, it doesn't work

by juanml82 - opened 1 day ago

What it says on the title, I was testing using the native Generate text node to caption an image and to generate a json prompt out of it and it doesn't work, it hallucinates a completely different image

Kijai

Comfy Org org 1 day ago

The models have it, but the code to run it doesn't exist currently, it's being addressed in this PR: https://github.com/Comfy-Org/ComfyUI/pull/14298

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment