If the text model is supposed to have vision, it doesn't work
#8
by juanml82 - opened
What it says in the title. I was testing the native Generate text node to caption an image and generate a JSON prompt from it, and it doesn't work: it hallucinates a completely different image.
The models do have vision, but the code to run it doesn't exist yet. It's being addressed in this PR: https://github.com/Comfy-Org/ComfyUI/pull/14298