Really dumb question

#1
by nonetrix - opened

How the heck do I upload images in SillyTavern? This is the first time I have used a vision model, and I can't find anything useful on how to do it. I was able to find this and upload an image, but I don't think it's being passed to the model at all: when I check what Kobold sees, the prompt shows up with the image removed, and the model says something completely unrelated.

image.png

Seems like it should maybe be here?

image.png

I don't see it

The Chaotic Neutrals org

Use Generate Caption, and make sure your caption source is set to kcpp.
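If you want to check whether an image reaches the backend at all, independent of ST, a rough sketch like the one below might help. It assumes the KoboldCpp `/api/v1/generate` endpoint accepts an `images` list of base64 strings and that the server listens on port 5001; both are worth verifying against your own build's API docs. `build_payload` and `send` are illustrative names, not anything from ST or kcpp.

```python
import base64
import json
import urllib.request


def build_payload(prompt, image_bytes, max_length=120):
    """Build a generate request with one image attached as base64."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": prompt,
        "max_length": max_length,
        # Assumption: the backend reads attached images from this field.
        "images": [encoded],
    }


def send(payload, url="http://localhost:5001/api/v1/generate"):
    """POST the payload to a locally running backend and return its JSON reply."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To try it, read a file with `open("image.png", "rb").read()`, pass the bytes to `build_payload`, and hand the result to `send`. If the reply describes the picture, the projector is loaded and the image is getting through.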

Generate caption isn't inline? Is there no way to do it inline in text completion mode?

The Chaotic Neutrals org
edited Jun 14

I don't handle how ST does CSS elements for chat; I make models, chief.

That isn't even what I was talking about, but sure... That just captions the image and sends the caption to the model, so a lot of information is likely lost. I wasn't talking about CSS at all.

The Chaotic Neutrals org
edited Jun 14

Usually when people talk about inline elements regarding ST, the first thing I think of is how the chat displays.

The image portion of the model lives inside the LLaVA projector (the mmproj file), and its output is projected into the embedding space of the main model during inference via text completions with kcpp/lcpp. (This is why you need the projector.)

There are no multimodal tensors in the Hathor weights uploaded here.
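As a toy illustration of what the projector does, here is a minimal sketch. It assumes a single linear layer, whereas real LLaVA-style projectors are typically small MLPs, and it simply puts the image vectors ahead of the text, whereas the real position depends on the prompt template. `project`, `build_input`, and the tiny matrices are made up for the example.

```python
def project(features, weight):
    """Map one vision feature vector into the text embedding size.

    This is a plain matrix-vector product: each row of `weight`
    produces one output dimension.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weight]


def build_input(text_embeddings, image_features, weight):
    """Prepend projected image vectors to the text token embeddings."""
    image_embeddings = [project(f, weight) for f in image_features]
    return image_embeddings + text_embeddings


# Vision features have 2 dimensions, text embeddings have 3.
weight = [[1, 0], [0, 1], [1, 1]]
image_features = [[2, 3]]
text_embeddings = [[0, 0, 1]]

sequence = build_input(text_embeddings, image_features, weight)
print(sequence)  # [[2, 3, 5], [0, 0, 1]]
```

The main model only ever sees vectors of its own embedding size, so without the projector weights there is nothing to turn image features into something it can read.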

Whatever, I give up lol

nonetrix changed discussion status to closed