
Any idea how to make it describe what's physically there and omit the artistic critique?

#4
by nawal2 - opened

Using this image:

IMG_20231210_220649_640x480.png

and this llama.cpp server command:

./server -m models/obsidian-f16.gguf --mmproj models/mmproj-obsidian-f16.gguf --host 0.0.0.0 -ngl 42

and this prompt:

<|im_start|>user
What does this image contain? Describe each item, including the color in the description. Only describe physical objects present in the image. Do not make any other comments.\n[img-1]
###
<|im_start|>assistant

it replies:

The image features a wooden table with three Legos blocks on it. Two of the blocks are red, and one is blue. They are placed in a way that makes them look like they are standing up against a white wall. This arrangement creates an interesting visual effect that adds depth to the scene. The Legos are positioned in such a way that they appear to be looking at the camera, capturing attention with their vibrant colors and unique design.

This is convincing and all, but I don't really want it to offer a judgement of the blocks. Has anyone had any success making it obey?
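
For anyone who wants to reproduce this, here is a minimal sketch of how the request could be sent from Python. It assumes a llama.cpp server build whose /completion endpoint accepts an image_data array and [img-N] placeholders, listening on the default port 8080. The helper names (build_payload, describe) and the sampling values (n_predict, temperature) are my own illustrative choices, not something taken from the model card, and I can't promise a low temperature or the "###" stop string will actually suppress the commentary.

```python
import base64
import json
import urllib.request

# Prompt from the post, with real newlines instead of literal "\n".
PROMPT = (
    "<|im_start|>user\n"
    "What does this image contain? Describe each item, including the color "
    "in the description. Only describe physical objects present in the image. "
    "Do not make any other comments.\n[img-1]\n"
    "###\n"
    "<|im_start|>assistant\n"
)


def build_payload(image_bytes, prompt=PROMPT):
    """Build the JSON body for the server's /completion endpoint (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": prompt,
        # id 1 must match the [img-1] placeholder in the prompt.
        "image_data": [{"data": encoded, "id": 1}],
        # Assumed values: a short, low-temperature reply that stops at the
        # end-of-turn marker. Tune these; they are guesses, not known fixes.
        "n_predict": 128,
        "temperature": 0.1,
        "stop": ["###"],
    }


def describe(image_path, url="http://localhost:8080/completion"):
    """Send the image and prompt to a running server and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]
```

With the server from the command above running, describe("IMG_20231210_220649_640x480.png") should return the model's reply as a string.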
