I'm sorry, but I am unable to view or describe images as I am a text-based program.

#19
by GusPuffy - opened

Using the example in the model card, I am getting these outputs:

dynamic ViT batch size: 7
Please describe the image in detail. This image is a promotional poster with Chinese text on it. The poster's main colors are blue and white, with a large white letter "A" in the middle. The text on the poster includes "Grade A", "Grade A", "Grade A", "Grade A", "Grade A", [... "Grade A" repeats until the output is cut off]
dynamic ViT batch size: 7
Please describe the image in detail. This image is a promotional poster with Chinese text on it. The poster's main colors are blue and white, with a large white letter "A" in the middle. The text on the poster includes "Grade A", "Grade A", "Grade A", "Grade A", "Grade A", [... "Grade A" repeats until the output is cut off]
dynamic ViT batch size: 7
请根据囟片写䞀銖诗 海报蓝癜闎
倧写A字星県。
宣䌠信息藏其䞭
匕人泚目真粟圩。
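dynamic ViT batch size: 12
Describe these two images in detail. I'm sorry, I cannot view or describe images. I am a language model and cannot process visual information.
dynamic ViT batch size: 12
What are the similarities and differences between these two images? I'm sorry, I cannot view or describe images. I am a language model and cannot process visual information.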
dynamic ViT batch size: 12, image_counts: [7, 5]
Describe the image in detail.
I'm sorry, but I am unable to describe the image as I am a text-based AI and do not have the ability to view or analyze images.
Describe the image in detail.
I'm sorry, but I am unable to view or describe images as I am a text-based program.

Hello, I ran into the same problem when I tried the model. The only prompts that produced anything other than this refusal were written in Traditional Chinese. In other languages, including Simplified Chinese, the model responded in a similar way. Please post here if you manage to get the model to respond correctly in other languages.

OpenGVLab org

Thank you for your feedback. Because the V1.5 model did not include multi-image data during training, its performance in handling multiple images is unstable. You might want to try our latest InternVL2 series models, which might offer improvements.
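
For anyone reading the log above, here is a minimal sketch of how the multi-image numbers appear to relate. It assumes that `image_counts: [7, 5]` lists the number of tiles produced for each image and that those tiles are concatenated into a single batch of 12. That reading is only inferred from the logged values, not confirmed from the model code. The helper `combine_tiles` and the placeholder tile names are hypothetical, and plain strings stand in for image tensors.

```python
from typing import List, Sequence, Tuple


def combine_tiles(
    per_image_tiles: Sequence[Sequence[str]],
) -> Tuple[List[str], List[int]]:
    """Flatten per-image tile lists into one batch and record each image's tile count.

    Hypothetical helper for illustration only; it is not part of the model's code.
    """
    batch: List[str] = []
    counts: List[int] = []
    for tiles in per_image_tiles:
        batch.extend(tiles)        # all tiles end up in one shared batch
        counts.append(len(tiles))  # remember how many tiles belong to this image
    return batch, counts


# Placeholder tiles: 7 for the first image, 5 for the second (values taken from the log).
image_a = [f"a_tile_{i}" for i in range(7)]
image_b = [f"b_tile_{i}" for i in range(5)]

batch, image_counts = combine_tiles([image_a, image_b])
print(f"dynamic ViT batch size: {len(batch)}, image_counts: {image_counts}")
```

Under this assumption the batch size is simply the sum of the per-image counts, which matches the 7 + 5 = 12 seen in the multi-image runs.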

czczup changed discussion status to closed
