wuxiaojun committed on
Commit
40481f2
1 Parent(s): c20af59

readme and showcase

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -84,7 +84,7 @@ Firstly, the evaluation on the VQA effectiveness shows that the Ziya-Visual mode
 
 ![](assets/gqa.png)
 
-其次我们使用LLaVA[2]的做法利用GPT-4打分评价,该方法利用coco数据集中的caption和物体检测框信息输入给GPT-4;然后将Ziya-Visual和VisualGLM的图像问答的回答再输入到GPT-4,要求GPT-4从回答的有用性、相关性、准确性、细节程度进行评分(1-10分);LLaVA中将对话任务划分为conv(简单对话),detail(细节对话)和complex(复杂推理),all是三种对话任务的综合平均分。最终评价结果如下,可以看到在简单对话和细节对话中,Ziya-Visual优于VisualGLM,在复杂推理中略输于VisualGLM,最终总体平均结果优于VisualGLM。在对比mPLUG-Owl中我们得到的结论是类似的,Ziya-Visual总体平均结果优于mPLUG-Owl。
+其次我们使用LLaVA的做法利用GPT-4打分评价,该方法利用coco数据集中的caption和物体检测框信息输入给GPT-4;然后将Ziya-Visual和VisualGLM的图像问答的回答再输入到GPT-4,要求GPT-4从回答的有用性、相关性、准确性、细节程度进行评分(1-10分);LLaVA中将对话任务划分为conv(简单对话),detail(细节对话)和complex(复杂推理),all是三种对话任务的综合平均分。最终评价结果如下,可以看到在简单对话和细节对话中,Ziya-Visual优于VisualGLM,在复杂推理中略输于VisualGLM,最终总体平均结果优于VisualGLM。在对比mPLUG-Owl中我们得到的结论是类似的,Ziya-Visual总体平均结果优于mPLUG-Owl。
 
 Secondly, we followed LLaVA's approach of using GPT-4 as the judge: the captions and object-detection bounding-box information from the COCO dataset are fed to GPT-4, and the image question-answering responses from Ziya-Visual and VisualGLM are then given to GPT-4, which is asked to rate each response for usefulness, relevance, accuracy, and level of detail on a scale of 1-10. LLaVA divides the dialogue tasks into conv (simple dialogue), detail (detailed dialogue), and complex (complex reasoning); all is the combined average score over the three task types. The final evaluation results show that Ziya-Visual outperforms VisualGLM in simple and detailed dialogue, loses slightly to VisualGLM in complex reasoning, and outperforms VisualGLM in the overall average.
 In the comparison with mPLUG-Owl we reach a similar conclusion, with Ziya-Visual outperforming mPLUG-Owl on average overall.
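
The scoring protocol in the paragraph above (per-category GPT-4 ratings on a 1-10 scale, plus a combined `all` average) could be aggregated as in this minimal sketch. The sample ratings and the helper name `category_averages` are illustrative assumptions, not part of the repository:

```python
from statistics import mean

def category_averages(scores):
    """Average the 1-10 GPT-4 ratings per dialogue category,
    plus the combined 'all' score over every individual rating."""
    avgs = {cat: mean(vals) for cat, vals in scores.items()}
    avgs["all"] = mean(v for vals in scores.values() for v in vals)
    return avgs

# Hypothetical per-response ratings, split LLaVA-style into
# conv (simple dialogue), detail (detailed dialogue), complex (complex reasoning).
scores = {
    "conv":    [8, 7, 9],
    "detail":  [7, 8, 8],
    "complex": [6, 7, 7],
}
print(category_averages(scores))
```

Note that `all` is the mean over every individual rating rather than the mean of the three category averages; the two only coincide when each category contains the same number of rated responses.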