{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# chatflash weakness analysis" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"## Conclusions\n",
"\n",
"Accuracy (%) by video duration, short -> medium -> long; (few) marks a split with few samples.\n",
"\n",
"- Perception\n",
"    - Object Recognition: 76.8 -> 65.9 -> 50.0\n",
"    - Action Recognition: 80.2 -> 64.7 -> 55.6\n",
"    - Attribute Perception: 82.8 -> 74.0 -> 55.6 (few)\n",
"    - Spatial Perception: 83.3 -> 47.6 (few) -> 0.0 (few)\n",
"    - Temporal Perception: too few samples\n",
"\n",
"Attribute Perception does not seem to need much attention; its medium score is already decent.\n",
"\n",
"- Reasoning\n",
"    - Object Reasoning: 75.0 -> 66.4 -> 52.9\n",
"    - Action Reasoning: 70.2 -> 65.5 -> 54.4\n",
"    - Temporal Reasoning: 69.2 (few) -> 49.3 -> 47.3\n",
"    - Spatial Reasoning: 85.2 -> 77.8 -> 81.8\n",
"\n",
"- Others\n",
"    - Counting Problem: 56.8 -> 38.9 -> 31.2\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import json\n",
"\n",
"path = '/share/minghao/VideoProjects/VideoChat-Flash/lmms-eval_videochat/videochat-flash-7B@448_eval_log_videomme.json'\n",
"\n",
"with open(path, 'r') as f:\n",
"    datas = json.load(f)" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [
"score_task_type = {}\n",
"score_duration_task_type = {\n",
"    'short': {},\n",
"    'medium': {},\n",
"    'long': {}\n",
"}\n",
"\n",
"for data in datas['logs']:\n",
"    duration = data['doc']['duration']\n",
"    task_type = data['doc']['task_type']\n",
"    # The key spelling 'videomme_percetion_score' is kept as-is to match the log\n",
"    pred_answer = data['videomme_percetion_score']['pred_answer']\n",
"    answer = data['videomme_percetion_score']['answer']\n",
"    correct_flag = pred_answer == answer\n",
"\n",
"    # Update both the overall tally and the per-duration tally for this task type\n",
"    for scores in (score_task_type, score_duration_task_type[duration]):\n",
"        info = scores.setdefault(task_type, {'Total': 0, 'Correct': 0})\n",
"        info['Total'] += 1\n",
"        info['Correct'] += int(correct_flag)\n",
"\n",
"# Accuracy in percent, rounded to one decimal place\n",
"for info in score_task_type.values():\n",
"    info['Acc'] = round(info['Correct'] / info['Total'] * 100, 1)\n",
"\n",
"for task_type_info in score_duration_task_type.values():\n",
"    for info in task_type_info.values():\n",
"        info['Acc'] = round(info['Correct'] / info['Total'] * 100, 1)" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [
"{'Counting Problem': {'Total': 268, 'Correct': 123, 'Acc': 45.9},\n",
" 'Information Synopsis': {'Total': 323, 'Correct': 259, 'Acc': 80.2},\n",
" 'Object Recognition': {'Total': 354, 'Correct': 243, 'Acc': 68.6},\n",
" 'Action Reasoning': {'Total': 285, 'Correct': 169, 'Acc': 59.3},\n",
" 'Object Reasoning': {'Total': 454, 'Correct': 276, 'Acc': 60.8},\n",
" 'Temporal Perception': {'Total': 55, 'Correct': 47, 'Acc': 85.5},\n",
" 'Attribute Perception': {'Total': 222, 'Correct': 170, 'Acc': 76.6},\n",
" 'Temporal Reasoning': {'Total': 177, 'Correct': 88, 'Acc': 49.7},\n",
" 'Action Recognition': {'Total': 313, 'Correct': 217, 'Acc': 69.3},\n",
" 'OCR Problems': {'Total': 139, 'Correct': 90, 'Acc': 64.7},\n",
" 'Spatial Perception': {'Total': 54, 'Correct': 35, 'Acc': 64.8},\n",
" 'Spatial Reasoning': {'Total': 56, 'Correct': 46, 'Acc': 82.1}}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "score_task_type" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Temporal Reasoning: 49.7: acceptable on short videos, but accuracy drops sharply on longer ones\n",
"# Counting Problem: 45.9: the longer the video, the lower the score; even on short videos it is far below the average\n",
"# Action Reasoning: 59.3: accuracy decreases as video length increases, which again points to weak localization ability\n",
"# Object Reasoning: 60.8: same pattern as Action Reasoning" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [
"{'short': {'Counting Problem': {'Total': 125, 'Correct': 71, 'Acc': 56.8},\n",
" 'Information Synopsis': {'Total': 82, 'Correct': 72, 'Acc': 87.8},\n",
" 'Object Recognition': {'Total': 168, 'Correct': 129, 'Acc': 76.8},\n",
" 'Action Reasoning': {'Total': 47, 'Correct': 33, 'Acc': 70.2},\n",
" 'Object Reasoning': {'Total': 80, 'Correct': 60, 'Acc': 75.0},\n",
" 'Temporal Perception': {'Total': 18, 'Correct': 16, 'Acc': 88.9},\n",
" 'Attribute Perception': {'Total': 122, 'Correct': 101, 'Acc': 82.8},\n",
" 'Temporal Reasoning': {'Total': 13, 'Correct': 9, 'Acc': 69.2},\n",
" 'Action Recognition': {'Total': 131, 'Correct': 105, 'Acc': 80.2},\n",
" 'OCR Problems': {'Total': 57, 'Correct': 45, 'Acc': 78.9},\n",
" 'Spatial Perception': {'Total': 30, 'Correct': 25, 'Acc': 83.3},\n",
" 'Spatial Reasoning': {'Total': 27, 'Correct': 23, 'Acc': 85.2}},\n",
" 'medium': {'Counting Problem': {'Total': 95, 'Correct': 37, 'Acc': 38.9},\n",
" 'Object Reasoning': {'Total': 134, 'Correct': 89, 'Acc': 66.4},\n",
" 'Object Recognition': {'Total': 132, 'Correct': 87, 'Acc': 65.9},\n",
" 'OCR Problems': {'Total': 68, 'Correct': 40, 'Acc': 58.8},\n",
" 'Temporal Reasoning': {'Total': 73, 'Correct': 36, 'Acc': 49.3},\n",
" 'Information Synopsis': {'Total': 78, 'Correct': 64, 'Acc': 82.1},\n",
" 'Action Reasoning': {'Total': 58, 'Correct': 38, 'Acc': 65.5},\n",
" 'Spatial Perception': {'Total': 21, 'Correct': 10, 'Acc': 47.6},\n",
" 'Attribute Perception': {'Total': 73, 'Correct': 54, 'Acc': 74.0},\n",
" 'Action Recognition': {'Total': 119, 'Correct': 77, 'Acc': 64.7},\n", " 'Spatial 
Reasoning': {'Total': 18, 'Correct': 14, 'Acc': 77.8},\n",
" 'Temporal Perception': {'Total': 31, 'Correct': 28, 'Acc': 90.3}},\n",
" 'long': {'Information Synopsis': {'Total': 163, 'Correct': 123, 'Acc': 75.5},\n",
" 'Object Reasoning': {'Total': 240, 'Correct': 127, 'Acc': 52.9},\n",
" 'Attribute Perception': {'Total': 27, 'Correct': 15, 'Acc': 55.6},\n",
" 'Action Reasoning': {'Total': 180, 'Correct': 98, 'Acc': 54.4},\n",
" 'Temporal Reasoning': {'Total': 91, 'Correct': 43, 'Acc': 47.3},\n",
" 'Object Recognition': {'Total': 54, 'Correct': 27, 'Acc': 50.0},\n",
" 'Temporal Perception': {'Total': 6, 'Correct': 3, 'Acc': 50.0},\n",
" 'Counting Problem': {'Total': 48, 'Correct': 15, 'Acc': 31.2},\n",
" 'Action Recognition': {'Total': 63, 'Correct': 35, 'Acc': 55.6},\n",
" 'Spatial Reasoning': {'Total': 11, 'Correct': 9, 'Acc': 81.8},\n",
" 'OCR Problems': {'Total': 14, 'Correct': 5, 'Acc': 35.7},\n",
" 'Spatial Perception': {'Total': 3, 'Correct': 0, 'Acc': 0.0}}}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "score_duration_task_type" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [
"import json\n",
"\n",
"# Save the per-duration, per-task-type statistics for later use\n",
"statics_path = '/share/minghao/VideoProjects/Sythesis2/Videomme/videomme.json'\n",
"with open(statics_path, 'w') as file:\n",
"    json.dump(score_duration_task_type, file, indent=4)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Task definition" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [
"# Start with Video-MME\n",
"import pandas as pd\n",
"\n",
"datas_path = '/share_2/mm_data_dir/data_1/Video-MME/videomme/test-00000-of-00001.parquet'\n",
"df = pd.read_parquet(datas_path)\n",
"datas_dict_list = df.to_dict(orient=\"records\")\n",
"record_datas_by_task_type = {}\n",
"\n",
"# Only medium and long videos are of interest here\n",
"target_duration = ['medium', 'long']\n",
"\n",
"for data in datas_dict_list:\n",
"    task_type = data['task_type']\n",
"    duration = data['duration']\n",
"    if duration not in target_duration:\n",
"        continue\n",
"    # Group the remaining questions by task type\n",
"    record_datas_by_task_type.setdefault(task_type, []).append(data)\n" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
"########## Temporal Reasoning ##########\n",
"Q: What is the correct chronological order in which the following parts of the video appear?\n",
"(a) Human lungs.\n",
"(b) Protein folding distortion.\n",
"(c) Mice, plants and cells.\n",
"Options: A. (a)(b)(c). B. (c)(a)(b). C. (c)(b)(a). D. (b)(c)(a).\n",
"Answer: C\n",
"\n",
"\n",
"Q: What is the order in which the following long jump processes appear in the video?\n",
"① 8.53m Rome 2018\n",
"② 8.53m London 2018\n",
"③ 8.56m Shanghai 2018\n",
"④ 8.53m Birmingham 2018\n",
"Options: A. ②①④③. B. ④①②③. C. ②③①④. D. ①②③④.\n",
"Answer: A\n",
"\n",
"\n",
"Q: In what order are the following planets introduced in the video?\n",
"Options: A. Venus, Jupiter, Neptune. B. Mercury, Jupiter, Mars. C. Venus, Neptune, Jupiter. D. Jupiter, Mercury, Neptune.\n",
"Answer: A\n",
"\n",
"\n",
"Q: What is the second to last news item in this segment?\n",
"Options: A. CBC agrees to meeting on safer-supply drugs. B. Kate Middleton photo manipulation concern. C. Entrepreneur explains why local radio isn't dead. D. Canadians stranded in Haiti as violence escalates.\n",
"Answer: C\n",
"\n",
"\n",
"Q: What is the order in which the following is introduced in the video?\n",
"① Sleepwalking and the Brain.\n",
"② How Much Control Do We Have of Our Brain?\n",
"③ Emotions and the Brain.\n",
"④ Creativity and the Brain.\n",
"Options: A. ①②③④. B. ①③②④. C. ②①④③. D. ①③②④.\n",
"Answer: B\n",
"\n",
"\n",
"Q: Which of the following describes the heroine's weekly fitness programme correctly?\n", "Options: A. 
Glutes and hamstrings on Monday, pulls on Tuesday, Quads and calves on Wednesday, rest on Thursday when the lady goes to the hairdresser's and gets her hair done, push on Friday and Cardio and Core on Saturday. B. Glutes and hamstrings on Monday, push on Tuesday, Quads and calves on Wednesday, rest on Thursday when the lady goes to the hairdresser's and gets her hair done, pulls on Friday and Cardio and Core on Saturday. C. Glutes and hamstrings on Monday, pulls on Tuesday, Wednesday off while the lady goes to the hairdresser's and gets her hair done, Quads and calves on Thursday, push on Friday and Cardio and Core on Saturday. D. None of the above.\n", "Answer: B\n", "\n", "\n", "Q: In what sequence are the following topics introduced in this video?\n", "(a) Different types of grills.\n", "(b) Venting.\n", "(c) Lighting considerations.\n", "(d) Cooking options.\n", "(e) Outdoor kitchen configurations.\n", "Options: A. (b)(c)(a)(d)(d). B. (d)(a)(c)(b)(e). C. (a)(c)(b)(d)(e). D. (c)(b)(a)(d)(a).\n", "Answer: D\n", "\n", "\n", "Q: What activities are recorded sequentially in the video?\n", "Options: A. Explore ice sheet, take sled dog rides, visit residents. B. Explore ice sheet, visit residents, take sled dog rides. C. Take sled dog rides, visit residents, explore ice sheet. D. Visit residents, explore ice sheet, take sled dog rides.\n", "Answer: A\n", "\n", "\n", "Q: In which order do the following events happen in this video?\n", "(a) Mr. Bean falls in love with the beautiful singer Roxy and has a wish to get an autograph from her. His attempts are mostly foiled by the bodyguard until he manages to get a kiss mark from Roxy using her handkerchief and Bean is happy.\n", "(b) Whilst digging for treasure, Mr. Bean builds his own metal detector and goes to hunt for treasure but fails. When he manages to get the treasure and bring it to his flat.\n", "(c) Mr. 
Bean and Irma are off for a day at the seaside, where his trunk gets accidentally swapped with that of a stage magician.\n", "(d) Bean attends a hypnotism show. He unwittingly volunteers to be hypnotised; when the hypnotist makes him think he's a dog, he runs away and then back at home chases Scrapper around the garden and the house.\n", "Options: A. (b)(a)(c)(d). B. (a)(c)(d)(b). C. (c)(d)(a)(b). D. (d)(c)(b)(a).\n", "Answer: C\n", "\n", "\n", "Q: Which animals are introduced in sequence in the video?\n", "Options: A. Argentinosaurus, Giant Rhinoceros, Titanoboa, Leedsichthys. B. Leedsichthys, Argentinosaurus, Giant Rhinoceros, Titanoboa. C. Titanoboa, Leedsichthys, Argentinosaurus, Giant Rhinoceros. D. Giant Rhinoceros, Titanoboa, Leedsichthys, Argentinosaurus.\n", "Answer: D\n", "\n", "\n", "Q: When the people in the video saw the turtles for the first time, how many days into their journey was it?\n", "Options: A. Day 2. B. Day 3. C. Day 1. D. Day 4.\n", "Answer: B\n", "\n", "\n", "Q: In what order do the 2012 London men's 100m final, the 2018 Beijing men's 100m final, and the 2012 London women's 100m final appear in the video?\n", "Options: A. The 2008 men's 100m final in Beijing appeared first, then the 2012 London women's 100m final, and the 2012 London men's 100m final appeared last in the video. B. The 2012 London men's 100m final appears first, then the 2012 London women's 100m final, and the 2008 Beijing men's 100m final appears last in the video. C. The 2012 London women's 100m final appears first, then the 2012 London men's 100m final, and the 2008 Beijing men's 100m final appears last in the video. D. Neither.\n", "Answer: C\n", "\n", "\n", "Q: Which of the following options correctly sorts the sequence of events in the video?\n", "Options: A. Walking, eating, hanging laundry, watching friends unbox watches. B. Eating, hanging laundry, walking, watching friends unboxing watches. C. Hanging laundry, eating, walking, watching friends unboxing watches. 
D. Hanging laundry, eating, watching friends unboxing watches, taking a walk.\n", "Answer: D\n", "\n", "\n", "Q: What is the order in which the following is introduced in the video?\n", "① Insulin Resistance Explained.\n", "② Case studies.\n", "③ How to Reverse Type 2 Diabetes.\n", "Options: A. ①②③. B. ①③②. C. ③①②. D. ①③②.\n", "Answer: D\n", "\n", "\n", "Q: In what order do the following events appear in the video?\n", "①Receiving a parcel\n", "②Making bibimbap\n", "③Cleaning the floor\n", "Options: A. ①②③. B. ②①③. C. ②③①. D. ③②①.\n", "Answer: B\n", "\n", "\n", "Q: Which of the following is the correct order of events in the video?\n", "Options: A. The court receives case materials, defense attorneys meet with clients, key witnesses testify, and prosecutors compile evidence lists. B. The police arrested two suspects, the prosecutor discussed the case together, the defense lawyer discussed the composition of the jury, and the prosecutor produced a gun as evidence. C. Detectives collect evidence from the scene, prosecutors and police meet to discuss the case, and during court arguments, the defense raises objections to the evidence. D. Psychology experts testify in court, the prosecutor takes out a gun as evidence, the defense lawyer discusses the composition of the jury, and the prosecutor discusses the case together.\n", "Answer: B\n", "\n", "\n", "Q: According to this video, in which order do the following events happened when the performer made the first two pigeons?\n", "(a) He duplicated the pigeon and produced another.\n", "(b) He produced a pigeon from a burning leather.\n", "(c) He put the pigeons into a cage.\n", "Options: A. (a)(b)(c). B. (b)(c)(a). C. (c)(b)(a). D. (b)(a)(c).\n", "Answer: D\n", "\n", "\n", "Q: According to the order in the video, what is the fourth GPT introduced (except GPT Finder)?\n", "Options: A. Diagrams. B. Logo Creator. C. Prompt Perfect. D. 
GPT Finder.\n",
"Answer: B\n",
"\n",
"\n",
"Q: How often do the people in the video take water breaks per workout?\n",
"Options: A. 10 minutes. B. 5 minutes. C. 15 minutes. D. No regular breaks.\n",
"Answer: B\n",
"\n",
"\n",
"Q: What is the order of the swimming strokes used by the athletes in the video?\n",
"Options: A. Backstroke, Breaststroke, Butterfly, Freestyle. B. Breaststroke, Butterfly, Backstroke, Freestyle. C. Butterfly, Backstroke, Breaststroke, Freestyle. D. Butterfly, freestyle, backstroke, breaststroke.\n",
"Answer: C\n",
"\n",
"\n" ] } ], "source": [
"import random\n",
"\n",
"check_task_type = ['Temporal Reasoning']  # 'Attribute Perception', 'Action Recognition'\n",
"\n",
"for task_type, datas in record_datas_by_task_type.items():\n",
"    if task_type not in check_task_type:\n",
"        continue\n",
"\n",
"    print('#'*10, task_type,'#'*10)\n",
"    # Sample at most 20 questions so that small task types do not raise ValueError\n",
"    samples = random.sample(datas, min(20, len(datas)))\n",
"    for sample in samples:\n",
"        print(f'Q: {sample[\"question\"]}')\n",
"        print(f'Options: {\" \".join(sample[\"options\"])}')\n",
"        print(f'Answer: {sample[\"answer\"]}')\n",
"        print('\\n')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 2 }