Spaces:

TIGER-Lab
/

VideoScore

Running on Zero

File size: 30,846 Bytes

e87d958

[
    {
        "id": "real_r003679",
        "images": [
            "r003679_00.jpg",
            "r003679_01.jpg",
            "r003679_02.jpg",
            "r003679_03.jpg",
            "r003679_04.jpg",
            "r003679_05.jpg",
            "r003679_06.jpg",
            "r003679_07.jpg",
            "r003679_08.jpg",
            "r003679_09.jpg",
            "r003679_10.jpg",
            "r003679_11.jpg",
            "r003679_12.jpg",
            "r003679_13.jpg",
            "r003679_14.jpg",
            "r003679_15.jpg",
            "r003679_16.jpg",
            "r003679_17.jpg",
            "r003679_18.jpg",
            "r003679_19.jpg",
            "r003679_20.jpg",
            "r003679_21.jpg",
            "r003679_22.jpg",
            "r003679_23.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is dog rolls over and falls off couch,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 3\n object consistency: 3\n dynamic degree: 3\n motion smoothness: 3\n text-to-video alignment: 3\n factual consistency: 3\n overall score: 3\n"
            }
        ],
        "video": "r003679.mp4",
        "prompt": "dog rolls over and falls off couch"
    },
    {
        "id": "real_r100916",
        "images": [
            "r100916_00.jpg",
            "r100916_01.jpg",
            "r100916_02.jpg",
            "r100916_03.jpg",
            "r100916_04.jpg",
            "r100916_05.jpg",
            "r100916_06.jpg",
            "r100916_07.jpg",
            "r100916_08.jpg",
            "r100916_09.jpg",
            "r100916_10.jpg",
            "r100916_11.jpg",
            "r100916_12.jpg",
            "r100916_13.jpg",
            "r100916_14.jpg",
            "r100916_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is A young man is standing in front of a glacier with mountains in the background.,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 3\n object consistency: 3\n dynamic degree: 3\n motion smoothness: 3\n text-to-video alignment: 3\n factual consistency: 3\n overall score: 3\n"
            }
        ],
        "video": "r100916.mp4",
        "prompt": "A young man is standing in front of a glacier with mountains in the background."
    },
    {
        "id": "real_r004061",
        "images": [
            "r004061_00.jpg",
            "r004061_01.jpg",
            "r004061_02.jpg",
            "r004061_03.jpg",
            "r004061_04.jpg",
            "r004061_05.jpg",
            "r004061_06.jpg",
            "r004061_07.jpg",
            "r004061_08.jpg",
            "r004061_09.jpg",
            "r004061_10.jpg",
            "r004061_11.jpg",
            "r004061_12.jpg",
            "r004061_13.jpg",
            "r004061_14.jpg",
            "r004061_15.jpg",
            "r004061_16.jpg",
            "r004061_17.jpg",
            "r004061_18.jpg",
            "r004061_19.jpg",
            "r004061_20.jpg",
            "r004061_21.jpg",
            "r004061_22.jpg",
            "r004061_23.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is a man wearing a striped blanket over his backpack runs behind two other men.,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 3\n object consistency: 3\n dynamic degree: 3\n motion smoothness: 3\n text-to-video alignment: 3\n factual consistency: 3\n overall score: 3\n"
            }
        ],
        "video": "r004061.mp4",
        "prompt": "a man wearing a striped blanket over his backpack runs behind two other men."
    },
    {
        "id": "worsen_gen_b404675",
        "images": [
            "b404675_00.jpg",
            "b404675_01.jpg",
            "b404675_02.jpg",
            "b404675_03.jpg",
            "b404675_04.jpg",
            "b404675_05.jpg",
            "b404675_06.jpg",
            "b404675_07.jpg",
            "b404675_08.jpg",
            "b404675_09.jpg",
            "b404675_10.jpg",
            "b404675_11.jpg",
            "b404675_12.jpg",
            "b404675_13.jpg",
            "b404675_14.jpg",
            "b404675_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is After a few years, Ghulam Muhammad became the owner of his business  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 1\n dynamic degree: 3\n motion smoothness: 3\n text-to-video alignment: 2\n factual consistency: 3\n overall score: 1\n"
            }
        ],
        "video": "b404675.mp4",
        "prompt": "After a few years, Ghulam Muhammad became the owner of his business"
    },
    {
        "id": "worsen_gen_b402727",
        "images": [
            "b402727_00.jpg",
            "b402727_01.jpg",
            "b402727_02.jpg",
            "b402727_03.jpg",
            "b402727_04.jpg",
            "b402727_05.jpg",
            "b402727_06.jpg",
            "b402727_07.jpg",
            "b402727_08.jpg",
            "b402727_09.jpg",
            "b402727_10.jpg",
            "b402727_11.jpg",
            "b402727_12.jpg",
            "b402727_13.jpg",
            "b402727_14.jpg",
            "b402727_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is view down to the path in the field, moving forward, night, dark.  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 1\n dynamic degree: 3\n motion smoothness: 3\n text-to-video alignment: 3\n factual consistency: 3\n overall score: 1\n"
            }
        ],
        "video": "b402727.mp4",
        "prompt": "view down to the path in the field, moving forward, night, dark."
    },
    {
        "id": "worsen_gen_b304986",
        "images": [
            "b304986_00.jpg",
            "b304986_01.jpg",
            "b304986_02.jpg",
            "b304986_03.jpg",
            "b304986_04.jpg",
            "b304986_05.jpg",
            "b304986_06.jpg",
            "b304986_07.jpg",
            "b304986_08.jpg",
            "b304986_09.jpg",
            "b304986_10.jpg",
            "b304986_11.jpg",
            "b304986_12.jpg",
            "b304986_13.jpg",
            "b304986_14.jpg",
            "b304986_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is Old Philippino man, close up, subtle smile, teats of joy, in old wood room  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 3\n object consistency: 1\n dynamic degree: 3\n motion smoothness: 1\n text-to-video alignment: 2\n factual consistency: 3\n overall score: 1\n"
            }
        ],
        "video": "b304986.mp4",
        "prompt": "Old Philippino man, close up, subtle smile, teats of joy, in old wood room"
    },
    {
        "id": "static_d500937",
        "images": [
            "d500937_00.jpg",
            "d500937_01.jpg",
            "d500937_02.jpg",
            "d500937_03.jpg",
            "d500937_04.jpg",
            "d500937_05.jpg",
            "d500937_06.jpg",
            "d500937_07.jpg",
            "d500937_08.jpg",
            "d500937_09.jpg",
            "d500937_10.jpg",
            "d500937_11.jpg",
            "d500937_12.jpg",
            "d500937_13.jpg",
            "d500937_14.jpg",
            "d500937_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is write y in the garden with flowers  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 3\n dynamic degree: 1\n motion smoothness: 1\n text-to-video alignment: 1\n factual consistency: 1\n overall score: 1\n"
            }
        ],
        "video": "d500937.mp4",
        "prompt": "write y in the garden with flowers"
    },
    {
        "id": "static_d500506",
        "images": [
            "d500506_00.jpg",
            "d500506_01.jpg",
            "d500506_02.jpg",
            "d500506_03.jpg",
            "d500506_04.jpg",
            "d500506_05.jpg",
            "d500506_06.jpg",
            "d500506_07.jpg",
            "d500506_08.jpg",
            "d500506_09.jpg",
            "d500506_10.jpg",
            "d500506_11.jpg",
            "d500506_12.jpg",
            "d500506_13.jpg",
            "d500506_14.jpg",
            "d500506_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is a boy paiting a beautiful landscape  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 3\n dynamic degree: 1\n motion smoothness: 1\n text-to-video alignment: 1\n factual consistency: 1\n overall score: 1\n"
            }
        ],
        "video": "d500506.mp4",
        "prompt": "a boy paiting a beautiful landscape"
    },
    {
        "id": "static_d401950",
        "images": [
            "d401950_00.jpg",
            "d401950_01.jpg",
            "d401950_02.jpg",
            "d401950_03.jpg",
            "d401950_04.jpg",
            "d401950_05.jpg",
            "d401950_06.jpg",
            "d401950_07.jpg",
            "d401950_08.jpg",
            "d401950_09.jpg",
            "d401950_10.jpg",
            "d401950_11.jpg",
            "d401950_12.jpg",
            "d401950_13.jpg",
            "d401950_14.jpg",
            "d401950_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is sneaker with exclusive design worth 1 billion  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 3\n dynamic degree: 1\n motion smoothness: 1\n text-to-video alignment: 1\n factual consistency: 1\n overall score: 1\n"
            }
        ],
        "video": "d401950.mp4",
        "prompt": "sneaker with exclusive design worth 1 billion"
    },
    {
        "id": "insf_a500251",
        "images": [
            "a500251_00.jpg",
            "a500251_01.jpg",
            "a500251_02.jpg",
            "a500251_03.jpg",
            "a500251_04.jpg",
            "a500251_05.jpg",
            "a500251_06.jpg",
            "a500251_07.jpg",
            "a500251_08.jpg",
            "a500251_09.jpg",
            "a500251_10.jpg",
            "a500251_11.jpg",
            "a500251_12.jpg",
            "a500251_13.jpg",
            "a500251_14.jpg",
            "a500251_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is :sci fi war in the moon  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 1\n dynamic degree: 3\n motion smoothness: 1\n text-to-video alignment: 1\n factual consistency: 1\n overall score: 1\n"
            }
        ],
        "video": "a500251.mp4",
        "prompt": ":sci fi war in the moon"
    },
    {
        "id": "insf_a400480",
        "images": [
            "a400480_00.jpg",
            "a400480_01.jpg",
            "a400480_02.jpg",
            "a400480_03.jpg",
            "a400480_04.jpg",
            "a400480_05.jpg",
            "a400480_06.jpg",
            "a400480_07.jpg",
            "a400480_08.jpg",
            "a400480_09.jpg",
            "a400480_10.jpg",
            "a400480_11.jpg",
            "a400480_12.jpg",
            "a400480_13.jpg",
            "a400480_14.jpg",
            "a400480_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is for a dog accessories brand a welcome image: brand name: hot doggies  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 1\n dynamic degree: 3\n motion smoothness: 1\n text-to-video alignment: 1\n factual consistency: 1\n overall score: 1\n"
            }
        ],
        "video": "a400480.mp4",
        "prompt": "for a dog accessories brand a welcome image: brand name: hot doggies"
    },
    {
        "id": "insf_a500010",
        "images": [
            "a500010_00.jpg",
            "a500010_01.jpg",
            "a500010_02.jpg",
            "a500010_03.jpg",
            "a500010_04.jpg",
            "a500010_05.jpg",
            "a500010_06.jpg",
            "a500010_07.jpg",
            "a500010_08.jpg",
            "a500010_09.jpg",
            "a500010_10.jpg",
            "a500010_11.jpg",
            "a500010_12.jpg",
            "a500010_13.jpg",
            "a500010_14.jpg",
            "a500010_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is a druid in the forest cinematic  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 1\n object consistency: 1\n dynamic degree: 3\n motion smoothness: 1\n text-to-video alignment: 1\n factual consistency: 1\n overall score: 1\n"
            }
        ],
        "video": "a500010.mp4",
        "prompt": "a druid in the forest cinematic"
    },
    {
        "id": "lab_3005033",
        "images": [
            "3005033_00.jpg",
            "3005033_01.jpg",
            "3005033_02.jpg",
            "3005033_03.jpg",
            "3005033_04.jpg",
            "3005033_05.jpg",
            "3005033_06.jpg",
            "3005033_07.jpg",
            "3005033_08.jpg",
            "3005033_09.jpg",
            "3005033_10.jpg",
            "3005033_11.jpg",
            "3005033_12.jpg",
            "3005033_13.jpg",
            "3005033_14.jpg",
            "3005033_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is a robot vacuum traveling through southeast asian countries  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 3\n object consistency: 2\n dynamic degree: 3\n motion smoothness: 2\n text-to-video alignment: 2\n factual consistency: 3\n overall score: 2\n"
            }
        ],
        "video": "3005033.mp4",
        "prompt": "a robot vacuum traveling through southeast asian countries"
    },
    {
        "id": "lab_7004180",
        "images": [
            "7004180_00.jpg",
            "7004180_01.jpg",
            "7004180_02.jpg",
            "7004180_03.jpg",
            "7004180_04.jpg",
            "7004180_05.jpg",
            "7004180_06.jpg",
            "7004180_07.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is jiaxuan is the most beautiful \uff0celegant\uff0ccharming girl in the world. billions of boys fell in love with her  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 3\n object consistency: 3\n dynamic degree: 3\n motion smoothness: 3\n text-to-video alignment: 3\n factual consistency: 3\n overall score: 3\n"
            }
        ],
        "video": "7004180.mp4",
        "prompt": "jiaxuan is the most beautiful \uff0celegant\uff0ccharming girl in the world. billions of boys fell in love with her"
    },
    {
        "id": "lab_1006309",
        "images": [
            "1006309_00.jpg",
            "1006309_01.jpg",
            "1006309_02.jpg",
            "1006309_03.jpg",
            "1006309_04.jpg",
            "1006309_05.jpg",
            "1006309_06.jpg",
            "1006309_07.jpg",
            "1006309_08.jpg",
            "1006309_09.jpg",
            "1006309_10.jpg",
            "1006309_11.jpg",
            "1006309_12.jpg",
            "1006309_13.jpg",
            "1006309_14.jpg",
            "1006309_15.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "\nSuppose you are an expert in judging and evaluating the quality of AI-generated videos, \nplease watch the following frames of a given video and see the text prompt for generating the video, \nthen give scores from 7 different dimensions:\n(1) visual quality, \n(2) object consistency,\n(3) dynamic degree,\n(4) motion smoothness,\n(5) text-to-video alignment,\n(6) factual consistency, \n(7) overall score\nfor each dimension, output a number from [1,2,3], in which '1' stands for 'Bad', '2' stands for 'Average', '3' stands for 'Good'.\nHere is an output example: \nvisual quality: 3\nobject consistency: 2 \ndynamic degree: 2\nmotion smoothness: 1\ntext-to-video alignment: 1\nfactual consistency: 2\noverall score: 1\n\nFor this item, the text prompt is scary halloween skeletons are having a conversation with each other  ,\nall the frames of video are as follows: \n\n<image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> <image> \n"
            },
            {
                "from": "gpt",
                "value": "visual quality: 3\n object consistency: 2\n dynamic degree: 3\n motion smoothness: 2\n text-to-video alignment: 3\n factual consistency: 2\n overall score: 3\n"
            }
        ],
        "video": "1006309.mp4",
        "prompt": "scary halloween skeletons are having a conversation with each other"
    }
]