zR committed · Commit 8f92f5a · 1 Parent(s): a3d3fa5

Update non-LFS files

Files changed (7):
  1. .mdl +0 -0
  2. .msc +0 -0
  3. .mv +0 -1
  4. LICENSE +75 -0
  5. LLAMA3_LICENSE +117 -0
  6. README.md +86 -0
  7. README_zh.md +65 -0
.mdl DELETED
Binary file (59 Bytes)
 
.msc DELETED
Binary file (1.26 kB)
 
.mv DELETED
@@ -1 +0,0 @@
- Revision:master,CreatedAt:1719926614
LICENSE ADDED
@@ -0,0 +1,75 @@
The CogVLM License

1. Definitions

“Licensor” means the CogVLM Model Team that distributes its Software.

“Software” means the CogVLM model parameters made available under this license.

2. License Grant

Under the terms and conditions of this license, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license.
This license permits you to use all open-source models in this repository for academic research free of charge. Users who wish to use the models for commercial purposes must register [here](https://open.bigmodel.cn/mla/form).
Registered users may use the models for commercial activities free of charge, but must comply with all terms and conditions of this license.
The license notice shall be included in all copies or substantial portions of the Software.

3. Restriction

You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any military or illegal purposes.

You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.

4. Disclaimer

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

5. Limitation of Liability

EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

6. Dispute Resolution

This license shall be governed and construed in accordance with the laws of the People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to the Haidian District People's Court in Beijing.

Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at license@zhipuai.cn.

7. Llama3 and EVA-CLIP2 License

For CogVLM2 open-source models that use the Llama3 series as the base model, the Llama3 license conditions (https://llama.meta.com/llama3/license/; a copy is included in this repository) and the EVA-CLIP2 license conditions (MIT, https://github.com/baaivision/EVA/blob/master/LICENSE) apply to the model weights.

1. 定义

“许可方”是指分发其软件的 CogVLM 模型团队。

“软件”是指根据本许可提供的 CogVLM 模型参数。

2. 许可授予

根据本许可的条款和条件，许可方特此授予您非排他性、全球性、不可转让、不可再许可、可撤销、免版税的版权许可。
本许可允许您免费使用本仓库中的所有开源模型进行学术研究。对于希望将模型用于商业目的的用户，需在[这里](https://open.bigmodel.cn/mla/form)完成登记。
经过登记的用户可以免费使用本模型进行商业活动，但必须遵守本许可的所有条款和条件。
上述版权声明和本许可声明应包含在本软件的所有副本或重要部分中。

3. 限制

您不得出于任何军事或非法目的使用、复制、修改、合并、发布、分发、复制或创建本软件的全部或部分衍生作品。

您不得利用本软件从事任何危害国家安全和国家统一、危害社会公共利益、侵犯人身权益的行为。

4. 免责声明

本软件“按原样”提供，不提供任何明示或暗示的保证，包括但不限于对适销性、特定用途的适用性和非侵权性的保证。在任何情况下，作者或版权持有人均不对任何索赔、损害或其他责任负责，无论是合同诉讼、侵权行为还是其他方面，因本软件、本软件的使用或其他交易而引起或与之相关。

5. 责任限制

除适用法律禁止的范围外，在任何情况下且根据任何法律理论，无论是基于侵权行为、疏忽、合同、责任或其他原因，任何许可方均不对您承担任何直接、间接、特殊、偶然、示范性或间接损害，或任何其他商业损失的责任，即使许可方已被告知此类损害的可能性。

6. 争议解决

本许可受中华人民共和国法律管辖并按其解释。因本许可引起的或与本许可有关的任何争议应提交北京市海淀区人民法院。

请注意，许可证可能会更新到更全面的版本。有关许可和版权的任何问题，请通过 license@zhipuai.cn 与我们联系。

7. Llama3 和 EVA-CLIP2 许可

针对以 Llama3 系列模型作为基座模型的 CogVLM2 开源模型，Llama3 许可条件（https://llama.meta.com/llama3/license/ ，本仓库中附有其副本）和 EVA-CLIP2 许可条件（MIT，https://github.com/baaivision/EVA/blob/master/LICENSE）适用于模型权重。
LLAMA3_LICENSE ADDED
@@ -0,0 +1,117 @@
META LLAMA 3 COMMUNITY LICENSE AGREEMENT
Meta Llama 3 Version Release Date: April 18, 2024

“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.

“Documentation” means the specifications, manuals and documentation accompanying Meta Llama 3 distributed by Meta at https://llama.meta.com/get-started/.

“Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

“Meta Llama 3” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at https://llama.meta.com/llama-downloads.

“Llama Materials” means, collectively, Meta’s proprietary Meta Llama 3 and Documentation (and any portion thereof) made available under this Agreement.

“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).

By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.

1. License Rights and Redistribution.

a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.

b. Redistribution and Use.

i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name.

ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.

iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”

iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://llama.meta.com/llama3/use-policy), which is hereby incorporated by reference into this Agreement.

v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof).

2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.

3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.

4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

5. Intellectual Property.

a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama 3” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at https://about.meta.com/brand/resources/meta/company-brand/). All goodwill arising out of your use of the Mark will inure to the benefit of Meta.

b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.

c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Meta Llama 3 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.

6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.

7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
README.md ADDED
@@ -0,0 +1,86 @@
---
license: other
license_name: cogvlm2
license_link: https://huggingface.co/THUDM/cogvlm2-llama3-video-19B/blob/main/LICENSE

language:
- en
pipeline_tag: text-generation
tags:
- chat
- cogvlm2
- cogvlm--video

inference: false
---
# CogVLM2-Video

[中文版本README](README_zh.md)

CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. The figure below shows the performance of CogVLM2-Video on the [MVBench](https://github.com/OpenGVLab/Ask-Anything), [VideoChatGPT-Bench](https://github.com/mbzuai-oryx/Video-ChatGPT), and zero-shot VideoQA datasets (MSVD-QA, MSRVTT-QA, ActivityNet-QA). Here, VCG-* refers to VideoChatGPT-Bench, ZS-* to the zero-shot VideoQA datasets, and MV-* to the main categories in MVBench.

![Quantitative Evaluation](https://github.com/THUDM/CogVLM2/tree/main/resources/cogvlm2_video_bench.jpeg)

## Detailed performance

Performance on VideoChatGPT-Bench and the zero-shot VideoQA datasets:

| Models                | VCG-AVG  | VCG-CI   | VCG-DO   | VCG-CU   | VCG-TU   | VCG-CO   | ZS-AVG    |
|-----------------------|----------|----------|----------|----------|----------|----------|-----------|
| IG-VLM GPT4V          | 3.17     | 3.40     | 2.80     | 3.61     | 2.89     | 3.13     | 65.70     |
| ST-LLM                | 3.15     | 3.23     | 3.05     | 3.74     | 2.93     | 2.81     | 62.90     |
| ShareGPT4Video        | N/A      | N/A      | N/A      | N/A      | N/A      | N/A      | 46.50     |
| VideoGPT+             | 3.28     | 3.27     | 3.18     | 3.74     | 2.83     | **3.39** | 61.20     |
| VideoChat2_HD_mistral | 3.10     | 3.40     | 2.91     | 3.72     | 2.65     | 2.84     | 57.70     |
| PLLaVA-34B            | 3.32     | **3.60** | 3.20     | **3.90** | 2.67     | 3.25     | **68.10** |
| CogVLM2-Video         | **3.41** | 3.49     | **3.46** | 3.87     | **2.98** | 3.23     | 66.60     |

Performance on the MVBench dataset:

| Model                 | AVG      | AA       | AC       | AL       | AP       | AS       | CO       | CI       | EN    | ER       | FA       | FP       | MA       | MC       | MD       | OE       | OI       | OS   | ST       | SC   | UA       |
|-----------------------|----------|----------|----------|----------|----------|----------|----------|----------|-------|----------|----------|----------|----------|----------|----------|----------|----------|------|----------|------|----------|
| IG-VLM GPT4V          | 43.7     | 72.0     | 39.0     | 40.5     | **63.5** | 55.5     | 52.0     | 11.0     | 31.0  | 59.0     | 46.5     | 47.5     | 22.5     | 12.0     | 12.0     | 18.5     | 59.0     | 29.5 | 83.5     | 45.0 | 73.5     |
| ST-LLM                | 54.9     | 84.0     | 36.5     | 31.0     | 53.5     | 66.0     | 46.5     | 58.5     | 34.5  | 41.5     | 44.0     | 44.5     | 78.5     | 56.5     | 42.5     | 80.5     | 73.5     | 38.5 | 86.5     | 43.0 | 58.5     |
| ShareGPT4Video        | 51.2     | 79.5     | 35.5     | 41.5     | 39.5     | 49.5     | 46.5     | 51.5     | 28.5  | 39.0     | 40.0     | 25.5     | 75.0     | 62.5     | 50.5     | 82.5     | 54.5     | 32.5 | 84.5     | 51.0 | 54.5     |
| VideoGPT+             | 58.7     | 83.0     | 39.5     | 34.0     | 60.0     | **69.0** | 50.0     | 60.0     | 29.5  | 44.0     | 48.5     | 53.0     | 90.5     | 71.0     | 44.0     | **85.5** | 75.5     | 36.0 | 89.5     | 45.0 | 66.5     |
| VideoChat2_HD_mistral | 62.3     | 79.5     | **60.0** | **87.5** | 50.0     | 68.5     | **93.5** | 71.5     | 36.5  | 45.0     | 49.5     | **87.0** | 40.0     | **76.0** | **92.0** | 53.0     | 62.0     | 45.5 | 36.0     | 44.0 | 69.5     |
| PLLaVA-34B            | 58.1     | 82.0     | 40.5     | 49.5     | 53.0     | 67.5     | 66.5     | 59.0     | 39.5  | **63.5** | 47.0     | 50.0     | 70.0     | 43.0     | 37.5     | 68.5     | 67.5     | 36.5 | **91.0** | 51.5 | **79.0** |
| CogVLM2-Video         | **62.3** | **85.5** | 41.5     | 31.5     | 65.5     | 79.5     | 58.5     | **77.0** | 28.5  | 42.5     | **54.0** | 57.0     | **91.5** | 73.0     | 48.0     | **91.0** | **78.0** | 36.0 | **91.5** | 47.0 | 68.5     |

## Evaluation details

We follow prior work to evaluate the performance of our model, crafting a task-specific prompt for each benchmark:

```python
# For MVBench
prompt = f"Carefully watch the video and pay attention to the cause and sequence of events, the detail and movement of objects, and the action and pose of persons. Based on your observations, select the best option that accurately addresses the question.\n " + f"{prompt.replace('Short Answer.', '')}\n" + "Short Answer:"
# For VideoChatGPT-Bench
prompt = f"Carefully watch the video and pay attention to the cause and sequence of events, the detail and movement of objects, and the action and pose of persons. Based on your observations, comprehensively answer the following question. Your answer should be long and cover all the related aspects\n " + f"{prompt.replace('Short Answer.', '')}\n" + "Answer:"
# For Zero-shot VideoQA
prompt = f"The input consists of a sequence of key frames from a video. Answer the question comprehensively including all the possible verbs and nouns that can discribe the events, followed by significant events, characters, or objects that appear throughout the frames.\n " + f"{prompt.replace('Short Answer.', '')}\n" + "Answer:"
```
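The three prompts above share the same structure: a benchmark-specific instruction, the question with any trailing "Short Answer." cue stripped, and an answer cue. As a minimal sketch of that structure (the `build_prompt` helper and the `TEMPLATES` table are our own illustration, not part of the released evaluation code):

```python
# Hypothetical helper illustrating the shared structure of the three
# benchmark prompts above; not part of the released evaluation code.

# (instruction, answer cue) per benchmark, taken verbatim from the prompts above
TEMPLATES = {
    "mvbench": (
        "Carefully watch the video and pay attention to the cause and sequence of events, "
        "the detail and movement of objects, and the action and pose of persons. "
        "Based on your observations, select the best option that accurately addresses the question.",
        "Short Answer:",
    ),
    "videochatgpt": (
        "Carefully watch the video and pay attention to the cause and sequence of events, "
        "the detail and movement of objects, and the action and pose of persons. "
        "Based on your observations, comprehensively answer the following question. "
        "Your answer should be long and cover all the related aspects",
        "Answer:",
    ),
    "zs_videoqa": (
        "The input consists of a sequence of key frames from a video. "
        "Answer the question comprehensively including all the possible verbs and nouns "
        "that can discribe the events, followed by significant events, characters, "
        "or objects that appear throughout the frames.",
        "Answer:",
    ),
}

def build_prompt(benchmark: str, question: str) -> str:
    """Assemble a benchmark prompt: instruction, cleaned question, answer cue."""
    instruction, cue = TEMPLATES[benchmark]
    # Strip the cue that some datasets append to the raw question,
    # exactly as the original snippets do with str.replace.
    question = question.replace("Short Answer.", "")
    return f"{instruction}\n {question}\n{cue}"
```

For example, `build_prompt("mvbench", "What is the person doing? Short Answer.")` yields the MVBench prompt ending in `Short Answer:` with the duplicate cue removed from the question.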

For evaluation code, please refer to the [evaluation script](https://github.com/magic-research/PLLaVA/blob/main/README.md) in PLLaVA.

## Using This Model

This repository contains a `base` version of the model and does not support chat.

You can quickly install the Python package dependencies and run model inference following our [github](https://github.com/THUDM/CogVLM2/tree/main/video_demo).

## License

This model is released under the CogVLM2 [LICENSE](LICENSE). For models built with Meta Llama 3, please also adhere to the [LLAMA3_LICENSE](LLAMA3_LICENSE).

## Training details

Please refer to our technical report for the training recipe and hyperparameters.

README_zh.md ADDED
@@ -0,0 +1,65 @@
# CogVLM2-Video

CogVLM2-Video 在多个视频问答任务上实现了最先进的性能。下图显示了 CogVLM2-Video 在 [MVBench](https://github.com/OpenGVLab/Ask-Anything)、[VideoChatGPT-Bench](https://github.com/mbzuai-oryx/Video-ChatGPT) 和 Zero-shot VideoQA 数据集（MSVD-QA、MSRVTT-QA、ActivityNet-QA）上的性能。

![Quantitative Evaluation](https://github.com/THUDM/CogVLM2/tree/main/resources/cogvlm2_video_bench.jpeg)

其中 VCG-* 指 VideoChatGPT-Bench，ZS-* 指零样本 VideoQA 数据集，MV-* 指 MVBench 中的主要类别。

## 评估结果

具体榜单测试数据如下：

| Models                | VCG-AVG  | VCG-CI   | VCG-DO   | VCG-CU   | VCG-TU   | VCG-CO   | ZS-AVG    |
|-----------------------|----------|----------|----------|----------|----------|----------|-----------|
| IG-VLM GPT4V          | 3.17     | 3.40     | 2.80     | 3.61     | 2.89     | 3.13     | 65.70     |
| ST-LLM                | 3.15     | 3.23     | 3.05     | 3.74     | 2.93     | 2.81     | 62.90     |
| ShareGPT4Video        | N/A      | N/A      | N/A      | N/A      | N/A      | N/A      | 46.50     |
| VideoGPT+             | 3.28     | 3.27     | 3.18     | 3.74     | 2.83     | **3.39** | 61.20     |
| VideoChat2_HD_mistral | 3.10     | 3.40     | 2.91     | 3.72     | 2.65     | 2.84     | 57.70     |
| PLLaVA-34B            | 3.32     | **3.60** | 3.20     | **3.90** | 2.67     | 3.25     | **68.10** |
| CogVLM2-Video         | **3.41** | 3.49     | **3.46** | 3.87     | **2.98** | 3.23     | 66.60     |

CogVLM2-Video 在 MVBench 数据集上的表现：

| Model                 | AVG      | AA       | AC       | AL       | AP       | AS       | CO       | CI       | EN    | ER       | FA       | FP       | MA       | MC       | MD       | OE       | OI       | OS   | ST       | SC   | UA       |
|-----------------------|----------|----------|----------|----------|----------|----------|----------|----------|-------|----------|----------|----------|----------|----------|----------|----------|----------|------|----------|------|----------|
| IG-VLM GPT4V          | 43.7     | 72.0     | 39.0     | 40.5     | **63.5** | 55.5     | 52.0     | 11.0     | 31.0  | 59.0     | 46.5     | 47.5     | 22.5     | 12.0     | 12.0     | 18.5     | 59.0     | 29.5 | 83.5     | 45.0 | 73.5     |
| ST-LLM                | 54.9     | 84.0     | 36.5     | 31.0     | 53.5     | 66.0     | 46.5     | 58.5     | 34.5  | 41.5     | 44.0     | 44.5     | 78.5     | 56.5     | 42.5     | 80.5     | 73.5     | 38.5 | 86.5     | 43.0 | 58.5     |
| ShareGPT4Video        | 51.2     | 79.5     | 35.5     | 41.5     | 39.5     | 49.5     | 46.5     | 51.5     | 28.5  | 39.0     | 40.0     | 25.5     | 75.0     | 62.5     | 50.5     | 82.5     | 54.5     | 32.5 | 84.5     | 51.0 | 54.5     |
| VideoGPT+             | 58.7     | 83.0     | 39.5     | 34.0     | 60.0     | **69.0** | 50.0     | 60.0     | 29.5  | 44.0     | 48.5     | 53.0     | 90.5     | 71.0     | 44.0     | **85.5** | 75.5     | 36.0 | 89.5     | 45.0 | 66.5     |
| VideoChat2_HD_mistral | 62.3     | 79.5     | **60.0** | **87.5** | 50.0     | 68.5     | **93.5** | 71.5     | 36.5  | 45.0     | 49.5     | **87.0** | 40.0     | **76.0** | **92.0** | 53.0     | 62.0     | 45.5 | 36.0     | 44.0 | 69.5     |
| PLLaVA-34B            | 58.1     | 82.0     | 40.5     | 49.5     | 53.0     | 67.5     | 66.5     | 59.0     | 39.5  | **63.5** | 47.0     | 50.0     | 70.0     | 43.0     | 37.5     | 68.5     | 67.5     | 36.5 | **91.0** | 51.5 | **79.0** |
| CogVLM2-Video         | **62.3** | **85.5** | 41.5     | 31.5     | 65.5     | 79.5     | 58.5     | **77.0** | 28.5  | 42.5     | **54.0** | 57.0     | **91.5** | 73.0     | 48.0     | **91.0** | **78.0** | 36.0 | **91.5** | 47.0 | 68.5     |

## 评估和复现

我们遵循以往工作的做法来评估模型性能，并为每个基准测试构造了特定于任务的提示词：

```python
# For MVBench
prompt = f"Carefully watch the video and pay attention to the cause and sequence of events, the detail and movement of objects, and the action and pose of persons. Based on your observations, select the best option that accurately addresses the question.\n " + f"{prompt.replace('Short Answer.', '')}\n" + "Short Answer:"
# For VideoChatGPT-Bench
prompt = f"Carefully watch the video and pay attention to the cause and sequence of events, the detail and movement of objects, and the action and pose of persons. Based on your observations, comprehensively answer the following question. Your answer should be long and cover all the related aspects\n " + f"{prompt.replace('Short Answer.', '')}\n" + "Answer:"
# For Zero-shot VideoQA
prompt = f"The input consists of a sequence of key frames from a video. Answer the question comprehensively including all the possible verbs and nouns that can discribe the events, followed by significant events, characters, or objects that appear throughout the frames.\n " + f"{prompt.replace('Short Answer.', '')}\n" + "Answer:"
```
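上述三个提示词共享相同的结构：基准特定的任务指令、去掉尾部 "Short Answer." 提示的问题，以及作答引导语。下面是该结构的最小示意（`build_prompt` 函数与 `TEMPLATES` 表为本文档的假设性示例，并非官方评估代码的一部分）：

```python
# 假设性示例：抽取上述三个基准提示词的公共结构，并非官方评估代码。

# 每个基准对应 (任务指令, 作答引导语)，文本逐字取自上面的提示词
TEMPLATES = {
    "mvbench": (
        "Carefully watch the video and pay attention to the cause and sequence of events, "
        "the detail and movement of objects, and the action and pose of persons. "
        "Based on your observations, select the best option that accurately addresses the question.",
        "Short Answer:",
    ),
    "videochatgpt": (
        "Carefully watch the video and pay attention to the cause and sequence of events, "
        "the detail and movement of objects, and the action and pose of persons. "
        "Based on your observations, comprehensively answer the following question. "
        "Your answer should be long and cover all the related aspects",
        "Answer:",
    ),
    "zs_videoqa": (
        "The input consists of a sequence of key frames from a video. "
        "Answer the question comprehensively including all the possible verbs and nouns "
        "that can discribe the events, followed by significant events, characters, "
        "or objects that appear throughout the frames.",
        "Answer:",
    ),
}

def build_prompt(benchmark: str, question: str) -> str:
    """拼装基准提示词：任务指令 + 清理后的问题 + 作答引导语。"""
    instruction, cue = TEMPLATES[benchmark]
    # 与原始代码一致，用 str.replace 去掉部分数据集附带的引导语
    question = question.replace("Short Answer.", "")
    return f"{instruction}\n {question}\n{cue}"
```

例如 `build_prompt("mvbench", "What is the person doing? Short Answer.")` 会返回以 `Short Answer:` 结尾、且问题中重复引导语已被去除的 MVBench 提示词。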

有关评估代码，请参阅 PLLaVA 中的[评估脚本](https://github.com/magic-research/PLLaVA/blob/main/README.md)。

## 快速调用

本仓库为 `base` 版本模型，不支持对话。

您可以在我们的 [github](https://github.com/THUDM/CogVLM2/tree/main/video_demo) 中快速安装 Python 包依赖并运行模型推理。

## 模型协议

此模型根据 CogVLM2 [LICENSE](LICENSE) 发布。对于使用 Meta Llama 3 构建的模型，还请遵守 [LLAMA3_LICENSE](LLAMA3_LICENSE)。

## 引用

我们即将发布技术报告，敬请期待。