zjowowen committed on
Commit
643bd7e
1 Parent(s): 2e278e7

init space

.gitignore ADDED
@@ -0,0 +1,2 @@
1
+ .env
2
+ *bkp.py
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: LightZero RAG
3
- emoji: 📉
4
  colorFrom: yellow
5
  colorTo: blue
6
  sdk: gradio
@@ -10,4 +10,107 @@ pinned: false
10
  license: apache-2.0
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
  title: LightZero RAG
3
+ emoji: 📖
4
  colorFrom: yellow
5
  colorTo: blue
6
  sdk: gradio
 
10
  license: apache-2.0
11
  ---
12
 
13
+ # RAG Demo
14
+
15
+ English | [简体中文(Simplified Chinese)](https://github.com/puyuan1996/RAG/blob/main/README_zh.md)
16
+
17
+ ## Introduction
18
+
19
+ RAG is a demonstration project for a question-answering system based on Retrieval-Augmented Generation (RAG).
20
+ - It utilizes large language models such as GPT-3.5 in conjunction with a document retrieval vector database like Weaviate to respond to user queries by retrieving relevant document contexts and leveraging the generative capabilities of the language model.
21
+ - The project also includes a web-based interactive application built with Gradio and rag_demo.py.
22
+
23
+ ## rag_demo.py Features
24
+
25
+ - Supports loading OpenAI API keys via environment variables.
26
+ - Facilitates loading local documents and splitting them into chunks.
27
+ - Allows for the creation of a vector store and the conversion of document chunks into vectors for storage in Weaviate.
28
+ - Sets up a Retrieval-Augmented Generation process, combining document retrieval and language model generation to answer user questions.
29
+ - Executes queries and prints results, with the option to use the RAG process or not.
30
+
31
+ ## app.py Features
32
+
33
+ - Creates a Gradio application where users can input questions and the application employs the Retrieval-Augmented Generation (RAG) model to find answers, displaying results within the interface.
34
+ - Retrieved contexts are highlighted in the Markdown document to help users understand the source of the answers. The application interface is divided into two sections: the top for Q&A and the bottom to display the contexts referred to by the RAG model.
35
+
36
+ ## How to Use
37
+
38
+ 1. Clone the project to your local machine.
39
+ 2. Install dependencies.
40
+
41
+ ```shell
42
+ pip3 install -r requirements.txt
43
+ ```
44
+ 3. Create a `.env` file in the project root directory and add your OpenAI API key:
45
+
46
+ ```
47
+ OPENAI_API_KEY='your API key'
48
+ QUESTION_LANG='cn' # Language of the question; 'cn' is currently the only available option
49
+ ```
50
+
51
+ 4. Ensure you have available documents as context or use the commented-out code snippet to download the documents you want to reference.
52
+ 5. Run `python3 -u rag_demo.py` to start using the application.
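The two `.env` settings from step 3 could be validated up front, so that a missing key or an unsupported language fails with a clear message. The following is a minimal sketch under stated assumptions: `read_settings` and `SUPPORTED_LANGS` are hypothetical names that do not exist in this repository, the default of `'cn'` is an assumption, and the function reads from a plain mapping so it can be tried without a real `.env` file.

```python
import os

# Assumption: the README lists 'cn' as the only available option.
SUPPORTED_LANGS = ("cn",)


def read_settings(env=None):
    """Return validated settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")

    # Assumption: fall back to 'cn' when QUESTION_LANG is absent.
    lang = env.get("QUESTION_LANG", "cn")
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported QUESTION_LANG: {lang!r}; expected one of {SUPPORTED_LANGS}")

    return {"openai_api_key": api_key, "question_lang": lang}
```

Calling `read_settings()` with no argument reads the process environment, which is where a `.env` loader would normally have placed the values.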
53
+
54
+ ## Example
55
+
56
+ ```python
57
+
58
+ # Unlike rag_demo_v0.py, rag_demo.py can also output the retrieved document chunks.
+ if __name__ == "__main__":
+     # Assuming documents are already present locally
+     file_path = './documents/LightZero_README.zh.md'
+     # Load and split document
+     chunks = load_and_split_document(file_path)
+     # Create vector store
+     retriever = create_vector_store(chunks)
+     # Set up RAG process
+     rag_chain = setup_rag_chain()
+
+     # Pose a question and get an answer
+     query = "Does the AlphaZero algorithm implemented in LightZero support running in the Atari environment? Please explain in detail."
+     # Use RAG chain to get referenced documents and answer
+     retrieved_documents, result_with_rag = execute_query(retriever, rag_chain, query)
+     # Get an answer without using RAG chain
+     result_without_rag = execute_query_no_rag(query=query)
+
+     # Details of data handling code are omitted here, please refer to the source files in this repository for specifics
+
+     # Print and compare results from both methods
+     print("=" * 40)
+     print(f"My question is:\n{query}")
+     print("=" * 40)
+     print(f"Result with RAG:\n{wrapped_result_with_rag}\nRetrieved context is: \n{context}")
+     print("=" * 40)
+     print(f"Result without RAG:\n{wrapped_result_without_rag}")
+     print("=" * 40)
86
+ ```
87
+
88
+ ## Project Structure
89
+
90
+ ```
91
+ RAG/
92
+
93
+ ├── rag_demo_v0.py # RAG demonstration script without support for outputting retrieved document chunks.
94
+ ├── rag_demo.py # RAG demonstration script with support for outputting retrieved document chunks.
95
+ ├── app.py # Web-based interactive application built with Gradio and rag_demo.py.
96
+ ├── .env # Environment variable configuration file
97
+ └── documents/ # Documents folder
98
+ └── your_document.txt # Context document
99
+ ```
100
+
101
+ ## Contribution Guide
102
+
103
+ If you would like to contribute code to RAG, please follow these steps:
104
+
105
+ 1. Fork the project.
106
+ 2. Create a new branch.
107
+ 3. Commit your changes.
108
+ 4. Submit a Pull Request.
109
+
110
+ ## Issues and Support
111
+
112
+ If you encounter any issues or require assistance, please open an issue on the project's Issues page.
113
+
114
+ ## License
115
+
116
+ All code in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
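The flow this README describes (split a document into chunks, retrieve the chunks most relevant to a question, then build a prompt for the language model) can be tried without any external service. The sketch below is illustrative only: it swaps in bag-of-words cosine similarity for the Weaviate vector search and embeddings that the project actually uses, it stops before calling a language model, and every function name is hypothetical rather than part of `rag_demo.py`.

```python
import math
import re
from collections import Counter


def split_into_chunks(text):
    """Split on blank lines; a stand-in for a real text splitter."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks, query, k=1):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(embed(chunk), query_vec), reverse=True)
    return ranked[:k]


def build_prompt(context_chunks, question):
    """Combine retrieved context and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline the prompt would be sent to the chat model; comparing the answer with and without the retrieved context is what the example in this README prints.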
README_zh.md ADDED
@@ -0,0 +1,104 @@
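+ # RAG Demo User Guide
+
+ Simplified Chinese | [English](https://github.com/puyuan1996/RAG/blob/main/README.md)
+
+ ## Introduction
+
+ RAG is a sample project for a question-answering system based on Retrieval-Augmented Generation (RAG).
+ - It uses a large language model (such as GPT-3.5) together with a document retrieval vector database (such as Weaviate) to respond to user questions, retrieving relevant document contexts and using the generative capability of the language model to provide accurate answers.
+ - It also provides a web-based interactive application built with Gradio and rag_demo.py.
+
+ ## rag_demo.py Features
+
+ - Supports loading the OpenAI API key from environment variables.
+ - Supports loading local documents and splitting them into chunks.
+ - Supports creating a vector store and converting document chunks into vectors stored in Weaviate.
+ - Supports setting up a Retrieval-Augmented Generation process that combines document retrieval and language model generation to answer user questions.
+ - Supports executing queries and printing the results, with or without the RAG process.
+
+ ## app.py Features
+
+ - Creates a Gradio application in which users can enter a question; the application uses a Retrieval-Augmented Generation (RAG) model to find an answer and displays the result in the interface.
+ - The retrieved contexts are highlighted in the Markdown document to help users understand where the answer comes from. The interface has two parts: the Q&A area at the top and, at the bottom, the contexts the RAG model referred to.
+
+ ## How to Use
+
+ 1. Clone the project to your local machine.
+ 2. Install dependencies.
+
+ ```shell
+ pip3 install -r requirements.txt
+ ```
+ 3. Create a `.env` file in the project root directory and add your OpenAI API key:
+
+ ```
+ OPENAI_API_KEY='your API key'
+ QUESTION_LANG='cn' # Language of the question; 'cn' is currently the only available option
+ ```
+
+ 4. Make sure you have documents available as context, or use the commented-out code snippet to download the documents you want to reference.
+ 5. Run `python3 -u rag_demo.py` to start using the application.
+
+ ## Example
+
+ ```python
+
+ # Unlike rag_demo_v0.py, rag_demo.py can also output the retrieved document chunks.
+ if __name__ == "__main__":
+     # Assuming the document already exists locally
+     file_path = './documents/LightZero_README.zh.md'
+     # Load and split the document
+     chunks = load_and_split_document(file_path)
+     # Create the vector store
+     retriever = create_vector_store(chunks)
+     # Set up the RAG process
+     rag_chain = setup_rag_chain()
+
+     # Pose a question and get an answer
+     query = "Does the AlphaZero algorithm implemented in LightZero support running in the Atari environment? Please explain the reason in detail."
+     # Use the RAG chain to get the referenced documents and the answer
+     retrieved_documents, result_with_rag = execute_query(retriever, rag_chain, query)
+     # Get an answer without using the RAG chain
+     result_without_rag = execute_query_no_rag(query=query)
+
+     # Some data handling code is omitted here; please refer to the source files in this repository for details
+
+     # Print and compare the results of both methods
+     print("=" * 40)
+     print(f"My question is:\n{query}")
+     print("=" * 40)
+     print(f"Result with RAG:\n{wrapped_result_with_rag}\nRetrieved context is: \n{context}")
+     print("=" * 40)
+     print(f"Result without RAG:\n{wrapped_result_without_rag}")
+     print("=" * 40)
+ ```
+
+ ## Project Structure
+
+ ```
+ RAG/
+
+ ├── rag_demo_v0.py # RAG demonstration script without support for outputting retrieved document chunks.
+ ├── rag_demo.py # RAG demonstration script with support for outputting retrieved document chunks.
+ ├── app.py # Web-based interactive application built with Gradio and rag_demo.py.
+ ├── .env # Environment variable configuration file
+ └── documents/ # Documents folder
+ └── your_document.txt # Context document
+ ```
+
+ ## Contribution Guide
+
+ If you would like to contribute code to RAG, please follow these steps:
+
+ 1. Fork the project.
+ 2. Create a new branch.
+ 3. Commit your changes.
+ 4. Submit a Pull Request.
+
+ ## Issues and Support
+
+ If you encounter any issues or need help, please open an issue on the project's Issues page.
+
+ ## License
+
+ All code in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).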
app.py ADDED
@@ -0,0 +1,101 @@
+ """
+ This script creates a Gradio application in which users can enter a question; the application uses a Retrieval-Augmented Generation (RAG) model to find an answer and displays the result in the interface.
+ The retrieved contexts are highlighted in the Markdown document to help users understand where the answer comes from. The interface has two parts: the Q&A area at the top and, at the bottom, the contexts the RAG model referred to.
+
+ Structure overview:
+ - Import the required libraries and functions.
+ - Set up environment variables and global variables.
+ - Load and process the Markdown document.
+ - Define the function that handles a user question and returns the answer and the highlighted context.
+ - Build the user interface with Gradio, including Markdown, an input box, a button, and output boxes.
+ - Launch the Gradio application with sharing enabled.
+ """
+
+ import os
+
+ import gradio as gr
+ from dotenv import load_dotenv
+ from langchain.document_loaders import TextLoader
+
+ from rag_demo import load_and_split_document, create_vector_store, setup_rag_chain, execute_query
+
+ # Environment setup
+ load_dotenv()  # Load environment variables from .env
+ QUESTION_LANG = os.getenv("QUESTION_LANG")  # Read QUESTION_LANG from the environment
+
+ # NOTE: only the 'cn' branch below defines title, title_markdown and tos_markdown.
+ assert QUESTION_LANG in ['cn', 'en'], QUESTION_LANG
+
+ if QUESTION_LANG == "cn":
+     title = "LightZero RAG Demo"
+     title_markdown = """
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/puyuan1996/RAG/main/assets/banner.svg" width="80%" height="20%" alt="Banner Image">
+ </div>
+ <h2 style="text-align: center; color: black;"><a href="https://github.com/puyuan1996/RAG"> 🎭LightZero RAG Demo</a></h2>
+ <h4 align="center"> 📢说明:请您在下面的"问题"框中输入任何关于 LightZero 的问题,然后点击"提交"按钮。右侧"回答"框中会显示 RAG 模型给出的回答。在QA栏的下方会给出参考文档(检索得到的 context 用黄色高亮显示)。</h4>
+ <h4 align="center"> 如果你喜欢这个项目,请给我们在 GitHub 点个 star ✨ 。我们将会持续保持更新。 </h4>
+ <strong><h5 align="center">注意:算法模型的输出可能包含一定的随机性。相关结果不代表任何开发者和相关 AI 服务的态度和意见。本项目开发者不对生成结果作任何保证,仅供参考。</h5></strong>
+ """
+     tos_markdown = """
+ ### 使用条款
+ 玩家使用本服务须同意以下条款:
+ 该服务是一项探索性研究预览版,仅供非商业用途。它仅提供有限的安全措施,并可能生成令人反感的内容。不得将其用于任何非法、有害、暴力、种族主义等目的。
+ 如果您的游玩体验有不佳之处,请发送邮件至 opendilab@pjlab.org.cn ! 我们将删除相关信息,并不断改进这个项目。
+ 为了获得最佳体验,请使用台式电脑,因为移动设备可能会影响可视化效果。
+ **版权所有 2024 OpenDILab。**
+ """
+
+ # Path variable, kept for later file access
+ file_path = './documents/LightZero_README.zh.md'
+ chunks = load_and_split_document(file_path)
+ retriever = create_vector_store(chunks)
+ # rag_chain = setup_rag_chain(model_name="gpt-4")
+ rag_chain = setup_rag_chain(model_name="gpt-3.5-turbo")
+
+ # Load the original Markdown document
+ loader = TextLoader(file_path)
+ orig_documents = loader.load()
+
+
+ def rag_answer(question):
+     retrieved_documents, answer = execute_query(retriever, rag_chain, question)
+     # Highlight the retrieved context in the original document
+     context = [doc.page_content for doc in retrieved_documents]
+     highlighted_document = orig_documents[0].page_content
+     for chunk in context:
+         highlighted_document = highlighted_document.replace(chunk, f"<mark>{chunk}</mark>")
+     return answer, highlighted_document
+
+ """
+ In the code below, gr.Blocks builds the Gradio interface layout, gr.Textbox creates text boxes, gr.Button creates a button, and gr.Markdown displays Markdown-formatted content.
+ gr_submit.click is an event handler: when the user clicks the submit button, it calls rag_answer and wires up the input and output components.
+ The rag_answer function receives the user's question, uses the RAG model to retrieve context and generate an answer, and highlights the retrieved passages in the original Markdown text.
+ It returns the generated answer and the Markdown text with the context highlighted.
+ """
+ with gr.Blocks(title=title, theme='ParityError/Interstellar') as rag_demo:
+     gr.Markdown(title_markdown)
+
+     with gr.Row():
+         with gr.Column():
+             inputs = gr.Textbox(
+                 placeholder="请您输入任何关于 LightZero 的问题。",
+                 label="问题 (Q)")
+             gr_submit = gr.Button('提交')
+
+         # Output components: the answer box and the highlighted reference document
+         outputs_answer = gr.Textbox(placeholder="当你点击提交按钮后,这里会显示 RAG 模型给出的回答。",
+                                     label="回答 (A)")
+     with gr.Row():
+         # placeholder="After you click submit, the referenced document is shown here, with the retrieved context most relevant to the question highlighted."
+         outputs_context = gr.Markdown(label="参考的文档,检索得到的 context 用高亮显示 (C)")
+
+     gr.Markdown(tos_markdown)
+
+     gr_submit.click(
+         rag_answer,
+         inputs=inputs,
+         outputs=[outputs_answer, outputs_context],
+     )
+
+ if __name__ == "__main__":
+     # Launch the interface with sharing enabled. If creating the public share link fails, run `ngrok http 7860` locally to expose the local port publicly.
+     rag_demo.launch(share=True)
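The highlighting step in `rag_answer` calls `str.replace` once per retrieved chunk, so two identical chunks would wrap the same text in nested `<mark>` tags, and the second of two overlapping chunks would no longer match once the first has been marked. A span-based variant is one way to avoid both cases. The sketch below is a possible alternative, not code from this commit; `highlight_contexts` is a hypothetical helper name.

```python
def highlight_contexts(document, contexts):
    """Wrap every occurrence of each context string in <mark> tags exactly once."""
    # Drop empty strings and duplicates while keeping the original order.
    unique = list(dict.fromkeys(c for c in contexts if c))

    # Collect (start, end) spans for every occurrence in the untouched document.
    spans = []
    for chunk in unique:
        start = document.find(chunk)
        while start != -1:
            spans.append((start, start + len(chunk)))
            start = document.find(chunk, start + len(chunk))

    # Merge overlapping or touching spans so tags never nest.
    spans.sort()
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Rebuild the document with one <mark> pair per merged span.
    pieces, pos = [], 0
    for start, end in merged:
        pieces.append(document[pos:start])
        pieces.append(f"<mark>{document[start:end]}</mark>")
        pos = end
    pieces.append(document[pos:])
    return "".join(pieces)
```

Because all spans are located in the original text before any tag is inserted, the result does not depend on the order in which chunks were retrieved.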
assets/banner.svg ADDED
documents/LightZero_README.md ADDED
@@ -0,0 +1,537 @@
1
+ <div id="top"></div>
2
+
3
+ # LightZero
4
+
5
+ <div align="center">
6
+ <img width="1000px" height="auto" src="https://github.com/opendilab/LightZero/blob/main/LightZero.png">
7
+ </div>
8
+
9
+ ---
10
+
11
+ [![Twitter](https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Ftwitter.com%2Fopendilab)](https://twitter.com/opendilab)
12
+ [![PyPI](https://img.shields.io/pypi/v/LightZero)](https://pypi.org/project/LightZero/)
13
+ ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/LightZero)
14
+ ![Loc](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HansBug/e002642132ec758e99264118c66778a4/raw/loc.json)
15
+ ![Comments](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HansBug/e002642132ec758e99264118c66778a4/raw/comments.json)
16
+
17
+ [![Code Test](https://github.com/opendilab/LightZero/workflows/Code%20Test/badge.svg)](https://github.com/opendilab/LightZero/actions?query=workflow%3A%22Code+Test%22)
18
+ [![Badge Creation](https://github.com/opendilab/LightZero/workflows/Badge%20Creation/badge.svg)](https://github.com/opendilab/LightZero/actions?query=workflow%3A%22Badge+Creation%22)
19
+ [![Package Release](https://github.com/opendilab/LightZero/workflows/Package%20Release/badge.svg)](https://github.com/opendilab/LightZero/actions?query=workflow%3A%22Package+Release%22)
20
+
21
+ ![GitHub Org's stars](https://img.shields.io/github/stars/opendilab)
22
+ [![GitHub stars](https://img.shields.io/github/stars/opendilab/LightZero)](https://github.com/opendilab/LightZero/stargazers)
23
+ [![GitHub forks](https://img.shields.io/github/forks/opendilab/LightZero)](https://github.com/opendilab/LightZero/network)
24
+ ![GitHub commit activity](https://img.shields.io/github/commit-activity/m/opendilab/LightZero)
25
+ [![GitHub issues](https://img.shields.io/github/issues/opendilab/LightZero)](https://github.com/opendilab/LightZero/issues)
26
+ [![GitHub pulls](https://img.shields.io/github/issues-pr/opendilab/LightZero)](https://github.com/opendilab/LightZero/pulls)
27
+ [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
28
+ [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
29
+
30
+ Updated on 2023.12.07 LightZero-v0.0.3
31
+
32
+ > LightZero is a lightweight, efficient, and easy-to-understand open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (RL).
33
+
34
+ English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Paper](https://arxiv.org/pdf/2310.08348.pdf)
35
+
36
+ ## Background
37
+
38
+ The integration of Monte Carlo Tree Search and Deep Reinforcement Learning,
39
+ exemplified by AlphaZero and MuZero,
40
+ has achieved unprecedented performance levels in various games, including Go and Atari.
41
+ This advanced methodology has also made significant strides in scientific domains like protein structure prediction and the search for matrix multiplication algorithms.
42
+ The following is an overview of the historical evolution of the Monte Carlo Tree Search algorithm series:
43
+ ![pipeline](assets/mcts_rl_evolution_overview.png)
44
+
45
+ ## Overview
46
+
47
+ **LightZero** is an open-source algorithm toolkit that combines MCTS and RL for PyTorch. It provides support for a range of MCTS-based RL algorithms and applications with the following advantages:
48
+ - Lightweight.
49
+ - Efficient.
50
+ - Easy-to-understand.
51
+
52
+ For further details, please refer to [Features](#features), [Framework Structure](#framework-structure) and [Integrated Algorithms](#integrated-algorithms).
53
+
54
+ **LightZero** aims to **promote the standardization of the MCTS+RL algorithm family to accelerate related research and applications**. A performance comparison of all implemented algorithms under a unified framework is presented in the [Benchmark](#benchmark).
55
+
56
+ ### Outline
57
+
58
+ - [Overview](#overview)
59
+ - [Outline](#outline)
60
+ - [Features](#features)
61
+ - [Framework Structure](#framework-structure)
62
+ - [Integrated Algorithms](#integrated-algorithms)
63
+ - [Installation](#installation)
64
+ - [Quick Start](#quick-start)
65
+ - [Benchmark](#benchmark)
66
+ - [Awesome-MCTS Notes](#awesome-mcts-notes)
67
+ - [Paper Notes](#paper-notes)
68
+ - [Algo. Overview](#algo-overview)
69
+ - [Awesome-MCTS Papers](#awesome-mcts-papers)
70
+ - [Key Papers](#key-papers)
71
+ - [Other Papers](#other-papers)
72
+ - [Feedback and Contribution](#feedback-and-contribution)
73
+ - [Citation](#citation)
74
+ - [Acknowledgments](#acknowledgments)
75
+ - [License](#license)
76
+
77
+ ### Features
78
+
79
+ **Lightweight**: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a lightweight framework. The algorithms and environments implemented in LightZero are listed [here](#integrated-algorithms).
80
+
81
+ **Efficient**: LightZero uses mixed heterogeneous computing to improve the computational efficiency of the most time-consuming parts of MCTS algorithms.
82
+
83
+ **Easy-to-understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms to help users understand the algorithm's core and compare the differences and similarities between algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for algorithm code implementation, making it easier for users to locate critical code. All the documentation can be found [here](#paper-notes).
84
+
85
+ ### Framework Structure
86
+
87
+ [comment]: <> (<p align="center">)
88
+
89
+ [comment]: <> ( <img src="assets/lightzero_file_structure.png" alt="Image Description 1" width="45%" height="auto" style="margin: 0 1%;">)
90
+
91
+ [comment]: <> ( <img src="assets/lightzero_pipeline.png" alt="Image Description 2" width="45%" height="auto" style="margin: 0 1%;">)
92
+
93
+ [comment]: <> (</p>)
94
+
95
+ <p align="center">
96
+ <img src="assets/lightzero_pipeline.svg" alt="Image Description 2" width="50%" height="auto" style="margin: 0 1%;">
97
+ </p>
98
+
99
+ The above picture is the framework pipeline of LightZero. We briefly introduce the three core modules below:
100
+
101
+ **Model**:
102
+ ``Model`` is used to define the network structure, including the ``__init__`` function for initializing the network structure and the ``forward`` function for computing the network's forward propagation.
103
+
104
+ **Policy**:
105
+ ``Policy`` defines the way the network is updated and interacts with the environment, including three processes: the ``learning`` process, the ``collecting`` process, and the ``evaluation`` process.
106
+
107
+ **MCTS**:
108
+ ``MCTS`` defines the structure of the Monte Carlo search tree and the way it interacts with the Policy. The implementation of MCTS includes two languages: Python and C++, implemented in ``ptree`` and ``ctree``, respectively.
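To make the role of the search tree more concrete, here is a minimal pure-Python sketch of an AlphaZero-style tree node with PUCT child selection and value backup. It is an illustration only: the names `Node`, `puct_score`, `select_action`, `backup`, and the constant `c_puct` are hypothetical and are not taken from LightZero's ``ptree`` or ``ctree`` code, which may be organized quite differently.

```python
import math


class Node:
    """One node of a toy search tree (hypothetical; not LightZero's ptree/ctree API)."""

    def __init__(self, prior, visit_count=0, value_sum=0.0):
        self.prior = prior              # prior probability assigned by a policy network
        self.visit_count = visit_count  # number of simulations that passed through this node
        self.value_sum = value_sum      # sum of the values backed up through this node
        self.children = {}              # maps action -> child Node

    def value(self):
        # Mean backed-up value; 0.0 for a node that has never been visited.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent, child, c_puct=1.25):
    """PUCT score: mean value plus a prior-weighted exploration bonus."""
    exploration = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + exploration


def select_action(parent):
    """Return the action whose child has the highest PUCT score."""
    return max(parent.children, key=lambda a: puct_score(parent, parent.children[a]))


def backup(path, leaf_value):
    """Add a leaf evaluation to every node on the visited path (single-player, undiscounted)."""
    for node in path:
        node.visit_count += 1
        node.value_sum += leaf_value
```

In this sketch a rarely visited child with a high prior receives a large exploration bonus, which is the mechanism that lets the network's policy output steer the search toward promising actions.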
109
+
110
+ For the file structure of LightZero, please refer to [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
111
+
112
+ ### Integrated Algorithms
113
+ LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of MCTS algorithms (sometimes combined with Cython and C++), including:
114
+ - [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
115
+ - [MuZero](https://arxiv.org/abs/1911.08265)
116
+ - [Sampled MuZero](https://arxiv.org/abs/2104.06303)
117
+ - [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
118
+ - [EfficientZero](https://arxiv.org/abs/2111.00210)
119
+ - [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
120
+
121
+ The environments and algorithms currently supported by LightZero are shown in the table below:
122
+
123
+ | Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero |
124
+ |---------------| --------- | ------ |-------------| ------------------ | ---------- |----------------|
125
+ | TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
126
+ | Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
127
+ | Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 |
128
+ | 2048 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ |
129
+ | Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
130
+ | Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
131
+ | CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
132
+ | Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
133
+ | LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
134
+ | BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 |
135
+ | Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
136
+ | MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
137
+ | MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
138
+ | Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
139
+
140
+ <sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>
141
+
142
+ <sup>(2): "🔒" means that the corresponding item is on the waiting list (Work In Progress).</sup>
143
+
144
+ <sup>(3): "---" means that this algorithm doesn't support this environment.</sup>
145
+
146
+
147
+ ## Installation
148
+
149
+ You can install the latest development version of LightZero from its GitHub source code with the following commands:
150
+
151
+ ```bash
152
+ git clone https://github.com/opendilab/LightZero.git
153
+ cd LightZero
154
+ pip3 install -e .
155
+ ```
156
+
157
+ Kindly note that LightZero currently supports compilation only on `Linux` and `macOS` platforms.
158
+ We are actively working towards extending this support to the `Windows` platform.
159
+ Your patience during this transition is greatly appreciated.
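As a quick pre-install check, the platform restriction above can be tested from Python. This is a small sketch using only the standard library; the helper name `is_supported_platform` and the set `SUPPORTED_SYSTEMS` are hypothetical and are not part of LightZero.

```python
import platform

# Operating systems on which compilation is described as supported.
# platform.system() reports macOS as "Darwin".
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


def is_supported_platform(system_name=None):
    """Return True if the given (or, by default, the current) OS is in the supported set."""
    name = system_name if system_name is not None else platform.system()
    return name in SUPPORTED_SYSTEMS
```

Calling `is_supported_platform()` with no argument checks the machine it runs on; passing a name such as `"Windows"` checks that name instead.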
160
+
161
+ ## Installation with Docker
162
+
163
+ We also provide a Dockerfile that sets up an environment with all dependencies needed to run the LightZero library. This Docker image is based on Ubuntu 20.04 and installs Python 3.8, along with other necessary tools and libraries.
164
+ Here's how to use our Dockerfile to build a Docker image, run a container from this image, and execute LightZero code inside the container.
165
+ 1. **Download the Dockerfile**: The Dockerfile is located in the root directory of the LightZero repository. Download this [file](https://github.com/opendilab/LightZero/blob/main/Dockerfile) to your local machine.
166
+ 2. **Prepare the build context**: Create a new empty directory on your local machine, move the Dockerfile into this directory, and navigate into this directory. This step helps to avoid sending unnecessary files to the Docker daemon during the build process.
167
+ ```bash
168
+ mkdir lightzero-docker
169
+ mv Dockerfile lightzero-docker/
170
+ cd lightzero-docker/
171
+ ```
172
+ 3. **Build the Docker image**: Use the following command to build the Docker image. This command should be run from inside the directory that contains the Dockerfile.
173
+ ```bash
174
+ docker build -t ubuntu-py38-lz:latest -f ./Dockerfile .
175
+ ```
176
+ 4. **Run a container from the image**: Use the following command to start a container from the image in interactive mode with a Bash shell.
177
+ ```bash
178
+ docker run -it --rm ubuntu-py38-lz:latest /bin/bash
179
+ ```
180
+ 5. **Execute LightZero code inside the container**: Once you're inside the container, you can run the example Python script with the following command:
181
+ ```bash
182
+ python ./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py
183
+ ```
184
+
185
+ [comment]: <> (- [AlphaGo Zero]&#40;https://www.nature.com/articles/nature24270&#41; )
186
+
187
+ ## Quick Start
188
+
189
+ Train a MuZero agent to play [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
190
+
191
+ ```bash
192
+ cd LightZero
193
+ python3 -u zoo/classic_control/cartpole/config/cartpole_muzero_config.py
194
+ ```
195
+
196
+ Train a MuZero agent to play [Pong](https://gymnasium.farama.org/environments/atari/pong/):
197
+
198
+ ```bash
199
+ cd LightZero
200
+ python3 -u zoo/atari/config/atari_muzero_config.py
201
+ ```
202
+
203
+ Train a MuZero agent to play [TicTacToe](https://en.wikipedia.org/wiki/Tic-tac-toe):
204
+
205
+ ```bash
206
+ cd LightZero
207
+ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
208
+ ```
209
+
210
+ ## Benchmark
211
+
212
+ <details open><summary>Click to collapse</summary>
213
+
214
+ - Below are the benchmark results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games: [TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py).
215
+ <p align="center">
216
+ <img src="assets/benchmark/main/tictactoe_bot-mode_main.png" alt="tictactoe_bot-mode_main" width="30%" height="auto" style="margin: 0 1%;">
217
+ <img src="assets/benchmark/main/connect4_bot-mode_main.png" alt="connect4_bot-mode_main" width="30%" height="auto" style="margin: 0 1%;">
218
+ <img src="assets/benchmark/main/gomoku_bot-mode_main.png" alt="gomoku_bot-mode_main" width="30%" height="auto" style="margin: 0 1%;">
219
+ </p>
220
+
221
+ - Below are the benchmark results of [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py), [MuZero w/ SSL](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) , [EfficientZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/efficientzero.py) and [Sampled EfficientZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/sampled_efficientzero.py) on three discrete action space games in [Atari](https://github.com/opendilab/LightZero/blob/main/zoo/atari/envs/atari_lightzero_env.py).
222
+ <p align="center">
223
+ <img src="assets/benchmark/main/pong_main.png" alt="pong_main" width="23%" height="auto" style="margin: 0 1%;">
224
+ <img src="assets/benchmark/main/qbert_main.png" alt="qbert_main" width="23%" height="auto" style="margin: 0 1%;">
225
+ <img src="assets/benchmark/main/mspacman_main.png" alt="mspacman_main" width="23%" height="auto" style="margin: 0 1%;">
226
+ <img src="assets/benchmark/ablation/mspacman_sez_K.png" alt="mspacman_sez_K" width="23%" height="auto" style="margin: 0 1%;">
227
+ </p>
228
+
229
+
230
+ - Below are the benchmark results of [Sampled EfficientZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/sampled_efficientzero.py) with ``Factored/Gaussian`` policy representation on three classic continuous action space games: [Pendulum-v1](https://github.com/opendilab/LightZero/blob/main/zoo/classic_control/pendulum/envs/pendulum_lightzero_env.py), [LunarLanderContinuous-v2](https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py), [BipedalWalker-v3](https://github.com/opendilab/LightZero/blob/main/zoo/box2d/bipedalwalker/envs/bipedalwalker_env.py)
231
+ and two MuJoCo continuous action space games: [Hopper-v3](https://github.com/opendilab/LightZero/blob/main/zoo/mujoco/envs/mujoco_lightzero_env.py), [Walker2d-v3](https://github.com/opendilab/LightZero/blob/main/zoo/mujoco/envs/mujoco_lightzero_env.py).
232
+ > "Factored Policy" indicates that the agent learns a policy network that outputs a categorical distribution. After manual discretization, the sizes of the discrete action spaces for the five environments are 11, 49 (7^2), 256 (4^4), 64 (4^3), and 4096 (4^6), respectively. "Gaussian Policy", on the other hand, means that the agent learns a policy network that directly outputs the parameters (mu and sigma) of a Gaussian distribution.
233
+ <p align="center">
234
+ <img src="assets/benchmark/main/pendulum_main.png" alt="pendulum_main" width="30%" height="auto" style="margin: 0 1%;">
235
+ <img src="assets/benchmark/ablation/pendulum_sez_K.png" alt="pendulum_sez_K" width="30%" height="auto" style="margin: 0 1%;">
236
+ <img src="assets/benchmark/main/lunarlander_main.png" alt="lunarlander_main" width="30%" height="auto" style="margin: 0 1%;">
237
+ </p>
238
+ <p align="center">
239
+ <img src="assets/benchmark/main/bipedalwalker_main.png" alt="bipedalwalker_main" width="30%" height="auto" style="margin: 0 1%;">
240
+ <img src="assets/benchmark/main/hopper_main.png" alt="hopper_main" width="31.5%" height="auto" style="margin: 0 1%;">
241
+ <img src="assets/benchmark/main/walker2d_main.png" alt="walker2d_main" width="31.5%" height="auto" style="margin: 0 1%;">
242
+ </p>
243
+
244
+ - Below are the benchmark results of [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/gumbel_muzero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) (under different simulation costs) on four environments: [PongNoFrameskip-v4](https://github.com/opendilab/LightZero/blob/main/zoo/atari/envs/atari_lightzero_env.py), [MsPacmanNoFrameskip-v4](https://github.com/opendilab/LightZero/blob/main/zoo/atari/envs/atari_lightzero_env.py), [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py), and [LunarLanderContinuous-v2](https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py).
245
+ <p align="center">
246
+ <img src="assets/benchmark/ablation/pong_gmz_ns.png" alt="pong_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
247
+ <img src="assets/benchmark/ablation/mspacman_gmz_ns.png" alt="mspacman_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
248
+ <img src="assets/benchmark/ablation/gomoku_bot-mode_gmz_ns.png" alt="gomoku_bot-mode_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
249
+ <img src="assets/benchmark/ablation/lunarlander_gmz_ns.png" alt="lunarlander_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
250
+ </p>
251
+
252
+ - Below are the benchmark results of [StochasticMuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/stochastic_muzero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on [2048 environment](https://github.com/opendilab/LightZero/blob/main/zoo/game_2048/envs/game_2048_env.py) with varying levels of chance (num_chances=2 and 5).
253
+ <p align="center">
254
+ <img src="assets/benchmark/main/2048/2048_stochasticmz_mz.png" alt="2048_stochasticmz_mz" width="30%" height="auto" style="margin: 0 1%;">
255
+ <img src="assets/benchmark/main/2048/2048_stochasticmz_mz_nc5.png" alt="2048_stochasticmz_mz_nc5" width="30%" height="auto" style="margin: 0 1%;">
256
+ </p>
257
+
258
+ - Below are the benchmark results of various MCTS exploration mechanisms of [MuZero w/ SSL](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) in the [MiniGrid environment](https://github.com/opendilab/LightZero/blob/main/zoo/minigrid/envs/minigrid_lightzero_env.py).
259
+ <p align="center">
260
+ <img src="assets/benchmark/main/minigrid/keycorridors3r3_exploration.png" alt="keycorridors3r3_exploration" width="30%" height="auto" style="margin: 0 1%;">
261
+ <img src="assets/benchmark/main/minigrid/fourrooms_exploration.png" alt="fourrooms_exploration" width="30%" height="auto" style="margin: 0 1%;">
262
+ </p>
263
+
264
+ </details>
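The factored-policy action-space sizes quoted in the benchmark notes above (11, 49, 256, 64, and 4096) follow from discretizing each continuous action dimension into a fixed number of bins, so the joint action count is `bins ** dims`. The sketch below reproduces those numbers. The per-environment bin counts are inferred from the quoted powers (for example 49 = 7^2), and the evenly spaced grid in `discretize` is an assumption for illustration; the helper names are hypothetical and not LightZero functions.

```python
from itertools import product

# (environment, number of continuous action dimensions, bins per dimension)
# Bin counts are inferred from the sizes quoted in the README, not read from LightZero configs.
ENVS = [
    ("Pendulum-v1", 1, 11),
    ("LunarLanderContinuous-v2", 2, 7),
    ("BipedalWalker-v3", 4, 4),
    ("Hopper-v3", 3, 4),
    ("Walker2d-v3", 6, 4),
]


def factored_action_count(action_dim, bins_per_dim):
    """Number of joint discrete actions when every dimension gets the same number of bins."""
    return bins_per_dim ** action_dim


def discretize(low, high, bins_per_dim, action_dim):
    """Enumerate every joint action on an evenly spaced grid over [low, high] (assumed layout)."""
    step = (high - low) / (bins_per_dim - 1)
    grid = [low + i * step for i in range(bins_per_dim)]
    return list(product(grid, repeat=action_dim))


SIZES = [factored_action_count(dim, bins) for _, dim, bins in ENVS]
```

The exponential growth in the number of joint actions is visible already at six dimensions, which is one reason a Gaussian policy that outputs mu and sigma directly is the usual alternative for higher-dimensional control tasks.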
265
+
266
+
267
+ ## Awesome-MCTS Notes
268
+
269
+ ### Paper Notes
270
+ The following are the detailed paper notes (in Chinese) of the above algorithms:
271
+
272
+ <details open><summary>Click to collapse</summary>
273
+
274
+
275
+ - [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/AlphaZero.pdf)
276
+ - [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/MuZero.pdf)
277
+ - [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/EfficientZero.pdf)
278
+ - [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/SampledMuZero.pdf)
279
+ - [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/GumbelMuZero.pdf)
280
+ - [StochasticMuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/StochasticMuZero.pdf)
281
+ - [NotationTable](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/SymbolTable.pdf)
282
+
283
+ </details>
284
+
285
+ ### Algo. Overview
286
+
287
+ The following are overview diagrams of the MCTS principles behind the above algorithms:
288
+
289
+ <details><summary>Click to expand</summary>
290
+
291
+ - [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
292
+ - [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
293
+ - [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.pdf)
294
+ - [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.pdf)
295
+ - [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.pdf)
296
+ - [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.pdf)
297
+
298
+ </details>
299
+
300
+ ## Awesome-MCTS Papers
301
+
302
+ Here is a collection of research papers about **Monte Carlo Tree Search**.
303
+ [This Section](#awesome-mcts-papers) will be continuously updated to track the frontier of MCTS.
304
+
305
+ ### Key Papers
306
+
307
+ <details><summary>Click to expand</summary>
308
+
309
+ #### LightZero Implemented series
310
+
311
+ - [2018 _Science_ AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://www.science.org/doi/10.1126/science.aar6404)
312
+ - [2019 MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://arxiv.org/abs/1911.08265)
313
+ - [2021 EfficientZero: Mastering Atari Games with Limited Data](https://arxiv.org/abs/2111.00210)
314
+ - [2021 Sampled MuZero: Learning and Planning in Complex Action Spaces](https://arxiv.org/abs/2104.06303)
315
+ - [2022 Stochastic MuZero: Planning in Stochastic Environments with A Learned Model](https://openreview.net/pdf?id=X6D9bAHhBQ1)
316
+ - [2022 Gumbel MuZero: Policy Improvement by Planning with Gumbel](https://openreview.net/pdf?id=bERaNdoegnO&)
317
+
318
+ #### AlphaGo series
319
+ - [2015 _Nature_ AlphaGo Mastering the game of Go with deep neural networks and tree search](https://www.nature.com/articles/nature16961)
320
+ - [2017 _Nature_ AlphaGo Zero Mastering the game of Go without human knowledge](https://www.nature.com/articles/nature24270)
321
+ - [2019 ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero](https://arxiv.org/abs/1902.04522)
322
+ - [Code](https://github.com/pytorch/ELF)
323
+ - [2023 Student of Games: A unified learning algorithm for both perfect and imperfect information games](https://www.science.org/doi/10.1126/sciadv.adg3256)
324
+
325
+ #### MuZero series
326
+ - [2022 Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
327
+ - [2021 Vector Quantized Models for Planning](https://arxiv.org/abs/2106.04615)
328
+ - [2021 Muesli: Combining Improvements in Policy Optimization](https://arxiv.org/abs/2104.06159)
329
+ #### MCTS Analysis
330
+ - [2020 Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)
331
+ - [2021 Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
332
+ - [2022 Adversarial Policies Beat Professional-Level Go AIs](https://arxiv.org/abs/2211.00241)
333
+ - [2022 _PNAS_ Acquisition of Chess Knowledge in AlphaZero](https://arxiv.org/abs/2111.09259)
334
+
335
+ #### MCTS Application
336
+ - [2023 Symbolic Physics Learner: Discovering governing equations via Monte Carlo tree search](https://openreview.net/pdf?id=ZTK3SefE8_Z)
337
+ - [2022 _Nature_ Discovering faster matrix multiplication algorithms with reinforcement learning](https://www.nature.com/articles/s41586-022-05172-4)
338
+ - [Code](https://github.com/deepmind/alphatensor)
339
+ - [2022 MuZero with Self-competition for Rate Control in VP9 Video Compression](https://arxiv.org/abs/2202.06626)
340
+ - [2021 DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning](https://arxiv.org/abs/2106.06135)
341
+ - [2019 Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving](https://arxiv.org/pdf/1905.02680.pdf)
342
+
343
+ </details>
344
+
345
+ ### Other Papers
346
+
347
+ <details><summary>Click to expand</summary>
348
+
349
+ #### ICML
350
+ - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
351
+ - Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, Matthijs T. J. Spaan
352
+ - Key: safe policy improvement online using a MCTS based strategy, Safe Policy Improvement with Baseline Bootstrapping
353
+ - ExpEnv: Gridworld and SysAdmin
354
+ - [Efficient Learning for AlphaZero via Path Consistency](https://proceedings.mlr.press/v162/zhao22h/zhao22h.pdf) 2022
355
+ - Dengwei Zhao, Shikui Tu, Lei Xu
356
+ - Key: limited amount of self-plays, path consistency (PC) optimality
357
+ - ExpEnv: Go, Othello, Gomoku
358
+ - [Visualizing MuZero Models](https://arxiv.org/abs/2102.12924) 2021
359
+ - Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat
360
+ - Key: visualizing the value equivalent dynamics model, action trajectories diverge, two regularization techniques
361
+ - ExpEnv: CartPole and MountainCar.
362
+ - [Convex Regularization in Monte-Carlo Tree Search](https://arxiv.org/pdf/2007.00391.pdf) 2021
363
+ - Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen
364
+ - Key: entropy-regularization backup operators, regret analysis, Tsallis entropy
365
+ - ExpEnv: synthetic tree, Atari
366
+ - [Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains](http://proceedings.mlr.press/v119/fischer20a/fischer20a.pdf) 2020
367
+ - Johannes Fischer, Ömer Sahin Tas
368
+ - Key: Continuous POMDP, Particle Filter Tree, information-based reward shaping, Information Gathering.
369
+ - ExpEnv: POMDPs.jl framework
370
+ - [Code](https://github.com/johannes-fischer/icml2020_ipft)
371
+ - [Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search](http://proceedings.mlr.press/v119/chen20k/chen20k.pdf) 2020
372
+ - Binghong Chen, Chengtao Li, Hanjun Dai, Le Song
373
+ - Key: chemical retrosynthetic planning, neural-based A*-like algorithm, ANDOR tree
374
+ - ExpEnv: USPTO datasets
375
+ - [Code](https://github.com/binghong-ml/retro_star)
376
+ #### ICLR
377
+ - [Become a Proficient Player with Limited Data through Watching Pure Videos](https://openreview.net/pdf?id=Sy-o2N0hF4f) 2023
378
+ - Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao
379
+ - Key: pre-training from action-free videos, forward-inverse cycle consistency (FICC) objective based on vector quantization, pre-training phase, fine-tuning phase.
380
+ - ExpEnv: Atari
381
+ - [Policy-Based Self-Competition for Planning Problems](https://arxiv.org/abs/2306.04403) 2023
382
+ - Jonathan Pirnay, Quirin Göttl, Jakob Burger, Dominik Gerhard Grimm
383
+ - Key: self-competition, find strong trajectories by planning against possible strategies of its past self.
384
+ - ExpEnv: Traveling Salesman Problem and the Job-Shop Scheduling Problem.
385
+ - [Explaining Temporal Graph Models through an Explorer-Navigator Framework](https://openreview.net/pdf?id=BR_ZhvcYbGJ) 2023
386
+ - Wenwen Xia, Mincai Lai, Caihua Shan, Yao Zhang, Xinnan Dai, Xiang Li, Dongsheng Li
387
+ - Key: Temporal GNN Explainer, an explorer to find the event subsets with MCTS, a navigator that learns the correlations between events and helps reduce the search space.
388
+ - ExpEnv: Wikipedia and Reddit, Synthetic datasets
389
+ - [SpeedyZero: Mastering Atari with Limited Data and Time](https://openreview.net/pdf?id=Mg5CLXZgvLJ) 2023
390
+ - Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
391
+ - Key: distributed RL system, Priority Refresh, Clipped LARS
392
+ - ExpEnv: Atari
393
+ - [Efficient Offline Policy Optimization with a Learned Model](https://openreview.net/pdf?id=Yt-yM-JbYFO) 2023
394
+ - Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng YAN, Zhongwen Xu
395
+ - Key: Regularized One-Step Model-based algorithm for Offline-RL
396
+ - ExpEnv: Atari, BSuite
397
+ - [Code](https://github.com/sail-sg/rosmo/tree/main)
398
+ - [Enabling Arbitrary Translation Objectives with Adaptive Tree Search](https://arxiv.org/pdf/2202.11444.pdf) 2022
399
+ - Wang Ling, Wojciech Stokowiec, Domenic Donato, Chris Dyer, Lei Yu, Laurent Sartran, Austin Matthews
400
+ - Key: adaptive tree search, translation models, autoregressive models,
401
+ - ExpEnv: Chinese–English and Pashto–English tasks from WMT2020, German–English from WMT2014
402
+ - [What's Wrong with Deep Learning in Tree Search for Combinatorial Optimization](https://arxiv.org/abs/2201.10494) 2022
403
+ - Maximilian Böther, Otto Kißig, Martin Taraz, Sarel Cohen, Karen Seidel, Tobias Friedrich
404
+ - Key: combinatorial optimization, open-source benchmark suite for the NP-hard maximum independent set problem, an in-depth analysis of the popular guided tree search algorithm, compare the tree search implementations to other solvers
405
+ - ExpEnv: NP-hard MAXIMUM INDEPENDENT SET.
406
+ - [Code](https://github.com/maxiboether/mis-benchmark-framework)
407
+ - [Monte-Carlo Planning and Learning with Language Action Value Estimates](https://openreview.net/pdf?id=7_G8JySGecm) 2021
408
+ - Youngsoo Jang, Seokin Seo, Jongmin Lee, Kee-Eung Kim
409
+ - Key: Monte-Carlo tree search with language-driven exploration, locally optimistic language value estimates.
410
+ - ExpEnv: Interactive Fiction (IF) games
411
+ - [Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design](https://arxiv.org/abs/2006.10504) 2021
412
+ - Xiufeng Yang, Tanuj Kr Aasawat, Kazuki Yoshizoe
413
+ - Key: massively parallel Monte-Carlo Tree Search, molecular design, Hash-driven parallel search,
414
+ - ExpEnv: octanol-water partition coefficient (logP) penalized by the synthetic accessibility (SA) and large Ring Penalty score.
415
+ - [Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search](https://arxiv.org/pdf/1810.11755.pdf) 2020
416
+ - Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu
417
+ - Key: parallel Monte-Carlo Tree Search, partition the tree into sub-trees efficiently, compare the observation ratio of each processor.
418
+ - ExpEnv: speedup and performance comparison on the JOY-CITY game, average episode return on Atari games
419
+ - [Code](https://github.com/liuanji/WU-UCT)
420
+ - [Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees](https://openreview.net/pdf?id=rJgJDAVKvB) 2020
421
+ - Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song
422
+ - Key: meta path planning algorithm, exploits a novel neural architecture which can learn promising search directions from problem structures.
423
+ - ExpEnv: a 2d workspace with a 2 DoF (degrees of freedom) point robot, a 3 DoF stick robot and a 5 DoF snake robot
424
+ #### NeurIPS
425
+ - [LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios](https://openreview.net/pdf?id=oIUXpBnyjv) 2023
426
+ - Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
427
+ - Key: the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios.
428
+ - ExpEnv: ClassicControl, Box2D, Atari, MuJoCo, GoBigger, MiniGrid, TicTacToe, ConnectFour, Gomoku, 2048, etc.
429
+ - [Large Language Models as Commonsense Knowledge for Large-Scale Task Planning](https://openreview.net/pdf?id=Wjp1AYB8lH) 2023
430
+ - Zirui Zhao, Wee Sun Lee, David Hsu
431
+ - Key: world model (LLM) and the LLM-induced policy can be combined in MCTS, to scale up task planning.
432
+ - ExpEnv: multiplication, travel planning, object rearrangement
433
+ - [Monte Carlo Tree Search with Boltzmann Exploration](https://openreview.net/pdf?id=NG4DaApavi) 2023
434
+ - Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda
435
+ - Key: Boltzmann exploration with MCTS, optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective, two improved algorithms.
436
+ - ExpEnv: the Frozen Lake environment, the Sailing Problem, Go
437
+ - [Generalized Weighted Path Consistency for Mastering Atari Games](https://openreview.net/pdf?id=vHRLS8HhK1) 2023
438
+ - Dengwei Zhao, Shikui Tu, Lei Xu
439
+ - Key: Generalized Weighted Path Consistency, A weighting mechanism.
440
+ - ExpEnv: Atari
441
+ - [Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction](https://openreview.net/pdf?id=0zeLTZAqaJ) 2023
442
+ - Yangqing Fu, Ming Sun, Buqing Nie, Yue Gao
443
+ - Key: probability tree state abstraction, transitivity and aggregation error bound
444
+ - ExpEnv: Atari, CartPole, LunarLander, Gomoku
445
+ - [Planning for Sample Efficient Imitation Learning](https://openreview.net/forum?id=BkN5UoAqF7) 2022
446
+ - Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao
447
+ - Key: Behavioral Cloning, Adversarial Imitation Learning (AIL), MCTS-based RL.
448
+ - ExpEnv: DeepMind Control Suite
449
+ - [Code](https://github.com/zhaohengyin/EfficientImitate)
450
+ - [Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex](https://openreview.net/pdf?id=dwKwB2Cd-Km) 2022
451
+ - Charles Lovering, Jessica Zosa Forde, George Konidaris, Ellie Pavlick, Michael L. Littman
452
+ - Key: AlphaZero’s internal representations, model probing and behavioral tests, how these concepts are captured in the network.
453
+ - ExpEnv: Hex
454
+ - [Are AlphaZero-like Agents Robust to Adversarial Perturbations?](https://openreview.net/pdf?id=yZ_JlZaOCzv) 2022
455
+ - Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh
456
+ - Key: adversarial states, first adversarial attack on Go AIs.
457
+ - ExpEnv: Go
458
+ - [Monte Carlo Tree Descent for Black-Box Optimization](https://openreview.net/pdf?id=FzdmrTUyZ4g) 2022
459
+ - Yaoguang Zhai, Sicun Gao
460
+ - Key: Black-Box Optimization, how to further integrate sample-based descent for faster optimization.
461
+ - ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
462
+ - [Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization](https://openreview.net/pdf?id=SUzPos_pUC) 2022
463
+ - Lei Song, Ke Xue, Xiaobin Huang, Chao Qian
464
+ - Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
465
+ - ExpEnv: NAS-bench problems and MuJoCo locomotion
466
+ - [Monte Carlo Tree Search With Iteratively Refining State Abstractions](https://proceedings.neurips.cc/paper/2021/file/9b0ead00a217ea2c12e06a72eec4923f-Paper.pdf) 2021
467
+ - Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
468
+ - Key: stochastic environments, Progressive widening, abstraction refining
469
+ - ExpEnv: Blackjack, Trap, five by five Go.
470
+ - [Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess](https://proceedings.neurips.cc/paper/2021/file/215a71a12769b056c3c32e7299f1c5ed-Paper.pdf) 2021
471
+ - Gregory Clark
472
+ - Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
473
+ - ExpEnv: reconnaissance blind chess
474
+ - [POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis](https://proceedings.neurips.cc/paper/2020/file/30de24287a6d8f07b37c716ad51623a7-Paper.pdf) 2020
475
+ - Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar
476
+ - Key: continuous state-action spaces, Hierarchical Optimistic Optimization.
477
+ - ExpEnv: CartPole, Inverted Pendulum, Swing-up, and LunarLander.
478
+ - [Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search](https://proceedings.neurips.cc/paper/2020/file/e2ce14e81dba66dbff9cbc35ecfdb704-Paper.pdf) 2020
479
+ - Linnan Wang, Rodrigo Fonseca, Yuandong Tian
480
+ - Key: learns the partition of the search space using a few samples, a nonlinear decision boundary and learns a local model to pick good candidates.
481
+ - ExpEnv: MuJoCo locomotion tasks, Small-scale Benchmarks,
482
+ - [Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions](https://arxiv.org/abs/1907.10154) 2020
483
+ - Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai
484
+ - Key: covariate shift problem, Mix&Match combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions)
485
+ - [Code](https://github.com/matthewfaw/mixnmatch)
486
+
487
+ #### Other Conference or Journal
488
+ - [On Monte Carlo Tree Search and Reinforcement Learning](https://www.jair.org/index.php/jair/article/download/11099/26289/20632) Journal of Artificial Intelligence Research 2017.
489
+ - [Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search](https://arxiv.org/pdf/1906.06832) IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.
490
+ </details>
491
+
492
+
493
+ ## Feedback and Contribution
494
+ - [File an issue](https://github.com/opendilab/LightZero/issues/new/choose) on GitHub
495
+ - Contact us by email (opendilab@pjlab.org.cn)
496
+
497
+ - We appreciate all the feedback and contributions to improve LightZero, both algorithms and system designs.
498
+
499
+ [comment]: <> (- Contributes to our future plan [Roadmap]&#40;https://github.com/opendilab/LightZero/projects&#41;)
500
+
501
+ [comment]: <> (And `CONTRIBUTING.md` offers some necessary information.)
502
+
503
+
504
+ ## Citation
505
+ ```latex
506
+ @misc{lightzero,
507
+ title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
508
+ author={Yazhe Niu and Yuan Pu and Zhenjie Yang and Xueyan Li and Tong Zhou and Jiyuan Ren and Shuai Hu and Hongsheng Li and Yu Liu},
509
+ year={2023},
510
+ eprint={2310.08348},
511
+ archivePrefix={arXiv},
512
+ primaryClass={cs.LG}
513
+ }
514
+ ```
515
+
516
+ ## Acknowledgments
517
+
518
+ This project builds in part on the following pioneering GitHub repositories.
519
+ We express our profound gratitude for these foundational resources:
520
+ - https://github.com/opendilab/DI-engine
521
+ - https://github.com/deepmind/mctx
522
+ - https://github.com/YeWR/EfficientZero
523
+ - https://github.com/werner-duvaud/muzero-general
524
+
525
+ We would like to extend our special thanks to the following contributors [@PaParaZz1](https://github.com/PaParaZz1), [@karroyan](https://github.com/karroyan), [@nighood](https://github.com/nighood),
526
+ [@jayyoung0802](https://github.com/jayyoung0802), [@timothijoe](https://github.com/timothijoe), [@TuTuHuss](https://github.com/TuTuHuss), [@HarryXuancy](https://github.com/HarryXuancy), [@puyuan1996](https://github.com/puyuan1996), [@HansBug](https://github.com/HansBug) for their valuable contributions and support to this algorithm library.
527
+
528
+ Thanks to all who contributed to this project:
529
+ <a href="https://github.com/opendilab/LightZero/graphs/contributors">
530
+ <img src="https://contrib.rocks/image?repo=opendilab/LightZero" />
531
+ </a>
532
+
533
+
534
+ ## License
535
+ All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
536
+
537
+ <p align="right">(<a href="#top">Back to top</a>)</p>
documents/LightZero_README.zh.md ADDED
@@ -0,0 +1,533 @@
1
+ <div id="top"></div>
2
+
3
+ # LightZero
4
+
5
+ <div align="center">
6
+ <img width="1000px" height="auto" src="https://github.com/opendilab/LightZero/blob/main/LightZero.png">
7
+ </div>
8
+
9
+ ---
10
+
11
+ [![Twitter](https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Ftwitter.com%2Fopendilab)](https://twitter.com/opendilab)
12
+ [![PyPI](https://img.shields.io/pypi/v/LightZero)](https://pypi.org/project/LightZero/)
13
+ ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/LightZero)
14
+ ![Loc](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HansBug/e002642132ec758e99264118c66778a4/raw/loc.json)
15
+ ![Comments](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HansBug/e002642132ec758e99264118c66778a4/raw/comments.json)
16
+
17
+ [![Code Test](https://github.com/opendilab/LightZero/workflows/Code%20Test/badge.svg)](https://github.com/opendilab/LightZero/actions?query=workflow%3A%22Code+Test%22)
18
+ [![Badge Creation](https://github.com/opendilab/LightZero/workflows/Badge%20Creation/badge.svg)](https://github.com/opendilab/LightZero/actions?query=workflow%3A%22Badge+Creation%22)
19
+ [![Package Release](https://github.com/opendilab/LightZero/workflows/Package%20Release/badge.svg)](https://github.com/opendilab/LightZero/actions?query=workflow%3A%22Package+Release%22)
20
+
21
+ ![GitHub Org's stars](https://img.shields.io/github/stars/opendilab)
22
+ [![GitHub stars](https://img.shields.io/github/stars/opendilab/LightZero)](https://github.com/opendilab/LightZero/stargazers)
23
+ [![GitHub forks](https://img.shields.io/github/forks/opendilab/LightZero)](https://github.com/opendilab/LightZero/network)
24
+ ![GitHub commit activity](https://img.shields.io/github/commit-activity/m/opendilab/LightZero)
25
+ [![GitHub issues](https://img.shields.io/github/issues/opendilab/LightZero)](https://github.com/opendilab/LightZero/issues)
26
+ [![GitHub pulls](https://img.shields.io/github/issues-pr/opendilab/LightZero)](https://github.com/opendilab/LightZero/pulls)
27
+ [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
28
+ [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
29
+
30
+ Updated on 2023.12.07 LightZero-v0.0.3
31
+
32
+ > LightZero is a lightweight, efficient, and easy-to-understand open-source MCTS+RL algorithm library.
33
+
34
+ [English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese (简体中文) | [Paper](https://arxiv.org/pdf/2310.08348.pdf)
35
+
36
+ ## Background
37
+
38
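+ Methods that combine Monte Carlo Tree Search (MCTS) with Deep Reinforcement Learning (DRL), represented by AlphaZero and MuZero, have reached superhuman performance in games such as Go and Atari, and have made encouraging progress in scientific domains such as protein structure prediction and the search for matrix multiplication algorithms. The figure below shows the historical evolution of the MCTS algorithm family: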
39
+ ![pipeline](assets/mcts_rl_evolution_overview.png)
40
+
41
+ ## Overview
42
+
43
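+ **LightZero** is an open-source algorithm toolkit that combines Monte Carlo Tree Search and reinforcement learning. It supports a range of MCTS-based RL algorithms and offers the following advantages: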
44
+ - Lightweight.
45
+ - Efficient.
46
+ - Easy to understand.
47
+
48
+ For details, see [Features](#features), [Framework Structure](#framework-structure), and [Integrated Algorithms](#integrated-algorithms).
49
+
50
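+ **LightZero** aims to **standardize the MCTS algorithm family in order to accelerate related research and applications.** [Benchmark](#benchmark) presents a performance comparison of all algorithms implemented so far.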
51
+
52
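+ ### Outline
53
+ - [Overview](#overview)
54
+ - [Outline](#outline)
55
+ - [Features](#features)
56
+ - [Framework Structure](#framework-structure)
57
+ - [Integrated Algorithms](#integrated-algorithms)
58
+ - [Installation](#installation)
59
+ - [Quick Start](#quick-start)
60
+ - [Benchmark](#benchmark)
61
+ - [MCTS Notes](#mcts-notes)
62
+ - [Paper Notes](#paper-notes)
63
+ - [Algorithm Overview Diagrams](#algorithm-overview-diagrams)
64
+ - [MCTS Papers](#mcts-papers)
65
+ - [Key Papers](#key-papers)
66
+ - [Other Papers](#other-papers)
67
+ - [Feedback and Contribution](#feedback-and-contribution)
68
+ - [Citation](#citation)
69
+ - [Acknowledgments](#acknowledgments)
70
+ - [License](#license)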
71
+
72
+ ### Features
73
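+ **Lightweight**: LightZero integrates multiple MCTS-family algorithms and can solve decision-making problems with diverse properties in a lightweight manner within a single framework.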
74
+
75
+ **Efficient**: LightZero uses mixed heterogeneous computing programming to speed up the most time-consuming parts of MCTS-family algorithms.
76
+
77
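+ **Easy to understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms, helping users understand the core of each algorithm and compare algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for the code implementations, making it easier to locate key code.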
78
+
79
+ ### Framework Structure
80
+
81
+ <p align="center">
82
+ <img src="assets/lightzero_pipeline.svg" alt="Image Description 2" width="50%" height="auto" style="margin: 0 1%;">
83
+ </p>
84
+
85
+ The figure above shows LightZero's framework pipeline. Its three core modules are briefly introduced below:
86
+
87
+ **Model**:
88
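+ ``Model`` defines the network structure. It includes an ``__init__`` function that initializes the network structure and a ``forward`` function that computes the network's forward pass.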
89
+
90
+ **Policy**:
91
+ ``Policy`` defines how the network is updated and how it interacts with the environment. It covers three processes: training (learn), sampling (collect), and evaluation (evaluate).
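
To make the division of labour between ``Model`` and ``Policy`` concrete, here is a minimal pure-Python sketch. It is not LightZero's API: `TinyModel`, `TinyPolicy`, `train_on_bandit`, and the toy bandit reward are all hypothetical names and assumptions introduced for illustration. The model only exposes `__init__` and `forward`, while the policy exposes the three processes `learn`, `collect`, and `evaluate`.

```python
import random


class TinyModel:
    """Toy stand-in for a network (hypothetical, not a LightZero class)."""

    def __init__(self, num_actions, seed=0):
        rng = random.Random(seed)
        # "Network structure": one small random score per action.
        self.scores = [rng.uniform(-0.1, 0.1) for _ in range(num_actions)]

    def forward(self, observation):
        # Forward pass: this toy model ignores the observation.
        return list(self.scores)


class TinyPolicy:
    """Hypothetical policy exposing learn, collect, and evaluate."""

    def __init__(self, model, lr=0.1, epsilon=0.3, seed=0):
        self.model = model
        self.lr = lr
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def _greedy(self, observation):
        scores = self.model.forward(observation)
        return max(range(len(scores)), key=scores.__getitem__)

    def collect(self, observation):
        # Sampling: explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.model.scores))
        return self._greedy(observation)

    def evaluate(self, observation):
        # Evaluation: always act greedily, with no exploration.
        return self._greedy(observation)

    def learn(self, action, reward):
        # Training: move the chosen action's score toward the observed reward.
        self.model.scores[action] += self.lr * (reward - self.model.scores[action])


def train_on_bandit(best_action=2, num_actions=3, steps=500):
    # Toy environment: reward 1.0 for best_action, 0.0 for any other action.
    policy = TinyPolicy(TinyModel(num_actions))
    for _ in range(steps):
        action = policy.collect(observation=None)
        policy.learn(action, 1.0 if action == best_action else 0.0)
    return policy
```

Calling `train_on_bandit().evaluate(None)` should pick the rewarded action once training has run; a real policy would replace the score table with a neural network and the bandit with an environment.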
92
+
93
+ **MCTS**:
94
+
95
+ ``MCTS`` defines the structure of the Monte Carlo search tree and how it interacts with ``Policy``. MCTS is implemented in both Python and C++, in ``ptree`` and ``ctree`` respectively.
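
As a rough illustration of what a Monte Carlo search tree does, the sketch below implements plain UCT (UCB1 selection with random rollouts) on a toy problem. It is an assumption-laden teaching example, not the code in ``ptree`` or ``ctree``: the `Node`, `ucb_score`, `rollout`, and `search` names are hypothetical, and the toy task (pick three binary actions, with reward equal to the fraction of 1s) is invented here.

```python
import math
import random


class Node:
    """One node of the search tree (hypothetical structure)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}      # action -> Node
        self.visit_count = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def ucb_score(parent, child, c=1.4):
    # UCB1: mean value plus an exploration bonus; unvisited children go first.
    if child.visit_count == 0:
        return float("inf")
    explore = c * math.sqrt(math.log(parent.visit_count) / child.visit_count)
    return child.value() + explore


def rollout(path, depth, rng):
    # Finish the episode with random actions; reward is the fraction of 1s.
    tail = [rng.choice((0, 1)) for _ in range(depth - len(path))]
    return sum(path + tail) / depth


def search(num_simulations=1000, depth=3, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(num_simulations):
        node, path = root, []
        # 1. Selection: descend through expanded nodes by UCB score.
        while node.children and len(path) < depth:
            action = max(node.children, key=lambda a: ucb_score(node, node.children[a]))
            node = node.children[action]
            path.append(action)
        # 2. Expansion: add children unless the episode has already ended.
        if len(path) < depth:
            node.children = {a: Node(parent=node) for a in (0, 1)}
        # 3. Simulation: estimate the leaf's value with a random rollout.
        reward = rollout(path, depth, rng)
        # 4. Backup: propagate the reward from the leaf back to the root.
        while node is not None:
            node.visit_count += 1
            node.value_sum += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visit_count)
```

On this toy task the search should recommend action `1` at the root. The MuZero-style algorithms listed in this README replace the random rollout with learned value and policy predictions, which this sketch deliberately leaves out.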
96
+
97
+ For LightZero's file structure, see [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
98
+
99
+ ### Integrated Algorithms
100
+ LightZero is an MCTS algorithm library implemented with [PyTorch](https://pytorch.org/); the MCTS implementation also uses Cython and C++. The framework itself is built mainly on [DI-engine](https://github.com/opendilab/DI-engine). The algorithms currently integrated in LightZero are:
101
+ - [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
102
+ - [MuZero](https://arxiv.org/abs/1911.08265)
103
+ - [Sampled MuZero](https://arxiv.org/abs/2104.06303)
104
+ - [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
105
+ - [EfficientZero](https://arxiv.org/abs/2111.00210)
106
+ - [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
107
+
108
+
109
+ The environments and algorithms currently supported by LightZero are shown in the table below:
110
+
111
+ | Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero |
112
+ |---------------| --------- | ------ |-------------| ------------------ | ---------- |----------------|
113
+ | TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
114
+ | Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
115
+ | Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 |
116
+ | 2048 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ |
117
+ | Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
118
+ | Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
119
+ | CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
120
+ | Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
121
+ | LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
122
+ | BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 |
123
+ | Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
124
+ | MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
125
+ | MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
126
+ | Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
127
+
128
+ <sup>(1): "✔" means that the corresponding item is finished and well tested.</sup>
129
+
130
+ <sup>(2): "🔒" means that the corresponding item is on the waiting list (work in progress).</sup>
131
+
132
+ <sup>(3): "---" means that this algorithm does not support this environment.</sup>
133
+
134
+ ## Installation
135
+
136
+ You can install the latest LightZero from the GitHub source with the following commands:
137
+
138
+ ```bash
139
+ git clone https://github.com/opendilab/LightZero.git
140
+ cd LightZero
141
+ pip3 install -e .
142
+ ```
143
+
144
+ Note that LightZero currently supports compilation only on `Linux` and `macOS`.
145
+ We are actively working to extend this support to `Windows`.
146
+
147
+ ### Installation with Docker
148
+
149
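+ We also provide a Dockerfile that sets up an environment with all the dependencies needed to run LightZero. This Docker image is based on Ubuntu 20.04 and installs Python 3.8 along with other necessary tools and libraries.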
150
+ The steps below show how to use our Dockerfile to build a Docker image, run a container from that image, and execute LightZero code inside the container.
151
+
152
+ 1. **Download the Dockerfile**: The Dockerfile is located in the root directory of the LightZero repository. Download this [file](https://github.com/opendilab/LightZero/blob/main/Dockerfile) to your local machine.
153
+
154
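+ 2. **Prepare the build context**: Create a new empty directory on your local machine, move the Dockerfile into it, and change into that directory. This avoids sending unnecessary files to the Docker daemon during the build.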
155
+ ```bash
156
+ mkdir lightzero-docker
157
+ mv Dockerfile lightzero-docker/
158
+ cd lightzero-docker/
159
+ ```
160
+ 3. **Build the Docker image**: Build the Docker image with the following command. Run it from inside the directory that contains the Dockerfile.
161
+ ```bash
162
+ docker build -t ubuntu-py38-lz:latest -f ./Dockerfile .
163
+ ```
164
+ 4. **Run a container from the image**: Use the following command to start a container from the image in interactive mode with a Bash shell.
165
+ ```bash
166
+ docker run -it --rm ubuntu-py38-lz:latest /bin/bash
167
+ ```
168
+ 5. **Execute LightZero code inside the container**: Once you are inside the container, you can run the example Python script with the following command:
169
+ ```bash
170
+ python ./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py
171
+ ```
172
+
173
+ ## Quick Start
174
+ Train a MuZero agent on the [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/) environment with the following commands:
175
+
176
+ ```bash
177
+ cd LightZero
178
+ python3 -u zoo/classic_control/cartpole/config/cartpole_muzero_config.py
179
+ ```
180
+
181
+ Train a MuZero agent on the [Pong](https://gymnasium.farama.org/environments/atari/pong/) environment with the following commands:
182
+
183
+ ```bash
184
+ cd LightZero
185
+ python3 -u zoo/atari/config/atari_muzero_config.py
186
+ ```
187
+
188
+ Train a MuZero agent on the [TicTacToe](https://en.wikipedia.org/wiki/Tic-tac-toe) environment with the following commands:
189
+
190
+ ```bash
191
+ cd LightZero
192
+ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
193
+ ```
194
+
195
+ ## Benchmark
196
+
197
+ <details open><summary>Click to collapse</summary>
198
+
199
+ - Baseline results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games ([TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), and [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py)):
200
+ <p align="center">
201
+ <img src="assets/benchmark/main/tictactoe_bot-mode_main.png" alt="tictactoe_bot-mode_main" width="30%" height="auto" style="margin: 0 1%;">
202
+ <img src="assets/benchmark/main/connect4_bot-mode_main.png" alt="connect4_bot-mode_main" width="30%" height="auto" style="margin: 0 1%;">
203
+ <img src="assets/benchmark/main/gomoku_bot-mode_main.png" alt="gomoku_bot-mode_main" width="30%" height="auto" style="margin: 0 1%;">
204
+ </p>
205
+
206
+ - Baseline results of [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py), [MuZero w/ SSL](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py), [EfficientZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/efficientzero.py), and [Sampled EfficientZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/sampled_efficientzero.py) on three representative [Atari](https://github.com/opendilab/LightZero/blob/main/zoo/atari/envs/atari_lightzero_env.py) environments with discrete action spaces:
207
+ <p align="center">
208
+ <img src="assets/benchmark/main/pong_main.png" alt="pong_main" width="23%" height="auto" style="margin: 0 1%;">
209
+ <img src="assets/benchmark/main/qbert_main.png" alt="qbert_main" width="23%" height="auto" style="margin: 0 1%;">
210
+ <img src="assets/benchmark/main/mspacman_main.png" alt="mspacman_main" width="23%" height="auto" style="margin: 0 1%;">
211
+ <img src="assets/benchmark/ablation/mspacman_sez_K.png" alt="mspacman_sez_K" width="23%" height="auto" style="margin: 0 1%;">
212
+ </p>
213
+
214
+ - Baseline results of [Sampled EfficientZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/sampled_efficientzero.py) (with both ``Factored`` and ``Gaussian`` policy representations) on five environments with continuous action spaces ([Pendulum-v1](https://github.com/opendilab/LightZero/blob/main/zoo/classic_control/pendulum/envs/pendulum_lightzero_env.py), [LunarLanderContinuous-v2](https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py), [BipedalWalker-v3](https://github.com/opendilab/LightZero/blob/main/zoo/box2d/bipedalwalker/envs/bipedalwalker_env.py), [Hopper-v3](https://github.com/opendilab/LightZero/blob/main/zoo/mujoco/envs/mujoco_lightzero_env.py), and [Walker2d-v3](https://github.com/opendilab/LightZero/blob/main/zoo/mujoco/envs/mujoco_lightzero_env.py)):
215
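+ > Here ``Factored Policy`` means that the agent learns a policy network that outputs a discrete distribution; after manual discretization, the action-space sizes of the five environments above are 11, 49 (7^2), 256 (4^4), 64 (4^3), and 4096 (4^6), respectively. ``Gaussian Policy`` means that the agent learns a policy network that directly outputs the parameters μ and σ of a Gaussian distribution.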
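
The discretized sizes quoted in the note follow from raising the number of bins per action dimension to the power of the number of action dimensions. The sketch below reproduces that arithmetic with a uniform grid. The helper `factored_action_set`, the `[-1, 1]` bounds, and the per-environment (bins, dimensions) pairs are assumptions inferred from the quoted sizes and the environments' action dimensions; this is not LightZero's discretization code.

```python
import itertools


def factored_action_set(bins_per_dim, num_dims, low=-1.0, high=1.0):
    """Enumerate a uniform grid over a box-shaped continuous action space."""
    step = (high - low) / (bins_per_dim - 1)
    grid = [low + i * step for i in range(bins_per_dim)]
    # Every combination of per-dimension bins is one discrete action.
    return list(itertools.product(grid, repeat=num_dims))


# (bins per dimension, action dimensions), inferred from the sizes quoted above.
settings = {
    "Pendulum-v1": (11, 1),
    "LunarLanderContinuous-v2": (7, 2),
    "BipedalWalker-v3": (4, 4),
    "Hopper-v3": (4, 3),
    "Walker2d-v3": (4, 6),
}
sizes = {name: len(factored_action_set(k, d)) for name, (k, d) in settings.items()}
print(sizes)
```

The exponential growth with the number of dimensions (4096 actions for a 6-dimensional space at only 4 bins each) is one motivation for sampling-based methods and for the Gaussian policy alternative.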
216
+
217
+ <p align="center">
218
+ <img src="assets/benchmark/main/pendulum_main.png" alt="pendulum_main" width="30%" height="auto" style="margin: 0 1%;">
219
+ <img src="assets/benchmark/ablation/pendulum_sez_K.png" alt="pendulum_sez_K" width="30%" height="auto" style="margin: 0 1%;">
220
+ <img src="assets/benchmark/main/lunarlander_main.png" alt="lunarlander_main" width="30%" height="auto" style="margin: 0 1%;">
221
+ </p>
222
+ <p align="center">
223
+ <img src="assets/benchmark/main/bipedalwalker_main.png" alt="bipedalwalker_main" width="30%" height="auto" style="margin: 0 1%;">
224
+ <img src="assets/benchmark/main/hopper_main.png" alt="hopper_main" width="31.5%" height="auto" style="margin: 0 1%;">
225
+ <img src="assets/benchmark/main/walker2d_main.png" alt="walker2d_main" width="31.5%" height="auto" style="margin: 0 1%;">
226
+ </p>
227
+
228
+ - Baseline results of [Gumbel MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/gumbel_muzero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) under different numbers of simulations on four environments ([PongNoFrameskip-v4](https://github.com/opendilab/LightZero/blob/main/zoo/atari/envs/atari_lightzero_env.py), [MsPacmanNoFrameskip-v4](https://github.com/opendilab/LightZero/blob/main/zoo/atari/envs/atari_lightzero_env.py), [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py), and [LunarLanderContinuous-v2](https://github.com/opendilab/LightZero/blob/main/zoo/box2d/lunarlander/envs/lunarlander_env.py)):
229
+ <p align="center">
230
+ <img src="assets/benchmark/ablation/pong_gmz_ns.png" alt="pong_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
231
+ <img src="assets/benchmark/ablation/mspacman_gmz_ns.png" alt="mspacman_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
232
+ <img src="assets/benchmark/ablation/gomoku_bot-mode_gmz_ns.png" alt="gomoku_bot-mode_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
233
+ <img src="assets/benchmark/ablation/lunarlander_gmz_ns.png" alt="lunarlander_gmz_ns" width="23%" height="auto" style="margin: 0 1%;">
234
+ </p>
235
+
236
+ - Baseline results of [Stochastic MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/stochastic_muzero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on the [2048 environment](https://github.com/opendilab/LightZero/blob/main/zoo/game_2048/envs/game_2048_env.py) with different levels of stochasticity (num_chances=2/5):
237
+ <p align="center">
238
+ <img src="assets/benchmark/main/2048/2048_stochasticmz_mz.png" alt="2048_stochasticmz_mz" width="30%" height="auto" style="margin: 0 1%;">
239
+ <img src="assets/benchmark/main/2048/2048_stochasticmz_mz_nc5.png" alt="2048_stochasticmz_mz_nc5" width="30%" height="auto" style="margin: 0 1%;">
240
+ </p>
241
+
242
+ - Baseline results of [MuZero w/ SSL](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) combined with different exploration mechanisms on the [MiniGrid environments](https://github.com/opendilab/LightZero/blob/main/zoo/minigrid/envs/minigrid_lightzero_env.py):
243
+ <p align="center">
244
+ <img src="assets/benchmark/main/minigrid/keycorridors3r3_exploration.png" alt="keycorridors3r3_exploration" width="30%" height="auto" style="margin: 0 1%;">
245
+ <img src="assets/benchmark/main/minigrid/fourrooms_exploration.png" alt="fourrooms_exploration" width="30%" height="auto" style="margin: 0 1%;">
246
+ </p>
247
+
248
+ </details>
249
+
250
+ ## MCTS Notes
251
+
252
+ ### Paper Notes
253
+
254
+ The following are detailed documents (in Chinese) for the algorithms integrated in LightZero:
255
+
256
+ <details open><summary>Click to collapse</summary>
257
+
258
+ [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/AlphaZero.pdf)
259
+
260
+ [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/MuZero.pdf)
261
+
262
+ [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/EfficientZero.pdf)
263
+
264
+ [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/SampledMuZero.pdf)
265
+
266
+ [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/GumbelMuZero.pdf)
267
+
268
+ [StochasticMuZero](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/StochasticMuZero.pdf)
269
+
270
+ [Notation table for the algorithm overview diagrams](https://github.com/opendilab/LightZero/blob/main/assets/paper_notes/NotationTable.pdf)
271
+
272
+ </details>
273
+
274
+ ### Algorithm Overview Diagrams
275
+
276
+ The following are overview diagrams of the algorithms integrated in LightZero:
277
+
278
+ <details closed>
279
+ <summary>(Click to view more)</summary>
280
+
281
+ [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
282
+
283
+ [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
284
+
285
+ [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.pdf)
286
+
287
+ [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.pdf)
288
+
289
+ [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.pdf)
290
+
291
+ [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.pdf)
292
+
293
+ </details>
294
+
295
+ ## MCTS Papers
296
+
297
+ The following is a collection of papers related to **MCTS**. [This section](#mcts-papers) will be updated continuously to track the frontier of MCTS.
298
+
299
+ ### Key Papers
300
+
301
+ <details closed>
302
+ <summary>(Click to view more)</summary>
303
+
304
+ #### LightZero Implemented series
305
+
306
+ - [2018 _Science_ AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://www.science.org/doi/10.1126/science.aar6404)
307
+ - [2019 MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://arxiv.org/abs/1911.08265)
308
+ - [2021 EfficientZero: Mastering Atari Games with Limited Data](https://arxiv.org/abs/2111.00210)
309
+ - [2021 Sampled MuZero: Learning and Planning in Complex Action Spaces](https://arxiv.org/abs/2104.06303)
310
+ - [2022 Stochastic MuZero: Planning in Stochastic Environments with A Learned Model](https://openreview.net/pdf?id=X6D9bAHhBQ1)
311
+ - [2022 Gumbel MuZero: Policy Improvement by Planning with Gumbel](https://openreview.net/pdf?id=bERaNdoegnO&)
312
+
313
+
314
+ #### AlphaGo series
315
+
316
+ - [2015 _Nature_ AlphaGo Mastering the game of Go with deep neural networks and tree search](https://www.nature.com/articles/nature16961)
317
+ - [2017 _Nature_ AlphaGo Zero Mastering the game of Go without human knowledge](https://www.nature.com/articles/nature24270)
318
+ - [2019 ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero](https://arxiv.org/abs/1902.04522)
319
+ - [Code](https://github.com/pytorch/ELF)
320
+ - [2023 Student of Games: A unified learning algorithm for both perfect and imperfect information games](https://www.science.org/doi/10.1126/sciadv.adg3256)
321
+
322
+ #### MuZero series
323
+ - [2022 Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
324
+ - [2021 Vector Quantized Models for Planning](https://arxiv.org/abs/2106.04615)
325
+ - [2021 Muesli: Combining Improvements in Policy Optimization. ](https://arxiv.org/abs/2104.06159)
326
+
327
+ #### MCTS Analysis
328
+ - [2020 Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)
329
+ - [2021 Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
330
+ - [2022 Adversarial Policies Beat Professional-Level Go AIs](https://arxiv.org/abs/2211.00241)
331
+ - [2022 _PNAS_ Acquisition of Chess Knowledge in AlphaZero.](https://arxiv.org/abs/2111.09259)
332
+
333
+ #### MCTS Application
334
+ - [2023 Symbolic Physics Learner: Discovering governing equations via Monte Carlo tree search](https://openreview.net/pdf?id=ZTK3SefE8_Z)
335
+ - [2022 _Nature_ Discovering faster matrix multiplication algorithms with reinforcement learning](https://www.nature.com/articles/s41586-022-05172-4)
336
+ - [Code](https://github.com/deepmind/alphatensor)
337
+ - [2022 MuZero with Self-competition for Rate Control in VP9 Video Compression](https://arxiv.org/abs/2202.06626)
338
+ - [2021 DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning](https://arxiv.org/abs/2106.06135)
339
+ - [2019 Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving](https://arxiv.org/pdf/1905.02680.pdf)
340
+
341
+ </details>
342
+
343
+ ### Other Papers
344
+
345
+ <details closed>
346
+ <summary>(Click to view more)</summary>
347
+
348
+ #### ICML
349
+ - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
350
+ - Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, Matthijs T. J. Spaan
351
+ - Key: safe policy improvement online using an MCTS-based strategy, Safe Policy Improvement with Baseline Bootstrapping
352
+ - ExpEnv: Gridworld and SysAdmin
353
+ - [Efficient Learning for AlphaZero via Path Consistency](https://proceedings.mlr.press/v162/zhao22h/zhao22h.pdf) 2022
354
+ - Dengwei Zhao, Shikui Tu, Lei Xu
355
+ - Key: limited amount of self-plays, path consistency (PC) optimality
356
+ - ExpEnv: Go, Othello, Gomoku
357
+ - [Visualizing MuZero Models](https://arxiv.org/abs/2102.12924) 2021
358
+ - Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat
359
+ - Key: visualizing the value equivalent dynamics model, action trajectories diverge, two regularization techniques
360
+ - ExpEnv: CartPole and MountainCar.
361
+ and internal state transition dynamics,
362
+ - [Convex Regularization in Monte-Carlo Tree Search](https://arxiv.org/pdf/2007.00391.pdf) 2021
363
+ - Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen
364
+ - Key: entropy-regularization backup operators, regret analysis, Tsallis entropy
365
+ - ExpEnv: synthetic tree, Atari
366
+ - [Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains](http://proceedings.mlr.press/v119/fischer20a/fischer20a.pdf) 2020
367
+ - Johannes Fischer, Ömer Sahin Tas
368
+ - Key: Continuous POMDP, Particle Filter Tree, information-based reward shaping, Information Gathering.
369
+ - ExpEnv: POMDPs.jl framework
370
+ - [Code](https://github.com/johannes-fischer/icml2020_ipft)
371
+ - [Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search](http://proceedings.mlr.press/v119/chen20k/chen20k.pdf) 2020
372
+ - Binghong Chen, Chengtao Li, Hanjun Dai, Le Song
373
+ - Key: chemical retrosynthetic planning, neural-based A*-like algorithm, ANDOR tree
374
+ - ExpEnv: USPTO datasets
375
+ - [Code](https://github.com/binghong-ml/retro_star)
376
+ #### ICLR
377
+ - [Become a Proficient Player with Limited Data through Watching Pure Videos](https://openreview.net/pdf?id=Sy-o2N0hF4f) 2023
378
+ - Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao
379
+ - Key: pre-training from action-free videos, forward-inverse cycle consistency (FICC) objective based on vector quantization, pre-training phase, fine-tuning phase.
380
+ - ExpEnv: Atari
381
+ - [Policy-Based Self-Competition for Planning Problems](https://arxiv.org/abs/2306.04403) 2023
382
+ - Jonathan Pirnay, Quirin Göttl, Jakob Burger, Dominik Gerhard Grimm
383
+ - Key: self-competition, find strong trajectories by planning against possible strategies of its past self.
384
+ - ExpEnv: Traveling Salesman Problem and the Job-Shop Scheduling Problem.
385
+ - [Explaining Temporal Graph Models through an Explorer-Navigator Framework](https://openreview.net/pdf?id=BR_ZhvcYbGJ) 2023
386
+ - Wenwen Xia, Mincai Lai, Caihua Shan, Yao Zhang, Xinnan Dai, Xiang Li, Dongsheng Li
387
+ - Key: Temporal GNN Explainer, an explorer to find the event subsets with MCTS, a navigator that learns the correlations between events and helps reduce the search space.
388
+ - ExpEnv: Wikipedia and Reddit, Synthetic datasets
389
+ - [SpeedyZero: Mastering Atari with Limited Data and Time](https://openreview.net/pdf?id=Mg5CLXZgvLJ) 2023
390
+ - Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
391
+ - Key: distributed RL system, Priority Refresh, Clipped LARS
392
+ - ExpEnv: Atari
393
+ - [Efficient Offline Policy Optimization with a Learned Model](https://openreview.net/pdf?id=Yt-yM-JbYFO) 2023
394
+ - Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng YAN, Zhongwen Xu
395
+ - Key: Regularized One-Step Model-based algorithm for Offline-RL
396
+ - ExpEnv: Atari, BSuite
397
+ - [Code](https://github.com/sail-sg/rosmo/tree/main)
398
+ - [Enabling Arbitrary Translation Objectives with Adaptive Tree Search](https://arxiv.org/pdf/2202.11444.pdf) 2022
399
+ - Wang Ling, Wojciech Stokowiec, Domenic Donato, Chris Dyer, Lei Yu, Laurent Sartran, Austin Matthews
400
+ - Key: adaptive tree search, translation models, autoregressive models,
401
+ - ExpEnv: Chinese–English and Pashto–English tasks from WMT2020, German–English from WMT2014
402
+ - [What's Wrong with Deep Learning in Tree Search for Combinatorial Optimization](https://arxiv.org/abs/2201.10494) 2022
403
+ - Maximilian Böther, Otto Kißig, Martin Taraz, Sarel Cohen, Karen Seidel, Tobias Friedrich
404
+ - Key: Combinatorial optimization, open-source benchmark suite for the NP-hard MAXIMUM INDEPENDENT SET problem, an in-depth analysis of the popular guided tree search algorithm, compare the tree search implementations to other solvers
405
+ - ExpEnv: NP-hard MAXIMUM INDEPENDENT SET.
406
+ - [Code](https://github.com/maxiboether/mis-benchmark-framework)
407
+ - [Monte-Carlo Planning and Learning with Language Action Value Estimates](https://openreview.net/pdf?id=7_G8JySGecm) 2021
408
+ - Youngsoo Jang, Seokin Seo, Jongmin Lee, Kee-Eung Kim
409
+ - Key: Monte-Carlo tree search with language-driven exploration, locally optimistic language value estimates,
410
+ - ExpEnv: Interactive Fiction (IF) games
411
+ - [Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design](https://arxiv.org/abs/2006.10504) 2021
412
+ - Xiufeng Yang, Tanuj Kr Aasawat, Kazuki Yoshizoe
413
+ - Key: massively parallel Monte-Carlo Tree Search, molecular design, Hash-driven parallel search,
414
+ - ExpEnv: octanol-water partition coefficient (logP) penalized by the synthetic accessibility (SA) and large Ring Penalty score.
415
+ - [Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search](https://arxiv.org/pdf/1810.11755.pdf) 2020
416
+ - Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu
417
+ - Key: parallel Monte-Carlo Tree Search, partition the tree into sub-trees efficiently, compare the observation ratio of each processor
418
+ - ExpEnv: speedup and performance comparison on the JOY-CITY game, average episode return on Atari games
419
+ - [Code](https://github.com/liuanji/WU-UCT)
420
+ - [Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees](https://openreview.net/pdf?id=rJgJDAVKvB) 2020
421
+ - Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song
422
+ - Key: meta path planning algorithm, exploits a novel neural architecture which can learn promising search directions from problem structures.
423
+ - ExpEnv: a 2d workspace with a 2 DoF (degrees of freedom) point robot, a 3 DoF stick robot and a 5 DoF snake robot
424
+ #### NeurIPS
425
+
426
+ - [LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios](https://openreview.net/pdf?id=oIUXpBnyjv) 2023
427
+ - Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
428
+ - Key: the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios.
429
+ - ExpEnv: ClassicControl, Box2D, Atari, MuJoCo, GoBigger, MiniGrid, TicTacToe, ConnectFour, Gomoku, 2048, etc.
430
+ - [Large Language Models as Commonsense Knowledge for Large-Scale Task Planning](https://openreview.net/pdf?id=Wjp1AYB8lH) 2023
431
+ - Zirui Zhao, Wee Sun Lee, David Hsu
432
+ - Key: world model (LLM) and the LLM-induced policy can be combined in MCTS, to scale up task planning.
433
+ - ExpEnv: multiplication, travel planning, object rearrangement
434
+ - [Monte Carlo Tree Search with Boltzmann Exploration](https://openreview.net/pdf?id=NG4DaApavi) 2023
435
+ - Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda
436
+ - Key: Boltzmann exploration with MCTS, optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective, two improved algorithms.
437
+ - ExpEnv: the Frozen Lake environment, the Sailing Problem, Go
438
+ - [Generalized Weighted Path Consistency for Mastering Atari Games](https://openreview.net/pdf?id=vHRLS8HhK1) 2023
439
+ - Dengwei Zhao, Shikui Tu, Lei Xu
440
+ - Key: Generalized Weighted Path Consistency, A weighting mechanism.
441
+ - ExpEnv: Atari
442
+ - [Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction](https://openreview.net/pdf?id=0zeLTZAqaJ) 2023
443
+ - Yangqing Fu, Ming Sun, Buqing Nie, Yue Gao
444
+ - Key: probability tree state abstraction, transitivity and aggregation error bound
445
+ - ExpEnv: Atari, CartPole, LunarLander, Gomoku
446
+ - [Planning for Sample Efficient Imitation Learning](https://openreview.net/forum?id=BkN5UoAqF7) 2022
447
+ - Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao
448
+ - Key: Behavioral Cloning, Adversarial Imitation Learning (AIL), MCTS-based RL
449
+ - ExpEnv: DeepMind Control Suite
450
+ - [Code](https://github.com/zhaohengyin/EfficientImitate)
451
+ - [Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex](https://openreview.net/pdf?id=dwKwB2Cd-Km) 2022
452
+ - Charles Lovering, Jessica Zosa Forde, George Konidaris, Ellie Pavlick, Michael L. Littman
453
+ - Key: AlphaZero’s internal representations, model probing and behavioral tests, how these concepts are captured in the network.
454
+ - ExpEnv: Hex
455
+ - [Are AlphaZero-like Agents Robust to Adversarial Perturbations?](https://openreview.net/pdf?id=yZ_JlZaOCzv) 2022
456
+ - Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh
457
+ - Key: adversarial states, first adversarial attack on Go AIs
458
+ - ExpEnv: Go
459
+ - [Monte Carlo Tree Descent for Black-Box Optimization](https://openreview.net/pdf?id=FzdmrTUyZ4g) 2022
460
+ - Yaoguang Zhai, Sicun Gao
461
+ - Key: Black-Box Optimization, how to further integrate sample-based descent for faster optimization.
462
+ - ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
463
+ - [Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization](https://openreview.net/pdf?id=SUzPos_pUC) 2022
464
+ - Lei Song, Ke Xue, Xiaobin Huang, Chao Qian
465
+ - Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
466
+ - ExpEnv: NAS-bench problems and MuJoCo locomotion
467
+ - [Monte Carlo Tree Search With Iteratively Refining State Abstractions](https://proceedings.neurips.cc/paper/2021/file/9b0ead00a217ea2c12e06a72eec4923f-Paper.pdf) 2021
468
+ - Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
469
+ - Key: stochastic environments, progressive widening, abstraction refining.
470
+ - ExpEnv: Blackjack, Trap, five by five Go.
471
+ - [Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess](https://proceedings.neurips.cc/paper/2021/file/215a71a12769b056c3c32e7299f1c5ed-Paper.pdf) 2021
472
+ - Gregory Clark
473
+ - Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
474
+ - ExpEnv: reconnaissance blind chess
475
+ - [POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis](https://proceedings.neurips.cc/paper/2020/file/30de24287a6d8f07b37c716ad51623a7-Paper.pdf) 2020
476
+ - Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar
477
+ - Key: continuous state-action spaces, Hierarchical Optimistic Optimization.
478
+ - ExpEnv: CartPole, Inverted Pendulum, Swing-up, and LunarLander.
479
+ - [Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search](https://proceedings.neurips.cc/paper/2020/file/e2ce14e81dba66dbff9cbc35ecfdb704-Paper.pdf) 2020
480
+ - Linnan Wang, Rodrigo Fonseca, Yuandong Tian
481
+ - Key: learns the partition of the search space using a few samples, a nonlinear decision boundary and learns a local model to pick good candidates.
482
+ - ExpEnv: MuJoCo locomotion tasks, small-scale benchmarks.
483
+ - [Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions](https://arxiv.org/abs/1907.10154) 2020
484
+ - Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai
485
+ - Key: covariate shift problem, Mix&Match combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions)
486
+ - [Code](https://github.com/matthewfaw/mixnmatch)
487
+
488
+ #### Other Conference or Journal
489
+ - [On Monte Carlo Tree Search and Reinforcement Learning](https://www.jair.org/index.php/jair/article/download/11099/26289/20632) Journal of Artificial Intelligence Research 2017.
490
+ - [Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search](https://arxiv.org/pdf/1906.06832) IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.
491
+ </details>
492
+
493
+ ## Feedback and Contribution
494
+ - For any questions or comments, you can [file an issue](https://github.com/opendilab/LightZero/issues/new/choose) directly on GitHub.
495
+ - Or contact us by email (opendilab@pjlab.org.cn)
496
+
497
+ - We appreciate all feedback, including feedback on the algorithms and the system design. These comments and suggestions will make LightZero better.
498
+
499
+
500
+ ## Citation
501
+
502
+ ```latex
503
+ @misc{lightzero,
504
+ title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
505
+ author={Yazhe Niu and Yuan Pu and Zhenjie Yang and Xueyan Li and Tong Zhou and Jiyuan Ren and Shuai Hu and Hongsheng Li and Yu Liu},
506
+ year={2023},
507
+ eprint={2310.08348},
508
+ archivePrefix={arXiv},
509
+ primaryClass={cs.LG}
510
+ }
511
+ ```
512
+
513
+ ## Acknowledgments
514
+ This library is implemented partly on the basis of the following GitHub repositories. Many thanks to these pioneering works:
515
+ - https://github.com/opendilab/DI-engine
516
+ - https://github.com/deepmind/mctx
517
+ - https://github.com/YeWR/EfficientZero
518
+ - https://github.com/werner-duvaud/muzero-general
519
+
520
+ Special thanks to the following contributors [@PaParaZz1](https://github.com/PaParaZz1), [@karroyan](https://github.com/karroyan), [@nighood](https://github.com/nighood),
521
+ [@jayyoung0802](https://github.com/jayyoung0802), [@timothijoe](https://github.com/timothijoe), [@TuTuHuss](https://github.com/TuTuHuss), [@HarryXuancy](https://github.com/HarryXuancy), [@puyuan1996](https://github.com/puyuan1996), [@HansBug](https://github.com/HansBug) for their contributions and support to this project.
522
+
523
+ Thanks to everyone who has contributed to this project:
524
+ <a href="https://github.com/opendilab/LightZero/graphs/contributors">
525
+ <img src="https://contrib.rocks/image?repo=opendilab/LightZero" />
526
+ </a>
527
+
528
+ ## License
529
+
530
+ All code in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
531
+
532
+ <p align="right">(<a href="#top">Back to top</a>)</p>
533
+
documents/state_of_the_union.txt ADDED
@@ -0,0 +1,723 @@
1
+ Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.
2
+
3
+ Last year COVID-19 kept us apart. This year we are finally together again.
4
+
5
+ Tonight, we meet as Democrats, Republicans, and Independents. But most importantly as Americans.
6
+
7
+ With a duty to one another, to the American people, to the Constitution.
8
+
9
+ And with an unwavering resolve that freedom will always triumph over tyranny.
10
+
11
+ Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.
12
+
13
+ He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined.
14
+
15
+ He met the Ukrainian people.
16
+
17
+ From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.
18
+
19
+ Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
20
+
21
+ In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.
22
+
23
+ Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.
24
+
25
+ Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people.
26
+
27
+ Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos.
28
+
29
+ They keep moving.
30
+
31
+ And the costs and the threats to America and the world keep rising.
32
+
33
+ That’s why the NATO Alliance was created to secure peace and stability in Europe after World War 2.
34
+
35
+ The United States is a member along with 29 other nations.
36
+
37
+ It matters. American diplomacy matters. American resolve matters.
38
+
39
+ Putin’s latest attack on Ukraine was premeditated and unprovoked.
40
+
41
+ He rejected repeated efforts at diplomacy.
42
+
43
+ He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did.
44
+
45
+ We prepared extensively and carefully.
46
+
47
+ We spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin.
48
+
49
+ I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression.
50
+
51
+ We countered Russia’s lies with truth.
52
+
53
+ And now that he has acted the free world is holding him accountable.
54
+
55
+ Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.
56
+
57
+ We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever.
58
+
59
+ Together with our allies, we are right now enforcing powerful economic sanctions.
60
+
61
+ We are cutting off Russia’s largest banks from the international financial system.
62
+
63
+ Preventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless.
64
+
65
+ We are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come.
66
+
67
+ Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more.
68
+
69
+ The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.
70
+
71
+ We are joining with our European allies to find and seize your yachts, your luxury apartments, your private jets. We are coming for your ill-begotten gains.
72
+
73
+ And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze on their economy. The Ruble has lost 30% of its value.
74
+
75
+ The Russian stock market has lost 40% of its value and trading remains suspended. Russia’s economy is reeling and Putin alone is to blame.
76
+
77
+ Together with our allies we are providing support to the Ukrainians in their fight for freedom. Military assistance. Economic assistance. Humanitarian assistance.
78
+
79
+ We are giving more than $1 Billion in direct assistance to Ukraine.
80
+
81
+ And we will continue to aid the Ukrainian people as they defend their country and to help ease their suffering.
82
+
83
+ Let me be clear, our forces are not engaged and will not engage in conflict with Russian forces in Ukraine.
84
+
85
+ Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies – in the event that Putin decides to keep moving west.
86
+
87
+ For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia.
88
+
89
+ As I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.
90
+
91
+ And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days, weeks, and months will be hard on them.
92
+
93
+ Putin has unleashed violence and chaos. But while he may make gains on the battlefield – he will pay a continuing high price over the long run.
94
+
95
+ And a proud Ukrainian people, who have known 30 years of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.
96
+
97
+ To all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world.
98
+
99
+ And I’m taking robust action to make sure the pain of our sanctions is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers.
100
+
101
+ Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.
102
+
103
+ America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies.
104
+
105
+ These steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming.
106
+
107
+ But I want you to know that we are going to be okay.
108
+
109
+ When the history of this era is written Putin’s war on Ukraine will have left Russia weaker and the rest of the world stronger.
110
+
111
+ While it shouldn’t have taken something so terrible for people around the world to see what’s at stake now everyone sees it clearly.
112
+
113
+ We see the unity among leaders of nations and a more unified Europe a more unified West. And we see unity among the people who are gathering in cities in large crowds around the world even in Russia to demonstrate their support for Ukraine.
114
+
115
+ In the battle between democracy and autocracy, democracies are rising to the moment, and the world is clearly choosing the side of peace and security.
116
+
117
+ This is a real test. It’s going to take time. So let us continue to draw inspiration from the iron will of the Ukrainian people.
118
+
119
+ To our fellow Ukrainian Americans who forge a deep bond that connects our two nations we stand with you.
120
+
121
+ Putin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people.
122
+
123
+ He will never extinguish their love of freedom. He will never weaken the resolve of the free world.
124
+
125
+ We meet tonight in an America that has lived through two of the hardest years this nation has ever faced.
126
+
127
+ The pandemic has been punishing.
128
+
129
+ And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more.
130
+
131
+ I understand.
132
+
133
+ I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it.
134
+
135
+ That’s why one of the first things I did as President was fight to pass the American Rescue Plan.
136
+
137
+ Because people were hurting. We needed to act, and we did.
138
+
139
+ Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis.
140
+
141
+ It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.
142
+
143
+ Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance.
144
+
145
+ And as my Dad used to say, it gave people a little breathing room.
146
+
147
+ And unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind.
148
+
149
+ And it worked. It created jobs. Lots of jobs.
150
+
151
+ In fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year
152
+ than ever before in the history of America.
153
+
154
+ Our economy grew at a rate of 5.7% last year, the strongest growth in nearly 40 years, the first step in bringing fundamental change to an economy that hasn’t worked for the working people of this nation for too long.
155
+
156
+ For the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else.
157
+
158
+ But that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century.
159
+
160
+ Vice President Harris and I ran for office with a new economic vision for America.
161
+
162
+ Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up
163
+ and the middle out, not from the top down.
164
+
165
+ Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well.
166
+
167
+ America used to have the best roads, bridges, and airports on Earth.
168
+
169
+ Now our infrastructure is ranked 13th in the world.
170
+
171
+ We won’t be able to compete for the jobs of the 21st Century if we don’t fix that.
172
+
173
+ That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history.
174
+
175
+ This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen.
176
+
177
+ We’re done talking about infrastructure weeks.
178
+
179
+ We’re going to have an infrastructure decade.
180
+
181
+ It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.
182
+
183
+ As I’ve told Xi Jinping, it is never a good bet to bet against the American people.
184
+
185
+ We’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America.
186
+
187
+ And we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice.
188
+
189
+ We’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities.
190
+
191
+ 4,000 projects have already been announced.
192
+
193
+ And tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.
194
+
195
+ When we use taxpayer dollars to rebuild America – we are going to Buy American: buy American products to support American jobs.
196
+
197
+ The federal government spends about $600 Billion a year to keep the country safe and secure.
198
+
199
+ There’s been a law on the books for almost a century
200
+ to make sure taxpayers’ dollars support American jobs and businesses.
201
+
202
+ Every Administration says they’ll do it, but we are actually doing it.
203
+
204
+ We will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America.
205
+
206
+ But to compete for the best jobs of the future, we also need to level the playing field with China and other competitors.
207
+
208
+ That’s why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing.
209
+
210
+ Let me give you one example of why it’s so important to pass it.
211
+
212
+ If you travel 20 miles east of Columbus, Ohio, you’ll find 1,000 empty acres of land.
213
+
214
+ It won’t look like much, but if you stop and look closely, you’ll see a “Field of dreams,” the ground on which America’s future will be built.
215
+
216
+ This is where Intel, the American company that helped build Silicon Valley, is going to build its $20 billion semiconductor “mega site”.
217
+
218
+ Up to eight state-of-the-art factories in one place. 10,000 new good-paying jobs.
219
+
220
+ Some of the most sophisticated manufacturing in the world to make computer chips the size of a fingertip that power the world and our everyday lives.
221
+
222
+ Smartphones. The Internet. Technology we have yet to invent.
223
+
224
+ But that’s just the beginning.
225
+
226
+ Intel’s CEO, Pat Gelsinger, who is here tonight, told me they are ready to increase their investment from
227
+ $20 billion to $100 billion.
228
+
229
+ That would be one of the biggest investments in manufacturing in American history.
230
+
231
+ And all they’re waiting for is for you to pass this bill.
232
+
233
+ So let’s not wait any longer. Send it to my desk. I’ll sign it.
234
+
235
+ And we will really take off.
236
+
237
+ And Intel is not alone.
238
+
239
+ There’s something happening in America.
240
+
241
+ Just look around and you’ll see an amazing story.
242
+
243
+ The rebirth of the pride that comes from stamping products “Made In America.” The revitalization of American manufacturing.
244
+
245
+ Companies are choosing to build new factories here, when just a few years ago, they would have built them overseas.
246
+
247
+ That’s what is happening. Ford is investing $11 billion to build electric vehicles, creating 11,000 jobs across the country.
248
+
249
+ GM is making the largest investment in its history—$7 billion to build electric vehicles, creating 4,000 jobs in Michigan.
250
+
251
+ All told, we created 369,000 new manufacturing jobs in America just last year.
252
+
253
+ Powered by people I’ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who’s here with us tonight.
254
+
255
+ As Ohio Senator Sherrod Brown says, “It’s time to bury the label ‘Rust Belt.’”
256
+
257
+ It’s time.
258
+
259
+ But with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills.
260
+
261
+ Inflation is robbing them of the gains they might otherwise feel.
262
+
263
+ I get it. That’s why my top priority is getting prices under control.
264
+
265
+ Look, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories.
266
+
267
+ The pandemic also disrupted global supply chains.
268
+
269
+ When factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up.
270
+
271
+ Look at cars.
272
+
273
+ Last year, there weren’t enough semiconductors to make all the cars that people wanted to buy.
274
+
275
+ And guess what, prices of automobiles went up.
276
+
277
+ So—we have a choice.
278
+
279
+ One way to fight inflation is to drive down wages and make Americans poorer.
280
+
281
+ I have a better plan to fight inflation.
282
+
283
+ Lower your costs, not your wages.
284
+
285
+ Make more cars and semiconductors in America.
286
+
287
+ More infrastructure and innovation in America.
288
+
289
+ More goods moving faster and cheaper in America.
290
+
291
+ More jobs where you can earn a good living in America.
292
+
293
+ And instead of relying on foreign supply chains, let’s make it in America.
294
+
295
+ Economists call it “increasing the productive capacity of our economy.”
296
+
297
+ I call it building a better America.
298
+
299
+ My plan to fight inflation will lower your costs and lower the deficit.
300
+
301
+ 17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And here’s the plan:
302
+
303
+ First – cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. In Virginia, I met a 13-year-old boy named Joshua Davis.
304
+
305
+ He and his Dad both have Type 1 diabetes, which means they need insulin every day. Insulin costs about $10 a vial to make.
306
+
307
+ But drug companies charge families like Joshua and his Dad up to 30 times more. I spoke with Joshua’s mom.
308
+
309
+ Imagine what it’s like to look at your child who needs insulin and have no idea how you’re going to pay for it.
310
+
311
+ What it does to your dignity, your ability to look your child in the eye, to be the parent you expect to be.
312
+
313
+ Joshua is here with us tonight. Yesterday was his birthday. Happy birthday, buddy.
314
+
315
+ For Joshua, and for the 200,000 other young people with Type 1 diabetes, let’s cap the cost of insulin at $35 a month so everyone can afford it.
316
+
317
+ Drug companies will still do very well. And while we’re at it let Medicare negotiate lower prices for prescription drugs, like the VA already does.
318
+
319
+ Look, the American Rescue Plan is helping millions of families on Affordable Care Act plans save $2,400 a year on their health care premiums. Let’s close the coverage gap and make those savings permanent.
320
+
321
+ Second – cut energy costs for families an average of $500 a year by combatting climate change.
322
+
323
+ Let’s provide investments and tax credits to weatherize your homes and businesses to be energy efficient and you get a tax credit; double America’s clean energy production in solar, wind, and so much more; lower the price of electric vehicles, saving you another $80 a month because you’ll never have to pay at the gas pump again.
324
+
325
+ Third – cut the cost of child care. Many families pay up to $14,000 a year for child care per child.
326
+
327
+ Middle-class and working families shouldn’t have to pay more than 7% of their income for care of young children.
328
+
329
+ My plan will cut the cost in half for most families and help parents, including millions of women, who left the workforce during the pandemic because they couldn’t afford child care, to be able to get back to work.
330
+
331
+ My plan doesn’t stop there. It also includes home and long-term care. More affordable housing. And Pre-K for every 3- and 4-year-old.
332
+
333
+ All of these will lower costs.
334
+
335
+ And under my plan, nobody earning less than $400,000 a year will pay an additional penny in new taxes. Nobody.
336
+
337
+ The one thing all Americans agree on is that the tax system is not fair. We have to fix it.
338
+
339
+ I’m not looking to punish anyone. But let’s make sure corporations and the wealthiest Americans start paying their fair share.
340
+
341
+ Just last year, 55 Fortune 500 corporations earned $40 billion in profits and paid zero dollars in federal income tax.
342
+
343
+ That’s simply not fair. That’s why I’ve proposed a 15% minimum tax rate for corporations.
344
+
345
+ We got more than 130 countries to agree on a global minimum tax rate so companies can’t get out of paying their taxes at home by shipping jobs and factories overseas.
346
+
347
+ That’s why I’ve proposed closing loopholes so the very wealthy don’t pay a lower tax rate than a teacher or a firefighter.
348
+
349
+ So that’s my plan. It will grow the economy and lower costs for families.
350
+
351
+ So what are we waiting for? Let’s get this done. And while you’re at it, confirm my nominees to the Federal Reserve, which plays a critical role in fighting inflation.
352
+
353
+ My plan will not only lower costs to give families a fair shot, it will lower the deficit.
354
+
355
+ The previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted.
356
+
357
+ But in my administration, the watchdogs have been welcomed back.
358
+
359
+ We’re going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.
360
+
361
+ And tonight, I’m announcing that the Justice Department will name a chief prosecutor for pandemic fraud.
362
+
363
+ By the end of this year, the deficit will be down to less than half what it was before I took office.
364
+
365
+ The only president ever to cut the deficit by more than one trillion dollars in a single year.
366
+
367
+ Lowering your costs also means demanding more competition.
368
+
369
+ I’m a capitalist, but capitalism without competition isn’t capitalism.
370
+
371
+ It’s exploitation—and it drives up prices.
372
+
373
+ When corporations don’t have to compete, their profits go up, your prices go up, and small businesses and family farmers and ranchers go under.
374
+
375
+ We see it happening with ocean carriers moving goods in and out of America.
376
+
377
+ During the pandemic, these foreign-owned companies raised prices by as much as 1,000% and made record profits.
378
+
379
+ Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers.
380
+
381
+ And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.
382
+
383
+ That ends on my watch.
384
+
385
+ Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect.
386
+
387
+ We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees.
388
+
389
+ Let’s pass the Paycheck Fairness Act and paid leave.
390
+
391
+ Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty.
392
+
393
+ Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.
394
+
395
+ And let’s pass the PRO Act when a majority of workers want to form a union—they shouldn’t be stopped.
396
+
397
+ When we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we haven’t done in a long time: build a better America.
398
+
399
+ For more than two years, COVID-19 has impacted every decision in our lives and the life of the nation.
400
+
401
+ And I know you’re tired, frustrated, and exhausted.
402
+
403
+ But I also know this.
404
+
405
+ Because of the progress we’ve made, because of your resilience and the tools we have, tonight I can say
406
+ we are moving forward safely, back to more normal routines.
407
+
408
+ We’ve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July.
409
+
410
+ Just a few days ago, the Centers for Disease Control and Prevention—the CDC—issued new mask guidelines.
411
+
412
+ Under these new guidelines, most Americans in most of the country can now be mask free.
413
+
414
+ And based on the projections, more of the country will reach that point across the next couple of weeks.
415
+
416
+ Thanks to the progress we have made this past year, COVID-19 need no longer control our lives.
417
+
418
+ I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19.
419
+
420
+ We will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard.
421
+
422
+ Here are four common sense steps as we move forward safely.
423
+
424
+ First, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If you’re vaccinated and boosted you have the highest degree of protection.
425
+
426
+ We will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children.
427
+
428
+ The scientists are working hard to get that done and we’ll be ready with plenty of vaccines when they do.
429
+
430
+ We’re also ready with anti-viral treatments. If you get COVID-19, the Pfizer pill reduces your chances of ending up in the hospital by 90%.
431
+
432
+ We’ve ordered more of these pills than anyone in the world. And Pfizer is working overtime to get us 1 Million pills this month and more than double that next month.
433
+
434
+ And we’re launching the “Test to Treat” initiative so people can get tested at a pharmacy, and if they’re positive, receive antiviral pills on the spot at no cost.
435
+
436
+ If you’re immunocompromised or have some other vulnerability, we have treatments and free high-quality masks.
437
+
438
+ We’re leaving no one behind or ignoring anyone’s needs as we move forward.
439
+
440
+ And on testing, we have made hundreds of millions of tests available for you to order for free.
441
+
442
+ Even if you already ordered free tests, tonight I am announcing that you can order more from covidtests.gov starting next week.
443
+
444
+ Second – we must prepare for new variants. Over the past year, we’ve gotten much better at detecting new variants.
445
+
446
+ If necessary, we’ll be able to deploy new vaccines within 100 days instead of many more months or years.
447
+
448
+ And, if Congress provides the funds we need, we’ll have new stockpiles of tests, masks, and pills ready if needed.
449
+
450
+ I cannot promise a new variant won’t come. But I can promise you we’ll do everything within our power to be ready if it does.
451
+
452
+ Third – we can end the shutdown of schools and businesses. We have the tools we need.
453
+
454
+ It’s time for Americans to get back to work and fill our great downtowns again. People working from home can feel safe to begin to return to the office.
455
+
456
+ We’re doing that here in the federal government. The vast majority of federal workers will once again work in person.
457
+
458
+ Our schools are open. Let’s keep it that way. Our kids need to be in school.
459
+
460
+ And with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely.
461
+
462
+ We achieved this because we provided free vaccines, treatments, tests, and masks.
463
+
464
+ Of course, continuing this costs money.
465
+
466
+ I will soon send Congress a request.
467
+
468
+ The vast majority of Americans have used these tools and may want to again, so I expect Congress to pass it quickly.
469
+
470
+ Fourth, we will continue vaccinating the world.
471
+
472
+ We’ve sent 475 Million vaccine doses to 112 countries, more than any other nation.
473
+
474
+ And we won’t stop.
475
+
476
+ We have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life.
477
+
478
+ Let’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.
479
+
480
+ Let’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.
481
+
482
+ We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together.
483
+
484
+ I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera.
485
+
486
+ They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.
487
+
488
+ Officer Mora was 27 years old.
489
+
490
+ Officer Rivera was 22.
491
+
492
+ Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers.
493
+
494
+ I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
495
+
496
+ I’ve worked on these issues a long time.
497
+
498
+ I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.
499
+
500
+ So let’s not abandon our streets. Or choose between safety and equal justice.
501
+
502
+ Let’s come together to protect our communities, restore trust, and hold law enforcement accountable.
503
+
504
+ That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.
505
+
506
+ That’s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption—trusted messengers breaking the cycle of violence and trauma and giving young people hope.
507
+
508
+ We should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities.
509
+
510
+ I ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.
511
+
512
+ And I will keep doing everything in my power to crack down on gun trafficking and ghost guns you can buy online and make at home—they have no serial numbers and can’t be traced.
513
+
514
+ And I ask Congress to pass proven measures to reduce gun violence. Pass universal background checks. Why should anyone on a terrorist list be able to purchase a weapon?
515
+
516
+ Ban assault weapons and high-capacity magazines.
517
+
518
+ Repeal the liability shield that makes gun manufacturers the only industry in America that can’t be sued.
519
+
520
+ These laws don’t infringe on the Second Amendment. They save lives.
521
+
522
+ The most fundamental right in America is the right to vote – and to have it counted. And it’s under assault.
523
+
524
+ In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.
525
+
526
+ We cannot let this happen.
527
+
528
+ Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
529
+
530
+ Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
531
+
532
+ One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
533
+
534
+ And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
535
+
536
+ A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.
537
+
538
+ And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
539
+
540
+ We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.
541
+
542
+ We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.
543
+
544
+ We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
545
+
546
+ We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
547
+
548
+ We can do all this while keeping lit the torch of liberty that has led generations of immigrants to this land—my forefathers and so many of yours.
549
+
550
+ Provide a pathway to citizenship for Dreamers, those on temporary status, farm workers, and essential workers.
551
+
552
+ Revise our laws so businesses have the workers they need and families don’t wait decades to reunite.
553
+
554
+ It’s not only the right thing to do—it’s the economically smart thing to do.
555
+
556
+ That’s why immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce.
557
+
558
+ Let’s get it done once and for all.
559
+
560
+ Advancing liberty and justice also requires protecting the rights of women.
561
+
562
+ The constitutional right affirmed in Roe v. Wade—standing precedent for half a century—is under attack as never before.
563
+
564
+ If we want to go forward—not backward—we must protect access to health care. Preserve a woman’s right to choose. And let’s continue to advance maternal health care in America.
565
+
566
+ And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong.
567
+
568
+ As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential.
569
+
570
+ While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.
571
+
572
+ And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things.
573
+
574
+ So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.
575
+
576
+ First, beat the opioid epidemic.
577
+
578
+ There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.
579
+
580
+ Get rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers.
581
+
582
+ If you’re suffering from addiction, know you are not alone. I believe in recovery, and I celebrate the 23 million Americans in recovery.
583
+
584
+ Second, let’s take on mental health. Especially among our children, whose lives and education have been turned upside down.
585
+
586
+ The American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.
587
+
588
+ I urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor.
589
+
590
+ Children were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media.
591
+
592
+ As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit.
593
+
594
+ It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children.
595
+
596
+ And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care.
597
+
598
+ Third, support our veterans.
599
+
600
+ Veterans are the best of us.
601
+
602
+ I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.
603
+
604
+ My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.
605
+
606
+ Our troops in Iraq and Afghanistan faced many dangers.
607
+
608
+ One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more.
609
+
610
+ When they came home, many of the world’s fittest and best trained warriors were never the same.
611
+
612
+ Headaches. Numbness. Dizziness.
613
+
614
+ A cancer that would put them in a flag-draped coffin.
615
+
616
+ I know.
617
+
618
+ One of those soldiers was my son Major Beau Biden.
619
+
620
+ We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops.
621
+
622
+ But I’m committed to finding out everything we can.
623
+
624
+ Committed to military families like Danielle Robinson from Ohio.
625
+
626
+ The widow of Sergeant First Class Heath Robinson.
627
+
628
+ He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq.
629
+
630
+ Stationed near Baghdad, just yards from burn pits the size of football fields.
631
+
632
+ Heath’s widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.
633
+
634
+ But cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body.
635
+
636
+ Danielle says Heath was a fighter to the very end.
637
+
638
+ He didn’t know how to stop fighting, and neither did she.
639
+
640
+ Through her pain she found purpose to demand we do better.
641
+
642
+ Tonight, Danielle—we are.
643
+
644
+ The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits.
645
+
646
+ And tonight, I’m announcing we’re expanding eligibility to veterans suffering from nine respiratory cancers.
647
+
648
+ I’m also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve.
649
+
650
+ And fourth, let’s end cancer as we know it.
651
+
652
+ This is personal to me and Jill, to Kamala, and to so many of you.
653
+
654
+ Cancer is the #2 cause of death in America–second only to heart disease.
655
+
656
+ Last month, I announced our plan to supercharge
657
+ the Cancer Moonshot that President Obama asked me to lead six years ago.
658
+
659
+ Our goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases.
660
+
661
+ More support for patients and families.
662
+
663
+ To get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health.
664
+
665
+ It’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.
666
+
667
+ ARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more.
668
+
669
+ A unity agenda for the nation.
670
+
671
+ We can do this.
672
+
673
+ My fellow Americans—tonight, we have gathered in a sacred space—the citadel of our democracy.
674
+
675
+ In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things.
676
+
677
+ We have fought for freedom, expanded liberty, defeated totalitarianism and terror.
678
+
679
+ And built the strongest, freest, and most prosperous nation the world has ever known.
680
+
681
+ Now is the hour.
682
+
683
+ Our moment of responsibility.
684
+
685
+ Our test of resolve and conscience, of history itself.
686
+
687
+ It is in this moment that our character is formed. Our purpose is found. Our future is forged.
688
+
689
+ Well I know this nation.
690
+
691
+ We will meet the test.
692
+
693
+ To protect freedom and liberty, to expand fairness and opportunity.
694
+
695
+ We will save democracy.
696
+
697
+ As hard as these times have been, I am more optimistic about America today than I have been my whole life.
698
+
699
+ Because I see the future that is within our grasp.
700
+
701
+ Because I know there is simply nothing beyond our capacity.
702
+
703
+ We are the only nation on Earth that has always turned every crisis we have faced into an opportunity.
704
+
705
+ The only nation that can be defined by a single word: possibilities.
706
+
707
+ So on this night, in our 245th year as a nation, I have come to report on the State of the Union.
708
+
709
+ And my report is this: the State of the Union is strong—because you, the American people, are strong.
710
+
711
+ We are stronger today than we were a year ago.
712
+
713
+ And we will be stronger a year from now than we are today.
714
+
715
+ Now is our moment to meet and overcome the challenges of our time.
716
+
717
+ And we will, as one people.
718
+
719
+ One America.
720
+
721
+ The United States of America.
722
+
723
+ May God bless you all. May God protect our troops.
rag_demo.py ADDED
@@ -0,0 +1,128 @@
+ """
+ Reference blog post: https://mp.weixin.qq.com/s/RUdZjQMSlVOfHfhErSNXnA
+ """
+ # Import the required libraries and modules
+ import os
+ import textwrap
+
+ from dotenv import load_dotenv
+ from langchain.chat_models import ChatOpenAI
+ from langchain.document_loaders import TextLoader
+ from langchain.embeddings import OpenAIEmbeddings
+ from langchain.prompts import ChatPromptTemplate
+ from langchain.schema.output_parser import StrOutputParser
+ from langchain.schema.runnable import RunnablePassthrough
+ from langchain.text_splitter import CharacterTextSplitter
+ from langchain.vectorstores import Weaviate
+ from weaviate import Client
+ from weaviate.embedded import EmbeddedOptions
+
+ # Environment setup
+ load_dotenv()  # Load environment variables from .env
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # Read the OpenAI API key from the environment
+
+ # Make sure OPENAI_API_KEY is set
+ if not OPENAI_API_KEY:
+     raise ValueError("OpenAI API Key not found in the environment variables.")
+
+
+ # Document loading and splitting
+ def load_and_split_document(file_path, chunk_size=500, chunk_overlap=50):
+     """Load a document and split it into small chunks."""
+     loader = TextLoader(file_path)
+     documents = loader.load()
+     text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
+     chunks = text_splitter.split_documents(documents)
+     return chunks
+
+
+ # Vector store creation
+ def create_vector_store(chunks, model="OpenAI"):
+     """Embed the document chunks and store the vectors in Weaviate."""
+     client = Client(embedded_options=EmbeddedOptions())
+     embedding_model = OpenAIEmbeddings() if model == "OpenAI" else None  # Only "OpenAI" is supported; plug in another embedding model here if needed
+     vectorstore = Weaviate.from_documents(
+         client=client,
+         documents=chunks,
+         embedding=embedding_model,
+         by_text=False
+     )
+     return vectorstore.as_retriever()
+
+
+ # Define the retrieval-augmented generation (RAG) pipeline
+ def setup_rag_chain(model_name="gpt-4", temperature=0):
+     """Set up the retrieval-augmented generation chain."""
+     prompt_template = """You are an assistant for question-answering tasks.
+     Use your knowledge to answer the question if the provided context is not relevant.
+     Otherwise, use the context to inform your answer.
+     Question: {question}
+     Context: {context}
+     Answer:
+     """
+     prompt = ChatPromptTemplate.from_template(prompt_template)
+     llm = ChatOpenAI(model_name=model_name, temperature=temperature)
+     # Build the RAG chain, see https://python.langchain.com/docs/expression_language/
+     rag_chain = (
+         prompt
+         | llm
+         | StrOutputParser()
+     )
+     return rag_chain
+
+
+ # Run a query through the RAG chain
+ def execute_query(retriever, rag_chain, query):
+     """Run the query and return the retrieved document chunks together with the answer."""
+     retrieved_documents = retriever.invoke(query)
+     rag_chain_response = rag_chain.invoke({"context": retrieved_documents, "question": query})
+     return retrieved_documents, rag_chain_response
+
+
+ # Run a query without the RAG chain
+ def execute_query_no_rag(model_name="gpt-4", temperature=0, query=""):
+     """Run the query directly against the LLM, without retrieval."""
+     llm = ChatOpenAI(model_name=model_name, temperature=temperature)
+     response = llm.invoke(query)
+     return response.content
+
+
+ # Unlike rag_demo_v0.py, rag_demo.py can also output the retrieved document chunks.
+ if __name__ == "__main__":
+     # Assume the document already exists locally
+     file_path = './documents/LightZero_README.zh.md'
+
+     # Load and split the document
+     chunks = load_and_split_document(file_path)
+
+     # Create the vector store
+     retriever = create_vector_store(chunks)
+
+     # Set up the RAG chain
+     rag_chain = setup_rag_chain()
+
+     # Ask a question and get the answer
+     query = "Does the AlphaZero algorithm implemented in LightZero support running on Atari environments? Please explain why in detail."
+     # query = "Please explain the principles of the MCTS algorithm in detail and give a Python code example with detailed Chinese comments"
+
+     # Use the RAG chain to get the reference documents and the answer
+     retrieved_documents, result_with_rag = execute_query(retriever, rag_chain, query)
+
+     # Get the answer without the RAG chain
+     result_without_rag = execute_query_no_rag(query=query)
+
+     # Print and compare the results of the two approaches
+     # textwrap.fill wraps the text automatically; adjust width to suit your screen
+     wrapped_result_with_rag = textwrap.fill(result_with_rag, width=80)
+     wrapped_result_without_rag = textwrap.fill(result_without_rag, width=80)
+     context = '\n'.join(
+         [f'**Document {i}**: ' + doc.page_content for i, doc in enumerate(retrieved_documents)])
+
+     # Print the wrapped text
+     print("=" * 40)
+     print(f"My question is:\n{query}")
+     print("=" * 40)
+     print(f"Result with RAG:\n{wrapped_result_with_rag}\nRetrieved context:\n{context}")
+     print("=" * 40)
+     print(f"Result without RAG:\n{wrapped_result_without_rag}")
+     print("=" * 40)
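To make the `chunk_size` and `chunk_overlap` parameters of `load_and_split_document` concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is an illustration only: `split_with_overlap` is a hypothetical helper that is not part of this commit, and LangChain's `CharacterTextSplitter` first splits on a separator and then merges the pieces, so its chunk boundaries will generally differ from the ones below.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Hypothetical helper: cut text into windows of chunk_size characters,
    each sharing chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghijklmnopqrstuvwxyz"
    for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.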
rag_demo_v0.py ADDED
@@ -0,0 +1,136 @@
+ """
+ Reference blog post: https://mp.weixin.qq.com/s/RUdZjQMSlVOfHfhErSNXnA
+ """
+ # Import the required libraries and modules
+ import os
+ import textwrap
+
+ from dotenv import load_dotenv
+ from langchain.chat_models import ChatOpenAI
+ from langchain.document_loaders import TextLoader
+ from langchain.embeddings import OpenAIEmbeddings
+ from langchain.prompts import ChatPromptTemplate
+ from langchain.schema.output_parser import StrOutputParser
+ from langchain.schema.runnable import RunnablePassthrough
+ from langchain.text_splitter import CharacterTextSplitter
+ from langchain.vectorstores import Weaviate
+ from weaviate import Client
+ from weaviate.embedded import EmbeddedOptions
+
+ # Environment setup
+ load_dotenv()  # Load environment variables from .env
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # Read the OpenAI API key from the environment
+
+ # Make sure OPENAI_API_KEY is set
+ if not OPENAI_API_KEY:
+     raise ValueError("OpenAI API Key not found in the environment variables.")
+
+
+ # Document loading and splitting
+ def load_and_split_document(file_path, chunk_size=500, chunk_overlap=50):
+     """Load a document and split it into small chunks."""
+     loader = TextLoader(file_path)
+     documents = loader.load()
+     text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
+     chunks = text_splitter.split_documents(documents)
+     return chunks
+
+
+ # Vector store creation
+ def create_vector_store(chunks, model="OpenAI"):
+     """Embed the document chunks and store the vectors in Weaviate."""
+     client = Client(embedded_options=EmbeddedOptions())
+     embedding_model = OpenAIEmbeddings() if model == "OpenAI" else None  # Only "OpenAI" is supported; plug in another embedding model here if needed
+     vectorstore = Weaviate.from_documents(
+         client=client,
+         documents=chunks,
+         embedding=embedding_model,
+         by_text=False
+     )
+     return vectorstore.as_retriever()
+
+
+ # Define the retrieval-augmented generation (RAG) pipeline
+ def setup_rag_chain_v0(retriever, model_name="gpt-4", temperature=0):
+     """Set up the retrieval-augmented generation chain."""
+     prompt_template = """You are an assistant for question-answering tasks.
+     Use your knowledge to answer the question if the provided context is not relevant.
+     Otherwise, use the context to inform your answer.
+     Question: {question}
+     Context: {context}
+     Answer:
+     """
+     prompt = ChatPromptTemplate.from_template(prompt_template)
+     llm = ChatOpenAI(model_name=model_name, temperature=temperature)
+     # Build the RAG chain, see https://python.langchain.com/docs/expression_language/
+     rag_chain = (
+         {"context": retriever, "question": RunnablePassthrough()}
+         | prompt
+         | llm
+         | StrOutputParser()
+     )
+     return rag_chain
+
+
+ # Run a query through the RAG chain
+ def execute_query_v0(rag_chain, query):
+     """Run the query and return the answer."""
+     return rag_chain.invoke(query)
+
+
+ # Run a query without the RAG chain
+ def execute_query_no_rag(model_name="gpt-4", temperature=0, query=""):
+     """Run the query directly against the LLM, without retrieval."""
+     llm = ChatOpenAI(model_name=model_name, temperature=temperature)
+     response = llm.invoke(query)
+     return response.content
+
+
+ # Unlike rag_demo_v0.py, rag_demo.py can also output the retrieved document chunks.
+ if __name__ == "__main__":
+     # Download the document and save it locally (commented out because the document is assumed to exist locally)
+     # url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/modules/state_of_the_union.txt"
+     # res = requests.get(url)
+     # with open("state_of_the_union.txt", "w") as f:
+     #     f.write(res.text)
+
+     # Assume the document already exists locally
+     # file_path = './documents/state_of_the_union.txt'
+     file_path = './documents/LightZero_README.zh.md'
+
+     # Load and split the document
+     chunks = load_and_split_document(file_path)
+
+     # Create the vector store
+     retriever = create_vector_store(chunks)
+
+     # Set up the RAG chain
+     rag_chain = setup_rag_chain_v0(retriever)
+
+     # Ask a question and get the answer
+     # query = "Please introduce LightZero in both Chinese and English"
+     # query = "Please introduce LightZero in English"
+     query = "Please introduce LightZero in Chinese"
+     # query = "Which environments and algorithms does LightZero support, and how can I get started quickly?"
+     # query = "Does the MuZero algorithm implemented in LightZero support running on Atari environments?"
+     # query = "Does the AlphaZero algorithm implemented in LightZero support running on Atari environments? Please explain why in detail."
+     # query = "Please explain the principles of the MCTS algorithm in detail and give a Python code example with detailed Chinese comments"
+
+     # Get the answer with the RAG chain
+     result_with_rag = execute_query_v0(rag_chain, query)
+
+     # Get the answer without the RAG chain
+     result_without_rag = execute_query_no_rag(query=query)
+
+     # Print and compare the results of the two approaches
+     # textwrap.fill wraps the text automatically; adjust width to suit your screen
+     wrapped_result_with_rag = textwrap.fill(result_with_rag, width=80)
+     wrapped_result_without_rag = textwrap.fill(result_without_rag, width=80)
+
+     # Print the wrapped text
+     print("="*40)
+     print(f"My question is:\n{query}")
+     print("="*40)
+     print(f"Result with RAG:\n{wrapped_result_with_rag}")
+     print("="*40)
+     print(f"Result without RAG:\n{wrapped_result_without_rag}")
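The two scripts differ mainly in where retrieval happens. In `rag_demo_v0.py` the retriever sits inside the chain, so the caller only ever sees the final answer. In `rag_demo.py` the caller retrieves first and passes the documents in explicitly, which is what makes it possible to print them. The sketch below imitates that difference with plain functions. `DOCS`, `toy_retriever`, `toy_llm`, and the word-overlap scoring are hypothetical stand-ins chosen for illustration; they are not LangChain or Weaviate APIs and do not reproduce vector search.

```python
PROMPT = "Question: {question}\nContext: {context}\nAnswer:"

# Tiny illustrative corpus (hypothetical, not taken from the repository).
DOCS = [
    "LightZero implements MuZero and AlphaZero.",
    "Atari is a single-player video game benchmark.",
    "Weaviate is a vector database.",
]


def toy_retriever(query, k=2):
    """Stand-in for a vector retriever: rank documents by how many
    lowercase words they share with the query and keep the top k."""
    query_words = set(query.lower().split())
    ranked = sorted(
        DOCS,
        key=lambda doc: len(query_words & set(doc.lower().rstrip(".").split())),
        reverse=True,
    )
    return ranked[:k]


def toy_llm(prompt):
    """Stand-in for the chat model: only reports the prompt length."""
    return f"(answer based on a {len(prompt)}-character prompt)"


def chain_v0(query):
    """v0 style: retrieval is hidden inside the chain; only the answer is returned."""
    context = toy_retriever(query)
    return toy_llm(PROMPT.format(question=query, context=context))


def chain_with_sources(query):
    """rag_demo.py style: the caller retrieves first, so the documents can be returned too."""
    retrieved = toy_retriever(query)
    answer = toy_llm(PROMPT.format(question=query, context=retrieved))
    return retrieved, answer


if __name__ == "__main__":
    docs, answer = chain_with_sources("Does LightZero support Atari")
    print(docs)
    print(answer)
```

Both variants send the same prompt to the model; the second simply keeps a handle on the retrieved chunks, which is useful when checking whether a poor answer comes from retrieval or from generation.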
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ gradio
+ openai
+ langchain
+ weaviate-client
+ requests
+ python-dotenv
+ tiktoken