Commit ac6c422 by 蒲源 · 1 parent: f62e186
polish(pu): use HuggingFace default embedding_model, update lightzero readme
Changed files:
- app_mqa_database.py +1 -1
- documents/LightZero_README.md +96 -52
- documents/LightZero_README_zh.md +91 -59
app_mqa_database.py CHANGED
@@ -106,7 +106,7 @@ def close_db_connection():
 chunks = load_and_split_document(file_path, chunk_size=5000, chunk_overlap=500)
-vectorstore = create_vector_store(chunks, model='
+vectorstore = create_vector_store(chunks, model='HuggingFace')

 # Load the pre-trained SBERT model
 sbert_model = SentenceTransformer('all-MiniLM-L6-v2')
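The `chunk_size=5000, chunk_overlap=500` arguments passed to `load_and_split_document` in this hunk suggest fixed-size windows that share some text with their neighbours. The body of that function is not part of this commit, so the following is only a minimal sketch of how such a splitter could work, assuming character-based windows; `split_with_overlap` is a hypothetical helper introduced for illustration, not the app's actual implementation.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 5000, chunk_overlap: int = 500) -> List[str]:
    """Split text into character windows of at most chunk_size characters.

    Hypothetical helper: consecutive windows share chunk_overlap characters,
    so a sentence cut at one boundary still appears whole in a neighbour.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so the tail
        # is not emitted again as a redundant, fully overlapped chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small demonstration with tiny sizes so the overlap is easy to see.
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With the values used in the diff (5000 and 500), each chunk starts 4500 characters after the previous one. A larger overlap gives the retriever more shared context per chunk at the cost of more embeddings to compute and store.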
documents/LightZero_README.md
CHANGED
@@ -26,14 +26,17 @@
|
|
26 |
[](https://github.com/opendilab/LightZero/pulls)
|
27 |
[](https://github.com/opendilab/LightZero/graphs/contributors)
|
28 |
[](https://github.com/opendilab/LightZero/blob/master/LICENSE)
|
|
|
29 |
|
30 |
-
Updated on 2024.
|
31 |
|
32 |
-
|
33 |
|
34 |
-
|
|
|
35 |
|
36 |
-
|
|
|
37 |
|
38 |
The integration of Monte Carlo Tree Search and Deep Reinforcement Learning,
|
39 |
exemplified by AlphaZero and MuZero,
|
@@ -42,9 +45,9 @@ This advanced methodology has also made significant strides in scientific domain
|
|
42 |
The following is an overview of the historical evolution of the Monte Carlo Tree Search algorithm series:
|
43 |

|
44 |
|
45 |
-
## Overview
|
46 |
|
47 |
-
**LightZero** is an open-source algorithm toolkit that combines MCTS and RL for PyTorch. It
|
48 |
- Lightweight.
|
49 |
- Efficient.
|
50 |
- Easy-to-understand.
|
@@ -62,6 +65,7 @@ For further details, please refer to [Features](#features), [Framework Structure
|
|
62 |
- [Integrated Algorithms](#integrated-algorithms)
|
63 |
- [Installation](#installation)
|
64 |
- [Quick Start](#quick-start)
|
|
|
65 |
- [Benchmark](#benchmark)
|
66 |
- [Awesome-MCTS Notes](#awesome-mcts-notes)
|
67 |
- [Paper Notes](#paper-notes)
|
@@ -74,7 +78,7 @@ For further details, please refer to [Features](#features), [Framework Structure
|
|
74 |
- [Acknowledgments](#acknowledgments)
|
75 |
- [License](#license)
|
76 |
|
77 |
-
### Features
|
78 |
|
79 |
**Lightweight**: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a lightweight framework. The algorithms and environments LightZero implemented can be found [here](#integrated-algorithms).
|
80 |
|
@@ -82,7 +86,7 @@ For further details, please refer to [Features](#features), [Framework Structure
|
|
82 |
|
83 |
**Easy-to-understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms to help users understand the algorithm's core and compare the differences and similarities between algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for algorithm code implementation, making it easier for users to locate critical code. All the documentation can be found [here](#paper-notes).
|
84 |
|
85 |
-
### Framework Structure
|
86 |
|
87 |
[comment]: <> (<p align="center">)
|
88 |
|
@@ -109,7 +113,7 @@ The above picture is the framework pipeline of LightZero. We briefly introduce t
|
|
109 |
|
110 |
For the file structure of LightZero, please refer to [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
|
111 |
|
112 |
-
### Integrated Algorithms
|
113 |
LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of MCTS algorithms (sometimes combined with cython and cpp), including:
|
114 |
- [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
|
115 |
- [MuZero](https://arxiv.org/abs/1911.08265)
|
@@ -117,25 +121,33 @@ LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of
|
|
117 |
- [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
|
118 |
- [EfficientZero](https://arxiv.org/abs/2111.00210)
|
119 |
- [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
|
|
|
|
|
120 |
|
121 |
The environments and algorithms currently supported by LightZero are shown in the table below:
|
122 |
|
123 |
-
|
124 |
-
|
125 |
-
|
|
126 |
-
|
|
127 |
-
|
|
128 |
-
|
|
129 |
-
|
|
130 |
-
|
|
131 |
-
|
|
132 |
-
|
|
133 |
-
|
|
134 |
-
|
|
135 |
-
|
|
136 |
-
|
|
137 |
-
|
|
138 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
139 |
|
140 |
<sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>
|
141 |
|
@@ -144,7 +156,7 @@ The environments and algorithms currently supported by LightZero are shown in th
|
|
144 |
<sup>(3): "---" means that this algorithm doesn't support this environment.</sup>
|
145 |
|
146 |
|
147 |
-
## Installation
|
148 |
|
149 |
You can install the latest LightZero in development from the GitHub source codes with the following command:
|
150 |
|
@@ -158,7 +170,7 @@ Kindly note that LightZero currently supports compilation only on `Linux` and `m
|
|
158 |
We are actively working towards extending this support to the `Windows` platform.
|
159 |
Your patience during this transition is greatly appreciated.
|
160 |
|
161 |
-
|
162 |
|
163 |
We also provide a Dockerfile that sets up an environment with all dependencies needed to run the LightZero library. This Docker image is based on Ubuntu 20.04 and installs Python 3.8, along with other necessary tools and libraries.
|
164 |
Here's how to use our Dockerfile to build a Docker image, run a container from this image, and execute LightZero code inside the container.
|
@@ -184,7 +196,7 @@ Here's how to use our Dockerfile to build a Docker image, run a container from t
|
|
184 |
|
185 |
[comment]: <> (- [AlphaGo Zero](https://www.nature.com/articles/nature24270) )
|
186 |
|
187 |
-
## Quick Start
|
188 |
|
189 |
Train a MuZero agent to play [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
|
190 |
|
@@ -207,18 +219,30 @@ cd LightZero
|
|
207 |
python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
|
208 |
```
|
209 |
|
210 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
211 |
|
212 |
-
For those
|
213 |
|
214 |
-
-
|
215 |
-
-
|
|
|
|
|
216 |
|
217 |
Should you have any questions, feel free to contact us for support.
|
218 |
|
219 |
-
## Benchmark
|
220 |
|
221 |
-
<details
|
222 |
|
223 |
- Below are the benchmark results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games: [TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py).
|
224 |
<p align="center">
|
@@ -273,7 +297,7 @@ and two MuJoCo continuous action space games: [Hopper-v3](https://github.com/ope
|
|
273 |
</details>
|
274 |
|
275 |
|
276 |
-
## Awesome-MCTS Notes
|
277 |
|
278 |
### Paper Notes
|
279 |
The following are the detailed paper notes (in Chinese) of the above algorithms:
|
@@ -291,6 +315,8 @@ The following are the detailed paper notes (in Chinese) of the above algorithms:
|
|
291 |
|
292 |
</details>
|
293 |
|
|
|
|
|
294 |
### Algo. Overview
|
295 |
|
296 |
The following are the overview MCTS principle diagrams of the above algorithms:
|
@@ -299,10 +325,11 @@ The following are the overview MCTS principle diagrams of the above algorithms:
|
|
299 |
|
300 |
- [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
|
301 |
- [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
|
302 |
-
- [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.
|
303 |
-
- [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.
|
304 |
-
- [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.
|
305 |
-
- [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.
|
|
|
306 |
|
307 |
</details>
|
308 |
|
@@ -335,6 +362,7 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
|
|
335 |
- [2022 Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
|
336 |
- [2021 Vector Quantized Models for Planning](https://arxiv.org/abs/2106.04615)
|
337 |
- [2021 Muesli: Combining Improvements in Policy Optimization. ](https://arxiv.org/abs/2104.06159)
|
|
|
338 |
#### MCTS Analysis
|
339 |
- [2020 Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)
|
340 |
- [2021 Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
|
@@ -482,12 +510,12 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
|
|
482 |
- ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
|
483 |
- [Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization](https://openreview.net/pdf?id=SUzPos_pUC) 2022
|
484 |
- Lei Song†, Ke Xue†, Xiaobin Huang, Chao Qian
|
485 |
-
- Key:
|
486 |
- ExpEnv: NAS-bench problems and MuJoCo locomotion
|
487 |
- [Monte Carlo Tree Search With Iteratively Refining State Abstractions](https://proceedings.neurips.cc/paper/2021/file/9b0ead00a217ea2c12e06a72eec4923f-Paper.pdf) 2021
|
488 |
- Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
|
489 |
- Key: stochastic environments, Progressive widening, abstraction refining
|
490 |
-
- ExpEnv:
|
491 |
- [Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess](https://proceedings.neurips.cc/paper/2021/file/215a71a12769b056c3c32e7299f1c5ed-Paper.pdf) 2021
|
492 |
- Gregory Clark
|
493 |
- Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
|
@@ -512,8 +540,11 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
|
|
512 |
</details>
|
513 |
|
514 |
|
515 |
-
## Feedback and Contribution
|
|
|
516 |
- [File an issue](https://github.com/opendilab/LightZero/issues/new/choose) on Github
|
|
|
|
|
517 |
- Contact our email (opendilab@pjlab.org.cn)
|
518 |
|
519 |
- We appreciate all the feedback and contributions to improve LightZero, both algorithms and system designs.
|
@@ -523,19 +554,32 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
|
|
523 |
[comment]: <> (And `CONTRIBUTING.md` offers some necessary information.)
|
524 |
|
525 |
|
526 |
-
## Citation
|
527 |
```latex
|
528 |
-
@
|
529 |
-
|
530 |
-
|
531 |
-
|
532 |
-
|
533 |
-
|
534 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
535 |
}
|
536 |
```
|
537 |
|
538 |
-
## Acknowledgments
|
539 |
|
540 |
This project has been developed partially based on the following pioneering works on GitHub repositories.
|
541 |
We express our profound gratitude for these foundational resources:
|
@@ -553,7 +597,7 @@ Thanks to all who contributed to this project:
|
|
553 |
</a>
|
554 |
|
555 |
|
556 |
-
## License
|
557 |
All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
558 |
|
559 |
<p align="right">(<a href="#top">Back to top</a>)</p>
|
|
|
26 |
[](https://github.com/opendilab/LightZero/pulls)
|
27 |
[](https://github.com/opendilab/LightZero/graphs/contributors)
|
28 |
[](https://github.com/opendilab/LightZero/blob/master/LICENSE)
|
29 |
+
[](https://discord.gg/dkZS2JF56X)
|
30 |
|
31 |
+
Updated on 2024.08.18 LightZero-v0.1.0
|
32 |
|
33 |
+
English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)
|
34 |
|
35 |
+
> LightZero is a lightweight, efficient, and easy-to-understand open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (RL).
|
36 |
+
> For any questions about LightZero, you can consult the RAG-based Q&A assistant: [ZeroPal](https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal).
|
37 |
|
38 |
+
|
39 |
+
## Background
|
40 |
|
41 |
The integration of Monte Carlo Tree Search and Deep Reinforcement Learning,
|
42 |
exemplified by AlphaZero and MuZero,
|
|
|
45 |
The following is an overview of the historical evolution of the Monte Carlo Tree Search algorithm series:
|
46 |

|
47 |
|
48 |
+
## 🎨 Overview
|
49 |
|
50 |
+
**LightZero** is an open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Reinforcement Learning (RL) for PyTorch. It supports a range of MCTS-based RL algorithms and applications, offering several key advantages:
|
51 |
- Lightweight.
|
52 |
- Efficient.
|
53 |
- Easy-to-understand.
|
|
|
65 |
- [Integrated Algorithms](#integrated-algorithms)
|
66 |
- [Installation](#installation)
|
67 |
- [Quick Start](#quick-start)
|
68 |
+
- [Documentation](#documentation)
|
69 |
- [Benchmark](#benchmark)
|
70 |
- [Awesome-MCTS Notes](#awesome-mcts-notes)
|
71 |
- [Paper Notes](#paper-notes)
|
|
|
78 |
- [Acknowledgments](#acknowledgments)
|
79 |
- [License](#license)
|
80 |
|
81 |
+
### 💥 Features
|
82 |
|
83 |
**Lightweight**: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a lightweight framework. The algorithms and environments LightZero implemented can be found [here](#integrated-algorithms).
|
84 |
|
|
|
86 |
|
87 |
**Easy-to-understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms to help users understand the algorithm's core and compare the differences and similarities between algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for algorithm code implementation, making it easier for users to locate critical code. All the documentation can be found [here](#paper-notes).
|
88 |
|
89 |
+
### 🧩 Framework Structure
|
90 |
|
91 |
[comment]: <> (<p align="center">)
|
92 |
|
|
|
113 |
|
114 |
For the file structure of LightZero, please refer to [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
|
115 |
|
116 |
+
### Integrated Algorithms
|
117 |
LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of MCTS algorithms (sometimes combined with cython and cpp), including:
|
118 |
- [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
|
119 |
- [MuZero](https://arxiv.org/abs/1911.08265)
|
|
|
121 |
- [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
|
122 |
- [EfficientZero](https://arxiv.org/abs/2111.00210)
|
123 |
- [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
|
124 |
+
- [ReZero](https://arxiv.org/abs/2404.16364)
|
125 |
+
- [UniZero](https://arxiv.org/abs/2406.10667)
|
126 |
|
127 |
The environments and algorithms currently supported by LightZero are shown in the table below:
|
128 |
|
129 |
+
|
130 |
+
| Env./Algo. | AlphaZero | MuZero | Sampled MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero | UniZero | Sampled UniZero | ReZero |
|------------------------|-----------|--------|----------------|---------------|-----------------------|---------------|-------------------|---------|-----------------|--------|
| TicTacToe | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | 🔒 |
| Gomoku | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | ✔ |
| Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
| 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | ✔ | 🔒 | 🔒 |
| Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
| Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
| CartPole | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
| Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ | 🔒 |
| LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 |
| BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
| Atari | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
| DeepMind Control | --- | --- | ✔ | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 |
| MuJoCo | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
| MiniGrid | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
| Bsuite | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
| Memory | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
| SumToThree (billiards) | --- | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
| MetaDrive | --- | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
|
150 |
+
|
151 |
|
152 |
<sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>
|
153 |
|
|
|
156 |
<sup>(3): "---" means that this algorithm doesn't support this environment.</sup>
|
157 |
|
158 |
|
159 |
+
## ⚙️ Installation
|
160 |
|
161 |
You can install the latest LightZero in development from the GitHub source codes with the following command:
|
162 |
|
|
|
170 |
We are actively working towards extending this support to the `Windows` platform.
|
171 |
Your patience during this transition is greatly appreciated.
|
172 |
|
173 |
+
### Installation with Docker
|
174 |
|
175 |
We also provide a Dockerfile that sets up an environment with all dependencies needed to run the LightZero library. This Docker image is based on Ubuntu 20.04 and installs Python 3.8, along with other necessary tools and libraries.
|
176 |
Here's how to use our Dockerfile to build a Docker image, run a container from this image, and execute LightZero code inside the container.
|
|
|
196 |
|
197 |
[comment]: <> (- [AlphaGo Zero](https://www.nature.com/articles/nature24270) )
|
198 |
|
199 |
+
## Quick Start
|
200 |
|
201 |
Train a MuZero agent to play [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
|
202 |
|
|
|
219 |
python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
|
220 |
```
|
221 |
|
222 |
+
Train a UniZero agent to play [Pong](http
|
223 |
+
g/):
|
224 |
+
|
225 |
+
```bash
cd LightZero
python3 -u zoo/atari/config/atari_unizero_config.py
```
|
229 |
+
|
230 |
+
## Documentation
|
231 |
+
|
232 |
+
The LightZero documentation can be found [here](https://opendilab.github.io/LightZero/). It contains tutorials and the API reference.
|
233 |
|
234 |
+
For those interested in customizing environments and algorithms, we provide relevant guides:

- [Customize Environments](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/envs/customize_envs.md)
- [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/algos/customize_algos.md)
- [How to Set Configuration Files?](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/config/config.md)
- [Logging and Monitoring System](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/logs/logs.md)
|
240 |
|
241 |
Should you have any questions, feel free to contact us for support.
|
242 |
|
243 |
+
## Benchmark
|
244 |
|
245 |
+
<details><summary>Click to expand</summary>
|
246 |
|
247 |
- Below are the benchmark results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games: [TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py).
|
248 |
<p align="center">
|
|
|
297 |
</details>
|
298 |
|
299 |
|
300 |
+
## Awesome-MCTS Notes
|
301 |
|
302 |
### Paper Notes
|
303 |
The following are the detailed paper notes (in Chinese) of the above algorithms:
|
|
|
315 |
|
316 |
</details>
|
317 |
|
318 |
+
You can also refer to the relevant Zhihu column (in Chinese): [In-depth Analysis of MCTS+RL Frontier Theories and Applications](https://www.zhihu.com/column/c_1764308735227662336).
|
319 |
+
|
320 |
### Algo. Overview
|
321 |
|
322 |
The following are the overview MCTS principle diagrams of the above algorithms:
|
|
|
325 |
|
326 |
- [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
|
327 |
- [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
|
328 |
+
- [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.png)
- [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.png)
- [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.png)
- [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.png)
- [StochasticMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/stochastic_muzero_overview.png)
|
333 |
|
334 |
</details>
|
335 |
|
|
|
362 |
- [2022 Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
|
363 |
- [2021 Vector Quantized Models for Planning](https://arxiv.org/abs/2106.04615)
|
364 |
- [2021 Muesli: Combining Improvements in Policy Optimization. ](https://arxiv.org/abs/2104.06159)
|
365 |
+
|
366 |
#### MCTS Analysis
|
367 |
- [2020 Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)
|
368 |
- [2021 Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
|
|
|
510 |
- ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
|
511 |
- [Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization](https://openreview.net/pdf?id=SUzPos_pUC) 2022
|
512 |
- Lei Song†, Ke Xue†, Xiaobin Huang, Chao Qian
|
513 |
+
- Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
|
514 |
- ExpEnv: NAS-bench problems and MuJoCo locomotion
|
515 |
- [Monte Carlo Tree Search With Iteratively Refining State Abstractions](https://proceedings.neurips.cc/paper/2021/file/9b0ead00a217ea2c12e06a72eec4923f-Paper.pdf) 2021
|
516 |
- Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
|
517 |
- Key: stochastic environments, Progressive widening, abstraction refining
|
518 |
+
- ExpEnv: Blackjack, Trap, five by five Go.
|
519 |
- [Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess](https://proceedings.neurips.cc/paper/2021/file/215a71a12769b056c3c32e7299f1c5ed-Paper.pdf) 2021
|
520 |
- Gregory Clark
|
521 |
- Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
|
|
|
540 |
</details>
|
541 |
|
542 |
|
543 |
+
## 💬 Feedback and Contribution
|
544 |
+
|
545 |
- [File an issue](https://github.com/opendilab/LightZero/issues/new/choose) on GitHub
- Open or participate in our [discussion forum](https://github.com/opendilab/LightZero/discussions)
- Discuss on the LightZero [Discord server](https://discord.gg/dkZS2JF56X)
- Contact us by email (opendilab@pjlab.org.cn)
|
549 |
|
550 |
- We appreciate all the feedback and contributions to improve LightZero, both algorithms and system designs.
|
|
|
554 |
[comment]: <> (And `CONTRIBUTING.md` offers some necessary information.)
|
555 |
|
556 |
|
557 |
+
## Citation
|
558 |
```latex
@article{niu2024lightzero,
  title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
  author={Niu, Yazhe and Pu, Yuan and Yang, Zhenjie and Li, Xueyan and Zhou, Tong and Ren, Jiyuan and Hu, Shuai and Li, Hongsheng and Liu, Yu},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

@article{pu2024unizero,
  title={UniZero: Generalized and Efficient Planning with Scalable Latent World Models},
  author={Pu, Yuan and Niu, Yazhe and Ren, Jiyuan and Yang, Zhenjie and Li, Hongsheng and Liu, Yu},
  journal={arXiv preprint arXiv:2406.10667},
  year={2024}
}

@article{xuan2024rezero,
  title={ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze},
  author={Xuan, Chunyu and Niu, Yazhe and Pu, Yuan and Hu, Shuai and Liu, Yu and Yang, Jing},
  journal={arXiv preprint arXiv:2404.16364},
  year={2024}
}
```
|
581 |
|
582 |
+
## Acknowledgments
|
583 |
|
584 |
This project has been developed partially based on the following pioneering works on GitHub repositories.
|
585 |
We express our profound gratitude for these foundational resources:
|
|
|
597 |
</a>
|
598 |
|
599 |
|
600 |
+
## 🏷️ License
|
601 |
All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
602 |
|
603 |
<p align="right">(<a href="#top">Back to top</a>)</p>
|
documents/LightZero_README_zh.md
CHANGED
@@ -27,18 +27,20 @@
|
|
27 |
[](https://github.com/opendilab/LightZero/graphs/contributors)
|
28 |
[](https://github.com/opendilab/LightZero/blob/master/LICENSE)
|
29 |
|
30 |
-
Last updated on 2024.
|
|
|
|
|
31 |
|
32 |
> LightZero is a lightweight, efficient, and easy-to-understand open-source MCTS+RL algorithm library.
|
|
|
33 |
|
34 |
-
[English](https://github.com/opendilab/LightZero/blob/main/README.md) | 简体中文 (Simplified Chinese) | [Paper](https://arxiv.org/pdf/2310.08348.pdf)
|
35 |
|
36 |
-
## Background

Methods that combine Monte Carlo Tree Search (MCTS) with Deep Reinforcement Learning (DRL), represented by AlphaZero and MuZero, have reached superhuman performance in games such as Go and Atari, and have made encouraging progress in scientific domains such as protein structure prediction and the search for matrix multiplication algorithms. The figure below shows the historical development of the MCTS algorithm family:
|
39 |

|
40 |
|
41 |
-
## Overview

**LightZero** is an open-source algorithm toolkit that combines Monte Carlo Tree Search and Reinforcement Learning. It supports a range of MCTS-based RL algorithms and offers the following advantages:
- Lightweight.
|
@@ -57,6 +59,7 @@
|
|
57 |
- [Integrated Algorithms](#集成算法)
- [Installation](#安装方法)
- [Quick Start](#快速开始)
- [Benchmark](#基线算法比较)
- [MCTS-Related Notes](#MCTS-相关笔记)
- [Paper Notes](#论文笔记)
|
@@ -69,14 +72,14 @@
|
|
69 |
- [Acknowledgments](#致谢)
- [License](#许可证)
|
71 |
|
72 |
-
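### Features

**Lightweight**: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a single lightweight framework.

**Efficient**: For the most time-consuming parts of MCTS-family algorithms, LightZero uses mixed heterogeneous computing programming to improve computational efficiency.

**Easy-to-understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms, helping users understand each algorithm's core and compare algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for the code implementations, making it easier to locate critical code.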
|
78 |
|
79 |
-
### Framework Structure
|
80 |
|
81 |
<p align="center">
|
82 |
<img src="assets/lightzero_pipeline.svg" alt="Image Description 2" width="50%" height="auto" style="margin: 0 1%;">
|
@@ -96,7 +99,7 @@
|
|
96 |
|
97 |
For the file structure of LightZero, please refer to [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).

### Integrated Algorithms
LightZero is an MCTS algorithm library implemented in [PyTorch](https://pytorch.org/); the MCTS implementation also uses cython and cpp. The LightZero framework is built mainly on [DI-engine](https://github.com/opendilab/DI-engine). The algorithms currently integrated in LightZero include:
|
101 |
- [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
|
102 |
- [MuZero](https://arxiv.org/abs/1911.08265)
|
@@ -104,26 +107,30 @@ LightZero is an MCTS algorithm library implemented in [PyTorch](https://pytorch.org/),
|
|
104 |
- [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
|
105 |
- [EfficientZero](https://arxiv.org/abs/2111.00210)
|
106 |
- [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
|
107 |
-
|
|
|
108 |
|
109 |
The environments and algorithms currently supported by LightZero are shown in the table below:
|
110 |
|
111 |
-
| Env./Algo.
|
112 |
-
|
113 |
-
| TicTacToe
|
114 |
-
| Gomoku
|
115 |
-
| Connect4
|
116 |
-
| 2048
|
117 |
-
| Chess
|
118 |
-
| Go | ๐
|
119 |
-
| CartPole
|
120 |
-
| Pendulum
|
121 |
-
| LunarLander
|
122 |
-
| BipedalWalker
|
123 |
-
| Atari
|
124 |
-
|
|
125 |
-
|
|
126 |
-
|
|
|
|
|
|
|
|
127 |
|
128 |
<sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>
|
129 |
|
@@ -131,7 +138,7 @@ The environments and algorithms currently supported by LightZero are shown in the table below:
|
|
131 |
|
132 |
<sup>(3): "---" means that this algorithm doesn't support this environment.</sup>
|
133 |
|
134 |
-
## Installation

You can install the latest version of LightZero from the GitHub source code with the following commands:
|
137 |
|
@@ -170,7 +177,7 @@ pip3 install -e .
|
|
170 |
python ./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py
|
171 |
```
|
172 |
|
173 |
-
## Quick Start
Use the following commands to quickly train a MuZero agent on the [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/) environment:
|
175 |
|
176 |
```bash
|
@@ -191,18 +198,30 @@ python3 -u zoo/atari/config/atari_muzero_config.py
|
|
191 |
cd LightZero
|
192 |
python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
|
193 |
```
|
194 |
-
## Customization Documentation
|
195 |
|
196 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
197 |
|
198 |
-
|
199 |
-
- **Algorithm customization:** [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos_zh.md)
|
200 |
|
201 |
-
|
202 |
|
203 |
-
|
|
|
|
|
|
|
204 |
|
205 |
-
|
|
|
|
|
|
|
|
|
206 |
|
207 |
- Benchmark results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games ([TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), and [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py)):
|
208 |
<p align="center">
|
@@ -255,7 +274,7 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
|
|
255 |
|
256 |
</details>
|
257 |
|
258 |
-
## MCTS-Related Notes

### Paper Notes
|
261 |
|
@@ -279,24 +298,22 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
|
|
279 |
|
280 |
</details>
|
281 |
|
|
|
|
|
282 |
### Algorithm Framework Diagrams

The following are the framework overview diagrams of the algorithms integrated in LightZero:
|
285 |
|
286 |
<details closed>
|
287 |
-
<summary>(
|
288 |
-
|
289 |
-
[MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
|
290 |
-
|
291 |
-
[AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
|
292 |
-
|
293 |
-
[MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.pdf)
|
294 |
-
|
295 |
-
[EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.pdf)
|
296 |
|
297 |
-
[
|
298 |
-
|
299 |
-
[
|
|
|
|
|
|
|
|
|
300 |
|
301 |
</details>
|
302 |
|
@@ -307,7 +324,7 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
 ### Key Papers

 <details closed>
-<summary>(

 #### LightZero Implemented series

@@ -351,7 +368,7 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
 ### Other Papers

 <details closed>
-<summary>(

 #### ICML
 - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
@@ -511,27 +528,42 @@ and internal state transition dynamics,
 - [Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search](https://arxiv.org/pdf/1906.06832) IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.
 </details>

-## Feedback and Contribution
 - For any questions or suggestions, [file an issue](https://github.com/opendilab/LightZero/issues/new/choose) directly on GitHub
 - Or contact us by email (opendilab@pjlab.org.cn)

 - Thank you for all feedback, including feedback on the algorithms and system design. These comments and suggestions help make LightZero better.


-## Citation

 ```latex
-@
-
-
-
-
-
-
 }
 ```

-## Acknowledgments
 The implementation of this library is partly based on the following GitHub repositories. Many thanks to these pioneering works:
 - https://github.com/opendilab/DI-engine
 - https://github.com/deepmind/mctx
@@ -546,7 +578,7 @@ and internal state transition dynamics,
 <img src="https://contrib.rocks/image?repo=opendilab/LightZero" />
 </a>

-## License

 All code in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

 [](https://github.com/opendilab/LightZero/graphs/contributors)
 [](https://github.com/opendilab/LightZero/blob/master/LICENSE)

+Last updated on 2024.08.18 LightZero-v0.1.0
+
+[English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

 > LightZero is a lightweight, efficient, and easy-to-understand open-source algorithm library combining MCTS and RL.
+> For any questions about LightZero, you can consult the RAG-based Q&A assistant [ZeroPal](https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal).


+## 🔍 Background

 Methods that combine Monte Carlo Tree Search (MCTS) with Deep Reinforcement Learning (DRL), exemplified by AlphaZero and MuZero, have achieved superhuman performance in games such as Go and Atari, and have made encouraging progress in scientific domains such as protein structure prediction and the discovery of matrix multiplication algorithms. The figure below shows the historical development of the MCTS algorithm family:
 

+## 🎨 Overview

 **LightZero** is an open-source algorithm toolkit that combines Monte Carlo Tree Search and reinforcement learning. It supports a range of MCTS-based RL algorithms and offers the following advantages:
 - Lightweight.

 - [Integrated Algorithms](#集成算法)
 - [Installation](#安装方法)
 - [Quick Start](#快速开始)
+- [Documentation](#文档)
 - [Baseline Algorithm Comparison](#基线算法比较)
 - [MCTS-Related Notes](#MCTS-相关笔记)
 - [Paper Notes](#论文笔记)

 - [Acknowledgments](#致谢)
 - [License](#许可证)

+### 💥 Features
 **Lightweight**: LightZero integrates multiple MCTS-family algorithms and can solve decision-making problems with diverse properties in a lightweight way under one framework.

 **Efficient**: LightZero uses mixed heterogeneous computing programming to improve computational efficiency in the most time-consuming parts of MCTS-family algorithms.

 **Easy to understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms, helping users grasp each algorithm's core and compare algorithms within a single paradigm. LightZero also provides function call graphs and network structure diagrams for the code implementations, making it easier to locate key code.

+### 🧩 Framework Structure

 <p align="center">
 <img src="assets/lightzero_pipeline.svg" alt="Image Description 2" width="50%" height="auto" style="margin: 0 1%;">


 For LightZero's file structure, see [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).

+### 🎁 Integrated Algorithms
 LightZero is an MCTS algorithm library implemented in [PyTorch](https://pytorch.org/); its MCTS implementation also uses Cython and C++. The LightZero framework is built mainly on [DI-engine](https://github.com/opendilab/DI-engine). The algorithms currently integrated in LightZero include:
 - [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
 - [MuZero](https://arxiv.org/abs/1911.08265)

 - [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
 - [EfficientZero](https://arxiv.org/abs/2111.00210)
 - [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
+- [ReZero](https://arxiv.org/abs/2404.16364)
+- [UniZero](https://arxiv.org/abs/2406.10667)

 The environments and algorithms currently supported by LightZero are shown in the table below:

+| Env./Algo. | AlphaZero | MuZero | Sampled MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero | UniZero | Sampled UniZero | ReZero |
+|------------------------| -------- | ---- |---------------| ---------- | ------------------ | ------------- | ---------------- | ------- | --- | ------ |
+| TicTacToe | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | 🔒 |
+| Gomoku | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | ✔ |
+| Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
+| 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | ✔ | 🔒 | 🔒 |
+| Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
+| Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
+| CartPole | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
+| Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ | 🔒 |
+| LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 |
+| BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
+| Atari | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
+| DeepMind Control | --- | --- | ✔ | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 |
+| MuJoCo | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
+| MiniGrid | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
+| Bsuite | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
+| Memory | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
+| SumToThree (billiards) | --- | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |

 <sup>(1): "✔" means the corresponding item is complete and well tested.</sup>


 <sup>(3): "---" means the algorithm does not support this environment.</sup>

+## ⚙️ Installation

 You can install the latest version of LightZero from the GitHub source with the following commands:


 python ./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py
 ```

+## 🚀 Quick Start
 Use the following commands to quickly train a MuZero agent on the [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/) environment:

 ```bash

 cd LightZero
 python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
 ```

+Use the following commands to quickly train a UniZero agent on the [Pong](https://gymnasium.farama.org/environments/atari/pong/) environment:
+
+```bash
+cd LightZero
+python3 -u zoo/atari/config/atari_unizero_config.py
+```
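
The quick-start commands all follow one pattern: change into the repository root, then run a config script with `python3 -u`. As a rough sketch of that pattern (not an official LightZero entry point), the helper below builds and runs the same command from Python. The script paths are the ones quoted in this README; the short keys in `CONFIGS` and the function names `build_command` and `launch` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List

# Config scripts named in this README, relative to the repository root.
# The dictionary keys are made-up labels for illustration.
CONFIGS: Dict[str, str] = {
    "muzero_cartpole": "zoo/classic_control/cartpole/config/cartpole_muzero_config.py",
    "muzero_tictactoe": "zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py",
    "unizero_pong": "zoo/atari/config/atari_unizero_config.py",
}


def build_command(name: str, python: str = "python3") -> List[str]:
    """Return the argv list equivalent to `python3 -u <config script>`."""
    return [python, "-u", CONFIGS[name]]


def launch(name: str, repo_dir: str = "LightZero") -> int:
    """Run a config script from the repository root, mirroring `cd LightZero`."""
    repo = Path(repo_dir)
    script = repo / CONFIGS[name]
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is {repo_dir!r} a LightZero checkout?")
    # Use the current interpreter so the active virtual environment is reused.
    return subprocess.run(build_command(name, sys.executable), cwd=repo).returncode
```

Calling `launch("unizero_pong")` from the directory that contains the `LightZero` checkout would then be equivalent to the two shell commands above, assuming LightZero and its Atari dependencies are already installed.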
+
+## 📚 Documentation

+The LightZero documentation can be found [here](https://opendilab.github.io/LightZero/). It contains tutorials and an API reference.

+For users who want to customize environments and algorithms, we provide the following guides:

+- [How to customize environments?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs_zh.md)
+- [How to customize algorithms?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos_zh.md)
+- [How to set up configuration files?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/config/config_zh.md)
+- [Logging system](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/logs/logs_zh.md)

+If you have any questions, feel free to contact us at any time.
+
+## 📊 Baseline Algorithm Comparison
+
+<details><summary>Click to view</summary>

 - Baseline results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games ([TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), and [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py)):
 <p align="center">


 </details>

+## 📝 MCTS-Related Notes

 ### Paper Notes



 </details>

+See also the corresponding Zhihu column: [In-Depth Analysis of Frontier MCTS+RL Theory and Applications](https://www.zhihu.com/column/c_1764308735227662336).
+
 ### Algorithm Overview Diagrams

 The following are overview diagrams of the algorithms integrated in LightZero:

 <details closed>
+<summary>(Click to view)</summary>

+- [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
+- [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
+- [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.png)
+- [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.png)
+- [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.png)
+- [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.png)
+- [StochasticMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/stochastic_muzero_overview.png)

 </details>


 ### Key Papers

 <details closed>
+<summary>(Click to view)</summary>

 #### LightZero Implemented series


 ### Other Papers

 <details closed>
+<summary>(Click to view)</summary>

 #### ICML
 - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023

 - [Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search](https://arxiv.org/pdf/1906.06832) IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.
 </details>

+## 💬 Feedback and Contribution
 - For any questions or suggestions, [file an issue](https://github.com/opendilab/LightZero/issues/new/choose) directly on GitHub
+- Open or join a discussion in the [GitHub forum](https://github.com/opendilab/LightZero/discussions)
+- Discuss on the LightZero [discord server](https://discord.gg/qZTQTycu)
 - Or contact us by email (opendilab@pjlab.org.cn)

 - Thank you for all feedback, including feedback on the algorithms and system design. These comments and suggestions help make LightZero better.


+## 🌏 Citation

 ```latex
+@article{niu2024lightzero,
+  title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
+  author={Niu, Yazhe and Pu, Yuan and Yang, Zhenjie and Li, Xueyan and Zhou, Tong and Ren, Jiyuan and Hu, Shuai and Li, Hongsheng and Liu, Yu},
+  journal={Advances in Neural Information Processing Systems},
+  volume={36},
+  year={2024}
+}
+
+@article{pu2024unizero,
+  title={UniZero: Generalized and Efficient Planning with Scalable Latent World Models},
+  author={Pu, Yuan and Niu, Yazhe and Ren, Jiyuan and Yang, Zhenjie and Li, Hongsheng and Liu, Yu},
+  journal={arXiv preprint arXiv:2406.10667},
+  year={2024}
+}
+
+@article{xuan2024rezero,
+  title={ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze},
+  author={Xuan, Chunyu and Niu, Yazhe and Pu, Yuan and Hu, Shuai and Liu, Yu and Yang, Jing},
+  journal={arXiv preprint arXiv:2404.16364},
+  year={2024}
 }
 ```

+## 💓 Acknowledgments
 The implementation of this library is partly based on the following GitHub repositories. Many thanks to these pioneering works:
 - https://github.com/opendilab/DI-engine
 - https://github.com/deepmind/mctx

 <img src="https://contrib.rocks/image?repo=opendilab/LightZero" />
 </a>

+## 🏷️ License

 All code in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
