蒲源 committed on
Commit
ac6c422
·
1 Parent(s): f62e186

polish(pu): use HuggingFace default embedding_model, update lightzero readme

app_mqa_database.py CHANGED
@@ -106,7 +106,7 @@ def close_db_connection():
106
 
107
 
108
  chunks = load_and_split_document(file_path, chunk_size=5000, chunk_overlap=500)
109
- vectorstore = create_vector_store(chunks, model='OpenAI')
110
 
111
  # Load the pre-trained SBERT model
112
  sbert_model = SentenceTransformer('all-MiniLM-L6-v2')
 
106
 
107
 
108
  chunks = load_and_split_document(file_path, chunk_size=5000, chunk_overlap=500)
109
+ vectorstore = create_vector_store(chunks, model='HuggingFace')
110
 
111
  # Load the pre-trained SBERT model
112
  sbert_model = SentenceTransformer('all-MiniLM-L6-v2')
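Reviewer note: the hunk above only swaps the `model` string passed to `create_vector_store`, so the sketch below illustrates the two ideas the surrounding lines rely on: splitting text into overlapping chunks (`chunk_size=5000`, `chunk_overlap=500`) and selecting an embedding backend by name. It is a minimal sketch under stated assumptions, not the repository's implementation: `split_with_overlap`, `build_store`, `EMBEDDERS`, and both toy embedders are hypothetical names, the split is character-based, and the embedders are placeholders rather than real OpenAI or HuggingFace models.

```python
from typing import Callable, Dict, List, Tuple


def split_with_overlap(text: str, chunk_size: int = 5000, chunk_overlap: int = 500) -> List[str]:
    """Cut text into character windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def _toy_embed_a(chunk: str) -> List[float]:
    # Placeholder, NOT a real embedding model: chunk length and space ratio.
    return [float(len(chunk)), chunk.count(" ") / max(len(chunk), 1)]


def _toy_embed_b(chunk: str) -> List[float]:
    # Second placeholder so the registry has two distinct backends.
    digits = sum(ch.isdigit() for ch in chunk)
    return [float(len(chunk)), digits / max(len(chunk), 1)]


# Hypothetical registry keyed by the same strings the diff passes as `model`.
EMBEDDERS: Dict[str, Callable[[str], List[float]]] = {
    "HuggingFace": _toy_embed_a,
    "OpenAI": _toy_embed_b,
}


def build_store(chunks: List[str], model: str = "HuggingFace") -> List[Tuple[List[float], str]]:
    """Pair every chunk with its vector from the selected backend."""
    if model not in EMBEDDERS:
        raise ValueError(f"unknown embedding backend: {model!r}")
    embed = EMBEDDERS[model]
    return [(embed(chunk), chunk) for chunk in chunks]
```

Keeping the backend behind a string key means a default can be changed in one place, which matches the one-line nature of the change in this hunk.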
documents/LightZero_README.md CHANGED
@@ -26,14 +26,17 @@
26
  [![GitHub pulls](https://img.shields.io/github/issues-pr/opendilab/LightZero)](https://github.com/opendilab/LightZero/pulls)
27
  [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
28
  [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
 
29
 
30
- Updated on 2024.03.15 LightZero-v0.0.4
31
 
32
- > LightZero is a lightweight, efficient, and easy-to-understand open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (RL).
33
 
34
- English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Paper](https://arxiv.org/pdf/2310.08348.pdf)
 
35
 
36
- ## Background
 
37
 
38
  The integration of Monte Carlo Tree Search and Deep Reinforcement Learning,
39
  exemplified by AlphaZero and MuZero,
@@ -42,9 +45,9 @@ This advanced methodology has also made significant strides in scientific domain
42
  The following is an overview of the historical evolution of the Monte Carlo Tree Search algorithm series:
43
  ![pipeline](assets/mcts_rl_evolution_overview.png)
44
 
45
- ## Overview
46
 
47
- **LightZero** is an open-source algorithm toolkit that combines MCTS and RL for PyTorch. It provides support for a range of MCTS-based RL algorithms and applications with the following advantages:
48
  - Lightweight.
49
  - Efficient.
50
  - Easy-to-understand.
@@ -62,6 +65,7 @@ For further details, please refer to [Features](#features), [Framework Structure
62
  - [Integrated Algorithms](#integrated-algorithms)
63
  - [Installation](#installation)
64
  - [Quick Start](#quick-start)
 
65
  - [Benchmark](#benchmark)
66
  - [Awesome-MCTS Notes](#awesome-mcts-notes)
67
  - [Paper Notes](#paper-notes)
@@ -74,7 +78,7 @@ For further details, please refer to [Features](#features), [Framework Structure
74
  - [Acknowledgments](#acknowledgments)
75
  - [License](#license)
76
 
77
- ### Features
78
 
79
  **Lightweight**: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a lightweight framework. The algorithms and environments LightZero implemented can be found [here](#integrated-algorithms).
80
 
@@ -82,7 +86,7 @@ For further details, please refer to [Features](#features), [Framework Structure
82
 
83
  **Easy-to-understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms to help users understand the algorithm's core and compare the differences and similarities between algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for algorithm code implementation, making it easier for users to locate critical code. All the documentation can be found [here](#paper-notes).
84
 
85
- ### Framework Structure
86
 
87
  [comment]: <> (<p align="center">)
88
 
@@ -109,7 +113,7 @@ The above picture is the framework pipeline of LightZero. We briefly introduce t
109
 
110
  For the file structure of LightZero, please refer to [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
111
 
112
- ### Integrated Algorithms
113
  LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of MCTS algorithms (sometimes combined with cython and cpp), including:
114
  - [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
115
  - [MuZero](https://arxiv.org/abs/1911.08265)
@@ -117,25 +121,33 @@ LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of
117
  - [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
118
  - [EfficientZero](https://arxiv.org/abs/2111.00210)
119
  - [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
 
 
120
 
121
  The environments and algorithms currently supported by LightZero are shown in the table below:
122
 
123
- | Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero |
124
- |---------------| --------- | ------ |-------------| ------------------ | ---------- |----------------|
125
- | TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
126
- | Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
127
- | Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 |
128
- | 2048 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ |
129
- | Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
130
- | Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
131
- | CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
132
- | Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
133
- | LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
134
- | BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 |
135
- | Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
136
- | MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
137
- | MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
138
- | Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
 
 
 
 
 
 
139
 
140
  <sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>
141
 
@@ -144,7 +156,7 @@ The environments and algorithms currently supported by LightZero are shown in th
144
  <sup>(3): "---" means that this algorithm doesn't support this environment.</sup>
145
 
146
 
147
- ## Installation
148
 
149
  You can install the latest LightZero in development from the GitHub source codes with the following command:
150
 
@@ -158,7 +170,7 @@ Kindly note that LightZero currently supports compilation only on `Linux` and `m
158
  We are actively working towards extending this support to the `Windows` platform.
159
  Your patience during this transition is greatly appreciated.
160
 
161
- ## Installation with Docker
162
 
163
  We also provide a Dockerfile that sets up an environment with all dependencies needed to run the LightZero library. This Docker image is based on Ubuntu 20.04 and installs Python 3.8, along with other necessary tools and libraries.
164
  Here's how to use our Dockerfile to build a Docker image, run a container from this image, and execute LightZero code inside the container.
@@ -184,7 +196,7 @@ Here's how to use our Dockerfile to build a Docker image, run a container from t
184
 
185
  [comment]: <> (- [AlphaGo Zero]&#40;https://www.nature.com/articles/nature24270&#41; )
186
 
187
- ## Quick Start
188
 
189
  Train a MuZero agent to play [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
190
 
@@ -207,18 +219,30 @@ cd LightZero
207
  python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
208
  ```
209
 
210
- ## Customization Documentation
 
 
 
 
 
 
 
 
 
 
211
 
212
- For those looking to tailor environments and algorithms, we offer comprehensive guides:
213
 
214
- - **Environments:** [Customize Environments](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs.md)
215
- - **Algorithms:** [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos.md)
 
 
216
 
217
  Should you have any questions, feel free to contact us for support.
218
 
219
- ## Benchmark
220
 
221
- <details open><summary>Click to collapse</summary>
222
 
223
  - Below are the benchmark results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games: [TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py).
224
  <p align="center">
@@ -273,7 +297,7 @@ and two MuJoCo continuous action space games: [Hopper-v3](https://github.com/ope
273
  </details>
274
 
275
 
276
- ## Awesome-MCTS Notes
277
 
278
  ### Paper Notes
279
  The following are the detailed paper notes (in Chinese) of the above algorithms:
@@ -291,6 +315,8 @@ The following are the detailed paper notes (in Chinese) of the above algorithms:
291
 
292
  </details>
293
 
 
 
294
  ### Algo. Overview
295
 
296
  The following are the overview MCTS principle diagrams of the above algorithms:
@@ -299,10 +325,11 @@ The following are the overview MCTS principle diagrams of the above algorithms:
299
 
300
  - [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
301
  - [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
302
- - [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.pdf)
303
- - [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.pdf)
304
- - [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.pdf)
305
- - [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.pdf)
 
306
 
307
  </details>
308
 
@@ -335,6 +362,7 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
335
  - [2022 Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
336
  - [2021 Vector Quantized Models for Planning](https://arxiv.org/abs/2106.04615)
337
  - [2021 Muesli: Combining Improvements in Policy Optimization. ](https://arxiv.org/abs/2104.06159)
 
338
  #### MCTS Analysis
339
  - [2020 Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)
340
  - [2021 Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
@@ -482,12 +510,12 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
482
  - ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
483
  - [Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization](https://openreview.net/pdf?id=SUzPos_pUC) 2022
484
  - Lei Song∗, Ke Xue∗, Xiaobin Huang, Chao Qian
485
- - Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
486
  - ExpEnv: NAS-bench problems and MuJoCo locomotion
487
  - [Monte Carlo Tree Search With Iteratively Refining State Abstractions](https://proceedings.neurips.cc/paper/2021/file/9b0ead00a217ea2c12e06a72eec4923f-Paper.pdf) 2021
488
  - Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
489
  - Key: stochastic environments, Progressive widening, abstraction refining
490
- - ExpEnv: Blackjack, Trap, five by five Go.
491
  - [Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess](https://proceedings.neurips.cc/paper/2021/file/215a71a12769b056c3c32e7299f1c5ed-Paper.pdf) 2021
492
  - Gregory Clark
493
  - Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
@@ -512,8 +540,11 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
512
  </details>
513
 
514
 
515
- ## Feedback and Contribution
 
516
  - [File an issue](https://github.com/opendilab/LightZero/issues/new/choose) on Github
 
 
517
  - Contact our email (opendilab@pjlab.org.cn)
518
 
519
  - We appreciate all the feedback and contributions to improve LightZero, both algorithms and system designs.
@@ -523,19 +554,32 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
523
  [comment]: <> (And `CONTRIBUTING.md` offers some necessary information.)
524
 
525
 
526
- ## Citation
527
  ```latex
528
- @misc{lightzero,
529
- title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
530
- author={Yazhe Niu and Yuan Pu and Zhenjie Yang and Xueyan Li and Tong Zhou and Jiyuan Ren and Shuai Hu and Hongsheng Li and Yu Liu},
531
- year={2023},
532
- eprint={2310.08348},
533
- archivePrefix={arXiv},
534
- primaryClass={cs.LG}
 
 
 
 
 
 
 
 
 
 
 
 
 
535
  }
536
  ```
537
 
538
- ## Acknowledgments
539
 
540
  This project has been developed partially based on the following pioneering works on GitHub repositories.
541
  We express our profound gratitude for these foundational resources:
@@ -553,7 +597,7 @@ Thanks to all who contributed to this project:
553
  </a>
554
 
555
 
556
- ## License
557
  All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
558
 
559
  <p align="right">(<a href="#top">Back to top</a>)</p>
 
26
  [![GitHub pulls](https://img.shields.io/github/issues-pr/opendilab/LightZero)](https://github.com/opendilab/LightZero/pulls)
27
  [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
28
  [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
29
+ [![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)
30
 
31
+ Updated on 2024.08.18 LightZero-v0.1.0
32
 
33
+ English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)
34
 
35
+ > LightZero is a lightweight, efficient, and easy-to-understand open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (RL).
36
+ > For any questions about LightZero, you can consult the RAG-based Q&A assistant: [ZeroPal](https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal).
37
 
38
+
39
+ ## 🔍 Background
40
 
41
  The integration of Monte Carlo Tree Search and Deep Reinforcement Learning,
42
  exemplified by AlphaZero and MuZero,
 
45
  The following is an overview of the historical evolution of the Monte Carlo Tree Search algorithm series:
46
  ![pipeline](assets/mcts_rl_evolution_overview.png)
47
 
48
+ ## 🎨 Overview
49
 
50
+ **LightZero** is an open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Reinforcement Learning (RL) for PyTorch. It supports a range of MCTS-based RL algorithms and applications, offering several key advantages:
51
  - Lightweight.
52
  - Efficient.
53
  - Easy-to-understand.
 
65
  - [Integrated Algorithms](#integrated-algorithms)
66
  - [Installation](#installation)
67
  - [Quick Start](#quick-start)
68
+ - [Documentation](#documentation)
69
  - [Benchmark](#benchmark)
70
  - [Awesome-MCTS Notes](#awesome-mcts-notes)
71
  - [Paper Notes](#paper-notes)
 
78
  - [Acknowledgments](#acknowledgments)
79
  - [License](#license)
80
 
81
+ ### 💥 Features
82
 
83
  **Lightweight**: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a lightweight framework. The algorithms and environments LightZero implemented can be found [here](#integrated-algorithms).
84
 
 
86
 
87
  **Easy-to-understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms to help users understand the algorithm's core and compare the differences and similarities between algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for algorithm code implementation, making it easier for users to locate critical code. All the documentation can be found [here](#paper-notes).
88
 
89
+ ### 🧩 Framework Structure
90
 
91
  [comment]: <> (<p align="center">)
92
 
 
113
 
114
  For the file structure of LightZero, please refer to [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
115
 
116
+ ### 🎁 Integrated Algorithms
117
  LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of MCTS algorithms (sometimes combined with cython and cpp), including:
118
  - [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
119
  - [MuZero](https://arxiv.org/abs/1911.08265)
 
121
  - [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
122
  - [EfficientZero](https://arxiv.org/abs/2111.00210)
123
  - [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
124
+ - [ReZero](https://arxiv.org/abs/2404.16364)
125
+ - [UniZero](https://arxiv.org/abs/2406.10667)
126
 
127
  The environments and algorithms currently supported by LightZero are shown in the table below:
128
 
129
+
130
+ | Env./Algo. | AlphaZero | MuZero | Sampled MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero | UniZero | Sampled UniZero | ReZero |
131
+ |------------------------| -------- | ---- |---------------| ---------- | ------------------ | ------------- | ---------------- | ------- | --- | ------ |
132
+ | TicTacToe | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | 🔒 |
133
+ | Gomoku | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | ✔ |
134
+ | Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
135
+ | 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | ✔ | 🔒 | 🔒 |
136
+ | Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
137
+ | Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
138
+ | CartPole | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
139
+ | Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ | 🔒 |
140
+ | LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 |
141
+ | BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
142
+ | Atari | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
143
+ | DeepMind Control | --- | --- | ✔ | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 |
144
+ | MuJoCo | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
145
+ | MiniGrid | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
146
+ | Bsuite | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
147
+ | Memory | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
148
+ | SumToThree (billiards) | --- | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
149
+ | MetaDrive | --- | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
150
+
151
 
152
  <sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>
153
 
 
156
  <sup>(3): "---" means that this algorithm doesn't support this environment.</sup>
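Reviewer note: a support matrix like the one in this README can be queried programmatically. The sketch below parses a two-row excerpt of the updated table (markers written as ✔, 🔒, and ---) and lists the algorithms marked ✔ for a given environment. It is an illustrative helper only: `parse_support_table` and `supported_algorithms` are hypothetical names, not part of LightZero, and the sketch assumes that only ✔ counts as supported while every other marker is treated as unavailable.

```python
from typing import Dict, List

# Two rows copied from the updated table; the full table has more environments.
SUPPORT_TABLE = """
| Env./Algo. | AlphaZero | MuZero | Sampled MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero | UniZero | Sampled UniZero | ReZero |
|---|---|---|---|---|---|---|---|---|---|---|
| TicTacToe | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | 🔒 |
| CartPole | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
"""


def _cells(line: str) -> List[str]:
    # Drop the outer pipes, then split the row into trimmed cell values.
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_support_table(markdown: str) -> Dict[str, Dict[str, str]]:
    """Map each environment name to {algorithm name: marker}."""
    lines = [ln for ln in markdown.strip().splitlines() if ln.strip()]
    algorithms = _cells(lines[0])[1:]  # first header cell is the row label
    table: Dict[str, Dict[str, str]] = {}
    for line in lines[2:]:  # skip the header and the |---| separator row
        row = _cells(line)
        table[row[0]] = dict(zip(algorithms, row[1:]))
    return table


def supported_algorithms(table: Dict[str, Dict[str, str]], env: str) -> List[str]:
    """Return the algorithms whose marker for env is the check mark."""
    return [algo for algo, marker in table[env].items() if marker == "✔"]
```

For example, `supported_algorithms(parse_support_table(SUPPORT_TABLE), "TicTacToe")` returns the four algorithms whose TicTacToe cell is ✔ in the excerpt.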
157
 
158
 
159
+ ## ⚙️ Installation
160
 
161
  You can install the latest LightZero in development from the GitHub source codes with the following command:
162
 
 
170
  We are actively working towards extending this support to the `Windows` platform.
171
  Your patience during this transition is greatly appreciated.
172
 
173
+ ### Installation with Docker
174
 
175
  We also provide a Dockerfile that sets up an environment with all dependencies needed to run the LightZero library. This Docker image is based on Ubuntu 20.04 and installs Python 3.8, along with other necessary tools and libraries.
176
  Here's how to use our Dockerfile to build a Docker image, run a container from this image, and execute LightZero code inside the container.
 
196
 
197
  [comment]: <> (- [AlphaGo Zero]&#40;https://www.nature.com/articles/nature24270&#41; )
198
 
199
+ ## 🚀 Quick Start
200
 
201
  Train a MuZero agent to play [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
202
 
 
219
  python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
220
  ```
221
 
222
+ Train a UniZero agent to play [Pong](http
223
+ g/):
224
+
225
+ ```bash
226
+ cd LightZero
227
+ python3 -u zoo/atari/config/atari_unizero_config.py
228
+ ```
229
+
230
+ ## 📚 Documentation
231
+
232
+ The LightZero documentation can be found [here](https://opendilab.github.io/LightZero/). It contains tutorials and the API reference.
233
 
234
+ For those interested in customizing environments and algorithms, we provide relevant guides:
235
 
236
+ - [Customize Environments](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/envs/customize_envs.md)
237
+ - [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/algos/customize_algos.md)
238
+ - [How to Set Configuration Files?](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/config/config.md)
239
+ - [Logging and Monitoring System](https://github.com/opendilab/LightZero/blob/main/docs/source//tutorials/logs/logs.md)
240
 
241
  Should you have any questions, feel free to contact us for support.
242
 
243
+ ## 📊 Benchmark
244
 
245
+ <details><summary>Click to expand</summary>
246
 
247
  - Below are the benchmark results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games: [TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py).
248
  <p align="center">
 
297
  </details>
298
 
299
 
300
+ ## 📝 Awesome-MCTS Notes
301
 
302
  ### Paper Notes
303
  The following are the detailed paper notes (in Chinese) of the above algorithms:
 
315
 
316
  </details>
317
 
318
+ You can also refer to the relevant Zhihu column (in Chinese): [In-depth Analysis of MCTS+RL Frontier Theories and Applications](https://www.zhihu.com/column/c_1764308735227662336).
319
+
320
  ### Algo. Overview
321
 
322
  The following are the overview MCTS principle diagrams of the above algorithms:
 
325
 
326
  - [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
327
  - [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
328
+ - [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.png)
329
+ - [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.png)
330
+ - [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.png)
331
+ - [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.png)
332
+ - [StochasticMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/stochastic_muzero_overview.png)
333
 
334
  </details>
335
 
 
362
  - [2022 Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
363
  - [2021 Vector Quantized Models for Planning](https://arxiv.org/abs/2106.04615)
364
  - [2021 Muesli: Combining Improvements in Policy Optimization. ](https://arxiv.org/abs/2104.06159)
365
+
366
  #### MCTS Analysis
367
  - [2020 Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)
368
  - [2021 Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
 
510
  - ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
511
  - [Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization](https://openreview.net/pdf?id=SUzPos_pUC) 2022
512
  - Lei Song∗, Ke Xue∗, Xiaobin Huang, Chao Qian
513
+ - Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
514
  - ExpEnv: NAS-bench problems and MuJoCo locomotion
515
  - [Monte Carlo Tree Search With Iteratively Refining State Abstractions](https://proceedings.neurips.cc/paper/2021/file/9b0ead00a217ea2c12e06a72eec4923f-Paper.pdf) 2021
516
  - Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
517
  - Key: stochastic environments, Progressive widening, abstraction refining
518
+ - ExpEnv: Blackjack, Trap, five by five Go.
519
  - [Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess](https://proceedings.neurips.cc/paper/2021/file/215a71a12769b056c3c32e7299f1c5ed-Paper.pdf) 2021
520
  - Gregory Clark
521
  - Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
 
540
  </details>
541
 
542
 
543
+ ## 💬 Feedback and Contribution
544
+
545
  - [File an issue](https://github.com/opendilab/LightZero/issues/new/choose) on Github
546
+ - Open or participate in our [discussion forum](https://github.com/opendilab/LightZero/discussions)
547
+ - Discuss on LightZero [discord server](https://discord.gg/dkZS2JF56X)
548
  - Contact our email (opendilab@pjlab.org.cn)
549
 
550
  - We appreciate all the feedback and contributions to improve LightZero, both algorithms and system designs.
 
554
  [comment]: <> (And `CONTRIBUTING.md` offers some necessary information.)
555
 
556
 
557
+ ## 🌏 Citation
558
  ```latex
559
+ @article{niu2024lightzero,
560
+ title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
561
+ author={Niu, Yazhe and Pu, Yuan and Yang, Zhenjie and Li, Xueyan and Zhou, Tong and Ren, Jiyuan and Hu, Shuai and Li, Hongsheng and Liu, Yu},
562
+ journal={Advances in Neural Information Processing Systems},
563
+ volume={36},
564
+ year={2024}
565
+ }
566
+
567
+ @article{pu2024unizero,
568
+ title={UniZero: Generalized and Efficient Planning with Scalable Latent World Models},
569
+ author={Pu, Yuan and Niu, Yazhe and Ren, Jiyuan and Yang, Zhenjie and Li, Hongsheng and Liu, Yu},
570
+ journal={arXiv preprint arXiv:2406.10667},
571
+ year={2024}
572
+ }
573
+
574
+ @article{xuan2024rezero,
575
+ title={ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze},
576
+ author={Xuan, Chunyu and Niu, Yazhe and Pu, Yuan and Hu, Shuai and Liu, Yu and Yang, Jing},
577
+ journal={arXiv preprint arXiv:2404.16364},
578
+ year={2024}
579
  }
580
  ```
581
 
582
+ ## 💓 Acknowledgments
583
 
584
  This project has been developed partially based on the following pioneering works on GitHub repositories.
585
  We express our profound gratitude for these foundational resources:
 
597
  </a>
598
 
599
 
600
+ ## 🏷️ License
601
  All code within this repository is under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
602
 
603
  <p align="right">(<a href="#top">Back to top</a>)</p>
documents/LightZero_README_zh.md CHANGED
@@ -27,18 +27,20 @@
27
  [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
28
  [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
29
 
30
- Last updated on 2024.03.15 LightZero-v0.0.4
 
 
31
 
32
  > LightZero is a lightweight, efficient, and easy-to-understand open-source MCTS+RL algorithm library.
 
33
 
34
- [English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [Paper](https://arxiv.org/pdf/2310.08348.pdf)
35
 
36
- ## Background
37
 
38
  Methods that combine Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (DRL), represented by AlphaZero and MuZero, have achieved superhuman performance in games such as Go and Atari, and have also made encouraging progress in scientific domains such as protein structure prediction and the search for matrix multiplication algorithms. The following figure shows the historical evolution of the Monte Carlo Tree Search (MCTS) algorithm family:
39
  ![pipeline](assets/mcts_rl_evolution_overview.png)
40
 
41
- ## Overview
42
 
43
  **LightZero** is an open-source algorithm toolkit that combines Monte Carlo Tree Search and reinforcement learning. It supports a range of MCTS-based RL algorithms and has the following advantages:
44
  - Lightweight.
@@ -57,6 +59,7 @@
57
  - [Integrated Algorithms](#集成算法)
58
  - [Installation](#安装方法)
59
  - [Quick Start](#快速开始)
 
60
  - [Baseline Algorithm Comparison](#基线算法比较)
61
  - [MCTS Notes](#MCTS-相关笔记)
62
  - [Paper Notes](#论文笔记)
@@ -69,14 +72,14 @@
69
  - [Acknowledgments](#致谢)
70
  - [License](#许可证)
71
 
72
- ### Features
73
  **Lightweight**: LightZero integrates multiple MCTS algorithm families and can solve decision-making problems with various attributes in a lightweight way within a single framework.
74
 
75
  **Efficient**: LightZero uses mixed heterogeneous computing programming to improve computational efficiency for the most time-consuming part of MCTS algorithms.
76
 
77
  **Easy-to-understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms, helping users understand each algorithm's core and compare the similarities and differences between algorithms under the same paradigm. LightZero also provides function call graphs and network structure diagrams for the code implementations, making it easier for users to locate critical code.
78
 
79
- ### Framework Structure
80
 
81
  <p align="center">
82
  <img src="assets/lightzero_pipeline.svg" alt="Image Description 2" width="50%" height="auto" style="margin: 0 1%;">
@@ -96,7 +99,7 @@
96
 
97
  For the file structure of LightZero, please refer to [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
98
 
99
- ### Integrated Algorithms
100
  LightZero is an MCTS algorithm library implemented in [PyTorch](https://pytorch.org/); its MCTS implementation also uses cython and cpp. The LightZero framework is mainly built on [DI-engine](https://github.com/opendilab/DI-engine). The algorithms currently integrated in LightZero include:
101
  - [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
102
  - [MuZero](https://arxiv.org/abs/1911.08265)
@@ -104,26 +107,30 @@ LightZero is an MCTS algorithm library implemented in [PyTorch](https://pytorch.org/),
104
  - [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
105
  - [EfficientZero](https://arxiv.org/abs/2111.00210)
106
  - [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
107
-
 
108
 
109
  The environments and algorithms currently supported by LightZero are shown in the table below:
110
 
111
- | Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero |
112
- |---------------| -------- | ------ |-------------| ------------------ | ---------- |----------------|
113
- | TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
114
- | Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
115
- | Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 |
116
- | 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ |
117
- | Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
118
- | Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
119
- | CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
120
- | Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
121
- | LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
122
- | BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 |
123
- | Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
124
- | MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
125
- | MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
126
- | Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
 
 
 
127
 
128
  <sup>(1): "✔" means the corresponding item is complete and well tested.</sup>
129
 
@@ -131,7 +138,7 @@ The environments and algorithms currently supported by LightZero are shown in the table below:
131
 
132
  <sup>(3): "---" means the algorithm does not support this environment.</sup>
133
 
134
- ## Installation
135
 
136
  The latest version of LightZero can be installed from the GitHub source with the following commands:
137
 
@@ -170,7 +177,7 @@ pip3 install -e .
170
  python ./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py
171
  ```
172
 
173
- ## Quick Start
174
  Use the following commands to quickly train a MuZero agent on the [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/) environment:
175
 
176
  ```bash
@@ -191,18 +198,30 @@ python3 -u zoo/atari/config/atari_muzero_config.py
191
  cd LightZero
192
  python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
193
  ```
194
- ## Customization Documentation
195
 
196
- For users who want to customize environments and algorithms, we provide comprehensive guides:
 
 
 
 
 
 
 
197
 
198
- - **Environments:** [Customize Environments](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs_zh.md)
199
- - **Algorithms:** [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos_zh.md)
200
 
201
- If you have any questions, feel free to contact us for help.
202
 
203
- ## Baseline Algorithm Comparison
 
 
 
204
 
205
- <details open><summary>Click to collapse</summary>
 
 
 
 
206
 
207
  - Baseline results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games ([TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), and [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py)):
208
  <p align="center">
@@ -255,7 +274,7 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
255
 
256
  </details>
257
 
258
- ## MCTS-Related Notes
259
 
260
  ### Paper Notes
261
 
@@ -279,24 +298,22 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
279
 
280
  </details>
281
 
 
 
282
  ### Algorithm Framework Diagrams
283
 
284
  Overview diagrams of the algorithms integrated in LightZero:
285
 
286
  <details closed>
287
- <summary>(Click to view more)</summary>
288
-
289
- [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
290
-
291
- [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
292
-
293
- [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.pdf)
294
-
295
- [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.pdf)
296
 
297
- [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.pdf)
298
-
299
- [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.pdf)
 
 
 
 
300
 
301
  </details>
302
 
@@ -307,7 +324,7 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
307
  ### Key Papers
308
 
309
  <details closed>
310
- <summary>(Click to view more)</summary>
311
 
312
  #### LightZero Implemented series
313
 
@@ -351,7 +368,7 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
351
  ### Other Papers
352
 
353
  <details closed>
354
- <summary>(Click to view more)</summary>
355
 
356
  #### ICML
357
  - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
@@ -511,27 +528,42 @@ and internal state transition dynamics,
511
  - [Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search](https://arxiv.org/pdf/1906.06832) IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.
512
  </details>
513
 
514
- ## Feedback and Contribution
515
  - For any questions or suggestions, [file an issue](https://github.com/opendilab/LightZero/issues/new/choose) directly on GitHub
 
 
516
  - Or contact us by email (opendilab@pjlab.org.cn)
517
 
518
  - We appreciate all feedback, including feedback on the algorithms and the system design. Your comments and suggestions help make LightZero better.
519
 
520
 
521
- ## Citation
522
 
523
  ```latex
524
- @misc{lightzero,
525
- title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
526
- author={Yazhe Niu and Yuan Pu and Zhenjie Yang and Xueyan Li and Tong Zhou and Jiyuan Ren and Shuai Hu and Hongsheng Li and Yu Liu},
527
- year={2023},
528
- eprint={2310.08348},
529
- archivePrefix={arXiv},
530
- primaryClass={cs.LG}
 
 
 
 
 
 
 
 
 
 
 
 
 
531
  }
532
  ```
533
 
534
- ## Acknowledgments
535
  This library is partly based on the following GitHub repositories; many thanks to these pioneering works:
536
  - https://github.com/opendilab/DI-engine
537
  - https://github.com/deepmind/mctx
@@ -546,7 +578,7 @@ and internal state transition dynamics,
546
  <img src="https://contrib.rocks/image?repo=opendilab/LightZero" />
547
  </a>
548
 
549
- ## License
550
 
551
  All code in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
552
 
 
27
  [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
28
  [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
29
 
30
+ Last updated on 2024.08.18 LightZero-v0.1.0
31
+
32
+ [English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)
33
 
34
  > LightZero is a lightweight, efficient, and easy-to-understand open-source MCTS+RL algorithm library.
35
+ > For any questions about LightZero, you can consult the RAG-based Q&A assistant: [ZeroPal](https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal).
36
 
 
37
 
38
+ ## 🔍 Background
39
 
40
  Methods that combine Monte Carlo Tree Search (MCTS) with Deep Reinforcement Learning (DRL), exemplified by AlphaZero and MuZero, have reached superhuman performance in games such as Go and Atari, and have made encouraging progress in scientific domains such as protein structure prediction and the search for matrix multiplication algorithms. The figure below shows the historical development of the MCTS algorithm family:
41
  ![pipeline](assets/mcts_rl_evolution_overview.png)
42
 
43
+ ## 🎨 Overview
44
 
45
  **LightZero** is an open-source algorithm toolkit that combines Monte Carlo Tree Search and reinforcement learning. It supports a range of MCTS-based RL algorithms and has the following advantages:
46
  - Lightweight.
 
59
  - [Integrated Algorithms](#integrated-algorithms)
60
  - [Installation](#installation)
61
  - [Quick Start](#quick-start)
62
+ - [Documentation](#documentation)
63
  - [Baseline Algorithm Comparison](#baseline-algorithm-comparison)
64
  - [MCTS-Related Notes](#mcts-related-notes)
65
  - [Paper Notes](#paper-notes)
 
72
  - [Acknowledgments](#acknowledgments)
73
  - [License](#license)
74
 
75
+ ### 💥 Features
76
  **Lightweight**: LightZero integrates multiple MCTS-family algorithms and can solve decision-making problems with various attributes in a lightweight way within a single framework.
77
 
78
  **Efficient**: LightZero uses mixed heterogeneous computing programming to speed up the most time-consuming parts of MCTS-family algorithms.
79
 
80
  **Easy to understand**: LightZero provides detailed documentation and algorithm framework diagrams for all integrated algorithms, helping users understand the core of each algorithm and compare the similarities and differences between algorithms under a single paradigm. LightZero also provides function call graphs and network structure diagrams for the algorithm implementations, making it easier to locate key code.
81
 
82
+ ### 🧩 Framework Structure
83
 
84
  <p align="center">
85
  <img src="assets/lightzero_pipeline.svg" alt="Image Description 2" width="50%" height="auto" style="margin: 0 1%;">
 
99
 
100
  For the file structure of LightZero, see [lightzero_file_structure](https://github.com/opendilab/LightZero/blob/main/assets/lightzero_file_structure.svg).
101
 
102
+ ### 🎁 Integrated Algorithms
103
  LightZero is an MCTS algorithm library implemented in [PyTorch](https://pytorch.org/); parts of the MCTS implementation also use Cython and C++. The framework is built mainly on [DI-engine](https://github.com/opendilab/DI-engine). The algorithms currently integrated in LightZero are:
104
  - [AlphaZero](https://www.science.org/doi/10.1126/science.aar6404)
105
  - [MuZero](https://arxiv.org/abs/1911.08265)
 
107
  - [Stochastic MuZero](https://openreview.net/pdf?id=X6D9bAHhBQ1)
108
  - [EfficientZero](https://arxiv.org/abs/2111.00210)
109
  - [Gumbel MuZero](https://openreview.net/pdf?id=bERaNdoegnO&)
110
+ - [ReZero](https://arxiv.org/abs/2404.16364)
111
+ - [UniZero](https://arxiv.org/abs/2406.10667)
112
 
113
  The environments and algorithms currently supported by LightZero are shown in the table below:
114
 
115
+ | Env./Algo. | AlphaZero | MuZero | Sampled MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero | UniZero | Sampled UniZero | ReZero |
116
+ |------------------------| -------- | ---- |---------------| ---------- | ------------------ | ------------- | ---------------- | ------- | --- | ------ |
117
+ | TicTacToe | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | 🔒 |
118
+ | Gomoku | ✔ | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ | 🔒 | ✔ |
119
+ | Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
120
+ | 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | ✔ | ✔ | 🔒 | 🔒 |
121
+ | Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
122
+ | Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
123
+ | CartPole | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
124
+ | Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ | 🔒 |
125
+ | LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 |
126
+ | BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
127
+ | Atari | --- | ✔ | 🔒 | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 | ✔ |
128
+ | DeepMind Control | --- | --- | ✔ | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ | 🔒 |
129
+ | MuJoCo | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
130
+ | MiniGrid | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
131
+ | Bsuite | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
132
+ | Memory | --- | ✔ | 🔒 | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | 🔒 |
133
+ | SumToThree (billiards) | --- | 🔒 | 🔒 | 🔒 | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
134
 
135
  <sup>(1): "✔" means the corresponding item is complete and well tested.</sup>
136
 
 
138
 
139
  <sup>(3): "---" means the algorithm does not support this environment.</sup>
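
The support table can also be queried programmatically. The sketch below is a hypothetical helper, not part of LightZero: it hand-copies the "✔" cells of four rows from the updated table into a plain dictionary and looks up support in both directions. The names `SUPPORTED`, `algorithms_for`, and `environments_for` are assumptions made for illustration.

```python
# Hypothetical helper, not part of LightZero: a hand-copied excerpt of the
# "✔" cells from the updated support table (four rows only).
SUPPORTED = {
    "TicTacToe": {"AlphaZero", "MuZero", "Gumbel MuZero", "UniZero"},
    "2048": {"MuZero", "Stochastic MuZero", "UniZero"},
    "DeepMind Control": {"Sampled MuZero", "Sampled EfficientZero", "Sampled UniZero"},
    "SumToThree (billiards)": {"Sampled EfficientZero"},
}


def algorithms_for(env):
    """Return the algorithms marked "✔" for `env`, sorted by name."""
    if env not in SUPPORTED:
        raise KeyError(f"{env!r} is not in this excerpt of the table")
    return sorted(SUPPORTED[env])


def environments_for(algo):
    """Return the environments in the excerpt where `algo` is marked "✔"."""
    return sorted(env for env, algos in SUPPORTED.items() if algo in algos)
```

Keeping the data as a mapping from environment to a set of algorithm names makes both lookups one-liners; rows not copied into the excerpt raise a `KeyError` rather than silently reporting "unsupported".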
140
 
141
+ ## ⚙️ Installation
142
 
143
  The latest version of LightZero can be installed from the GitHub source with the following commands:
144
 
 
177
  python ./LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py
178
  ```
179
 
180
+ ## 🚀 Quick Start
181
  Use the following commands to quickly train a MuZero agent on the [CartPole](https://gymnasium.farama.org/environments/classic_control/cart_pole/) environment:
182
 
183
  ```bash
 
198
  cd LightZero
199
  python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
200
  ```
 
201
 
202
+ Use the following commands to quickly train a UniZero agent on the [Pong](https://gymnasium.farama.org/environments/atari/pong/) environment:
203
+
204
+ ```bash
205
+ cd LightZero
206
+ python3 -u zoo/atari/config/atari_unizero_config.py
207
+ ```
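
The quick-start commands all share one shape: `python3 -u` followed by a config script under `zoo/`. As a minimal sketch, assuming the working directory is the LightZero checkout, the hypothetical helper below maps a short label to config paths quoted in this README and builds the matching command line without running it. `QUICK_START_CONFIGS`, its labels, and `build_command` are illustrative names, not LightZero APIs.

```python
# Hypothetical helper, not part of LightZero. The paths are the config
# scripts quoted in this README; the dictionary labels are made up here.
QUICK_START_CONFIGS = {
    "muzero-atari": "zoo/atari/config/atari_muzero_config.py",
    "muzero-tictactoe": "zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py",
    "unizero-atari": "zoo/atari/config/atari_unizero_config.py",
}


def build_command(label, python="python3"):
    """Build the argv list for a quick-start run; nothing is executed."""
    if label not in QUICK_START_CONFIGS:
        known = ", ".join(sorted(QUICK_START_CONFIGS))
        raise ValueError(f"unknown label {label!r}; expected one of: {known}")
    # "-u" keeps stdout/stderr unbuffered, as in the README commands.
    return [python, "-u", QUICK_START_CONFIGS[label]]
```

To launch a run, the returned list could be passed to the standard library's `subprocess.run(cmd, cwd="LightZero", check=True)`, which mirrors the `cd LightZero` step in the shell commands.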
208
+
209
+ ## 📚 Documentation
210
 
211
+ The LightZero documentation is available [here](https://opendilab.github.io/LightZero/). It contains tutorials and an API reference.
 
212
 
213
+ For users who want to customize environments and algorithms, we provide the corresponding guides:
214
 
215
+ - [How to customize an environment?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs_zh.md)
216
+ - [How to customize an algorithm?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos_zh.md)
217
+ - [How to set up configuration files?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/config/config_zh.md)
218
+ - [Logging system](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/logs/logs_zh.md)
219
 
220
+ If you have any questions, feel free to contact us.
221
+
222
+ ## 📊 Baseline Algorithm Comparison
223
+
224
+ <details><summary>Click to expand</summary>
225
 
226
  - Baseline results of [AlphaZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/alphazero.py) and [MuZero](https://github.com/opendilab/LightZero/blob/main/lzero/policy/muzero.py) on three board games ([TicTacToe](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/tictactoe/envs/tictactoe_env.py), [Connect4](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/connect4/envs/connect4_env.py), and [Gomoku](https://github.com/opendilab/LightZero/blob/main/zoo/board_games/gomoku/envs/gomoku_env.py)):
227
  <p align="center">
 
274
 
275
  </details>
276
 
277
+ ## 📝 MCTS-Related Notes
278
 
279
  ### Paper Notes
280
 
 
298
 
299
  </details>
300
 
301
+ See also the corresponding Zhihu column: [An In-Depth Analysis of Frontier Theory and Applications of MCTS+RL](https://www.zhihu.com/column/c_1764308735227662336).
302
+
303
  ### Algorithm Framework Diagrams
304
 
305
  Overview diagrams of the algorithms integrated in LightZero:
306
 
307
  <details closed>
308
+ <summary>(Click to view)</summary>
 
 
 
 
 
 
 
 
309
 
310
+ - [MCTS](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/mcts_overview.pdf)
311
+ - [AlphaZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/alphazero_overview.pdf)
312
+ - [MuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/muzero_overview.png)
313
+ - [EfficientZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/efficientzero_overview.png)
314
+ - [SampledMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/sampled_muzero_overview.png)
315
+ - [GumbelMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/gumbel_muzero_overview.png)
316
+ - [StochasticMuZero](https://github.com/opendilab/LightZero/blob/main/assets/algo_overview/stochastic_muzero_overview.png)
317
 
318
  </details>
319
 
 
324
  ### Key Papers
325
 
326
  <details closed>
327
+ <summary>(Click to view)</summary>
328
 
329
  #### LightZero Implemented series
330
 
 
368
  ### Other Papers
369
 
370
  <details closed>
371
+ <summary>(Click to view)</summary>
372
 
373
  #### ICML
374
  - [Scalable Safe Policy Improvement via Monte Carlo Tree Search](https://openreview.net/pdf?id=tevbBSzSfK) 2023
 
528
  - [Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search](https://arxiv.org/pdf/1906.06832) IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.
529
  </details>
530
 
531
+ ## 💬 Feedback and Contribution
532
  - For any questions or suggestions, [file an issue](https://github.com/opendilab/LightZero/issues/new/choose) directly on GitHub
533
+ - Start or join a discussion on the [GitHub Discussions forum](https://github.com/opendilab/LightZero/discussions)
534
+ - Discuss on the LightZero [Discord server](https://discord.gg/qZTQTycu)
535
  - Or contact us by email (opendilab@pjlab.org.cn)
536
 
537
  - We appreciate all feedback, including feedback on the algorithms and the system design. Your comments and suggestions help make LightZero better.
538
 
539
 
540
+ ## 🌏 Citation
541
 
542
  ```latex
543
+ @article{niu2024lightzero,
544
+ title={LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios},
545
+ author={Niu, Yazhe and Pu, Yuan and Yang, Zhenjie and Li, Xueyan and Zhou, Tong and Ren, Jiyuan and Hu, Shuai and Li, Hongsheng and Liu, Yu},
546
+ journal={Advances in Neural Information Processing Systems},
547
+ volume={36},
548
+ year={2024}
549
+ }
550
+
551
+ @article{pu2024unizero,
552
+ title={UniZero: Generalized and Efficient Planning with Scalable Latent World Models},
553
+ author={Pu, Yuan and Niu, Yazhe and Ren, Jiyuan and Yang, Zhenjie and Li, Hongsheng and Liu, Yu},
554
+ journal={arXiv preprint arXiv:2406.10667},
555
+ year={2024}
556
+ }
557
+
558
+ @article{xuan2024rezero,
559
+ title={ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze},
560
+ author={Xuan, Chunyu and Niu, Yazhe and Pu, Yuan and Hu, Shuai and Liu, Yu and Yang, Jing},
561
+ journal={arXiv preprint arXiv:2404.16364},
562
+ year={2024}
563
  }
564
  ```
565
 
566
+ ## 💓 Acknowledgments
567
  This library is partly based on the following GitHub repositories; many thanks to these pioneering works:
568
  - https://github.com/opendilab/DI-engine
569
  - https://github.com/deepmind/mctx
 
578
  <img src="https://contrib.rocks/image?repo=opendilab/LightZero" />
579
  </a>
580
 
581
+ ## 🏷️ License
582
 
583
  All code in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
584