Commit 8c8cf65 by Jarvis-K
Parent: 2a33798

update readme

README.md CHANGED
@@ -56,42 +56,3 @@ Here is an example of how to run the script:
 ```
 ./test.sh
 ```
-The commands in test.sh are structured as follows:
-
-```
-python main.py --env_name ENV_NAME --init_summarizer INIT_SUMMARIZER --curr_summarizer CURR_SUMMARIZER [--future_summarizer FUTURE_SUMMARIZER --future_horizon FUTURE_HORIZON]
-```
-Where:
-
-* ENV_NAME: The name of the Gym environment to be used (e.g., CartPole-v0).
-* INIT_SUMMARIZER: The initial summarizer to be used (e.g., cart_init_translator).
-* CURR_SUMMARIZER: The current summarizer to be used (e.g., cart_basic_translator).
-* FUTURE_SUMMARIZER (optional): The future summarizer to be used (e.g., cart_basic_translator).
-* FUTURE_HORIZON (optional): The horizon that each policy will look to (e.g., 3).
-
-## Supported Environment Translators and LLM Deciders
-
-| | Acrobot | Cart Pole | Mountain Car | Pendulum | Lunar Lander | Blackjack | Taxi | Cliff Walking | Frozen Lake |
-|------------------------------|:------------------------:|:----------------------------------:|:------------------------:|:------------------------:|:------------------------:|:------------------------:|:------------------------:|:------------------------:|:------------------------:|
-| Translator | :heavy_multiplication_x: | :white_check_mark: | :heavy_multiplication_x: | :heavy_multiplication_x: | :white_check_mark: | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: | :heavy_multiplication_x: |
-| Chain-of-Thought | :heavy_minus_sign: | :white_check_mark:(L1)<br>:gift:<sup>[1]</sup>(~30) | :heavy_minus_sign: | :heavy_minus_sign: | :white_check_mark:(L1)<br/>:gift:<sup>[1]</sup>(-367) | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: |
-| Program-aided Language Model | :heavy_minus_sign: | :white_check_mark:(L1)<br>:gift:(168) | :heavy_minus_sign: | :heavy_minus_sign: | :white_check_mark:(L1)<br/>:gift:(-68) | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: |
-| Self-ask Prompting | :heavy_minus_sign: | :white_check_mark:(L1)<br>:gift:(~10) | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_multiplication_x: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: |
-| Self-consistency Prompting | :heavy_minus_sign: | :white_check_mark:(L1)<br>:gift:(~30) | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_multiplication_x: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: |
-| Reflexion | :heavy_minus_sign: | :heavy_multiplication_x: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_multiplication_x: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: |
-| Solo Performance Prompting | :heavy_minus_sign: | :white_check_mark:(L1)<br/>:gift:(43) | :heavy_minus_sign: | :heavy_minus_sign: | :white_check_mark:(L1)<br/>:gift:(-583) | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: |
-
-<sup>[1]: Cumulative reward.</sup>
-![Image text](https://github.com/mail-ecnu/LLM-Decider-Bench/blob/master/vis/Classic%20Control.png)
-![Image text](https://github.com/mail-ecnu/LLM-Decider-Bench/blob/master/vis/Box%202D.png)
-![Image text](https://github.com/mail-ecnu/LLM-Decider-Bench/blob/master/vis/Toy%20Text.png)
-
->
-> 1. Except for the reflexion L3 decider, all other L3 deciders in this task do not have memory.
-> 2. reflexion L1 and L3 both have memory.
-> 3. reflexion L1 run 5 trails.
-> 4. Blackjack、MountainCar、Cliffwalking(PAL)、CartPole(PAL)、Taxi(SPP、PAL)、Frozen Lake use deciders modified at 15:29 09.18
-> 5. update Frozen Lake translator, add prior knowledge.
-# Remarks
-1. how to use future info
-We provide future info in the env_info part. It is a dict and you can convert it to a text further to make your agent aware the world model.
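For readers of this diff, the removed README text describes two mechanics: the flags passed to `main.py`, and a future-info dict that can be turned into text for the agent. The sketch below illustrates both under stated assumptions. The flag names come from the removed usage line, but the parser itself is a guess at how `main.py` might read them, not the repository's actual code. The helper `future_info_to_text` and the example dict key are hypothetical, since the diff does not show what `env_info` contains.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags named in the removed usage line.

    Assumption: this mirrors the documented flags only; the real
    main.py may define more options or different defaults.
    """
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--env_name", required=True,
                        help="Gym environment name, e.g. CartPole-v0")
    parser.add_argument("--init_summarizer", required=True,
                        help="Initial summarizer, e.g. cart_init_translator")
    parser.add_argument("--curr_summarizer", required=True,
                        help="Current summarizer, e.g. cart_basic_translator")
    # The removed text marks the next two flags as optional.
    parser.add_argument("--future_summarizer", default=None,
                        help="Future summarizer, e.g. cart_basic_translator")
    parser.add_argument("--future_horizon", type=int, default=None,
                        help="Look-ahead horizon, e.g. 3")
    return parser


def future_info_to_text(future_info: dict) -> str:
    """Render a future-info dict as plain text for an LLM prompt.

    Hypothetical helper: the dict's real keys are not shown in the diff,
    so each key/value pair is simply written on its own line.
    """
    if not future_info:
        return "No future information is available."
    lines = [f"{key}: {value}" for key, value in future_info.items()]
    return "Future information:\n" + "\n".join(lines)
```

A call such as `build_parser().parse_args(["--env_name", "CartPole-v0", "--init_summarizer", "cart_init_translator", "--curr_summarizer", "cart_basic_translator"])` leaves both optional values as `None`, which matches the bracketed (optional) part of the removed usage line.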