---
title: SocialAI School Demo
emoji: πŸ§™πŸ»β€β™‚οΈ
colorFrom: gray
colorTo: indigo
sdk: docker
app_port: 7860
---

# SocialAI

[comment]: <> (This repository is the official implementation of [My Paper Title]&#40;https://arxiv.org/abs/2030.12345&#41;. )

[comment]: <> (TODO: add arxiv link later)
This repository is the official implementation of *SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents*.

The project website is [here](https://sites.google.com/view/socialai).

The code is based on [minigrid](https://github.com/maximecb/gym-minigrid).

Additional repositories used:
- [BabyAI](https://github.com/mila-iqia/babyai)
- [RIDE](https://github.com/facebookresearch/impact-driven-exploration)
- [astar](https://github.com/jrialland/python-astar)


## Installation

[comment]: <> (Clone the repo)

[comment]: <> (```)

[comment]: <> (git clone https://gitlab.inria.fr/gkovac/act-and-speak.git)

[comment]: <> (```)

Create and activate the conda environment:
```
conda create --name social_ai python=3.7
conda activate social_ai
conda install -c anaconda graphviz 
```

Install the required packages:
```
pip install -r requirements.txt
pip install -e torch-ac
pip install -e gym-minigrid 
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
```
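
As a quick sanity check, you can verify that the key packages import (a sketch; the import names `torch_ac` and `gym_minigrid` are the standard ones for these forks, which is an assumption here):
```
python -c "import torch, torch_ac, gym_minigrid; print(torch.__version__)"
```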

## Interactive policy

To run an environment in interactive mode, run:
```
python -m scripts.manual_control
```

You can test different environments with the `--env` parameter.
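
For example, to inspect the environment used in the training example below (the environment name is taken from that example):
```
python -m scripts.manual_control --env SocialAI-AsocialBoxInformationSeekingParamEnv-v1
```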




# RL experiments

## Training

### Minimal example

To train a policy, run:
```train
python -m scripts.train --model test_model_name --seed 1  --compact-save --algo ppo --env SocialAI-AsocialBoxInformationSeekingParamEnv-v1 --dialogue --save-interval 1 --log-interval 1 --frames 5000000 --multi-modal-babyai11-agent --arch original_endpool_res --custom-ppo-2
```

The policy should reach a success rate above 0.95 within the first 2M environment interactions.

### Recreating all the experiments 

See `run_SAI_final_case_studies.txt` for the experiments in the paper.

#### Regular machine

To run the experiments on a regular machine, use `run_SAI_final_case_studies.txt`; it contains the bash commands for all the RL experiments.
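
For example, a minimal sketch of running the training command from the minimal example above over several seeds in parallel (the seed count and model name are illustrative):
```
for SEED in {1..8}
do
    python -m scripts.train --model test_model_name --seed $SEED --compact-save --algo ppo --env SocialAI-AsocialBoxInformationSeekingParamEnv-v1 --dialogue --save-interval 1 --log-interval 1 --frames 5000000 --multi-modal-babyai11-agent --arch original_endpool_res --custom-ppo-2 &
done
wait
```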



#### Slurm-based cluster (TODO)

To recreate all the experiments from the paper on a Slurm-based cluster, configure the `campaign_launcher.py` script and run:

```
python campaign_launcher.py run_NeurIPS.txt
```

[//]: # (The list of all the experiments and their parameters can be seen in run_NeurIPS.txt)

[//]: # ()
[//]: # (For example the bash equivalent of the following configuration:)

[//]: # (```)

[//]: # (--slurm_conf jz_long_2gpus_32g --nb_seeds 16 --model NeurIPS_Help_NoSocial_NO_BONUS_ABL  --compact-save --algo ppo --*env MiniGrid-AblationExiter-8x8-v0 --*env_args hidden_npc True --dialogue --save-interval 10 --frames 5000000 --*multi-modal-babyai11-agent --*arch original_endpool_res --*custom-ppo-2)

[//]: # (```)

[//]: # (is:)

[//]: # (```)

[//]: # (for SEED in {1..16})

[//]: # (do)

[//]: # (    python -m scripts.train --model NeurIPS_Help_NoSocial_NO_BONUS_ABL  --compact-save --algo ppo --*env MiniGrid-AblationExiter-8x8-v0 --*env_args hidden_npc True --dialogue --save-interval 10 --frames 5000000 --*multi-modal-babyai11-agent --*arch original_endpool_res --*custom-ppo-2 --seed $SEED & )

[//]: # (done)

[//]: # (```)



## Evaluation

To evaluate a policy, run:

```eval
python -m scripts.evaluate_new --episodes 500  --test-set-seed 1  --model-label test_model --eval-env SocialAI-TestLanguageFeedbackSwitchesInformationSeekingParamEnv-v1  --model-to-evaluate storage/test/ --n-seeds 8
```

To visualize a policy, run:
```
python -m scripts.visualize --model storage/test_model_name/1/ --pause 0.1 --seed $RANDOM --episodes 20 --gif viz/test
```


# LLM experiments

For the LLM experiments, set the `OPENAI_API_KEY` (and `HF_TOKEN`) environment variables in `~/.bashrc` or wherever you prefer.
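
For example (the key values are placeholders):
```
# in ~/.bashrc (values illustrative)
export OPENAI_API_KEY="sk-..."
export HF_TOKEN="hf_..."
```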

### Creating in-context examples
To create in-context examples, use the `create_LLM_examples.py` script.

This script opens an interactive window in which you can manually control the agent.
By default, nothing is saved.
The general procedure is to press 'enter' to skip over environments you don't like.
When you see an environment you want, move the agent to the desired position and start recording (press 'r'). The current and all following steps of the episode will be recorded.
Then control the agent and finish the episode. A new episode will start, and recording will be turned off again.

If you like some of the previously collected examples and want to append to them, use the `--load` argument.
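
A hypothetical invocation (assuming the script sits at the repository root and that `--load` takes the saved episodes file; the path is the one from the evaluation example below):
```
python create_LLM_examples.py --load llm_data/in_context_examples/in_context_asocialbox_SocialAI-AsocialBoxInformationSeekingParamEnv-v1_2023_07_19_19_28_48/episodes.pkl
```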

### Evaluating LLM-based agents

The script `eval_LLMs.sh` contains the bash commands to run all the LLM experiments in the paper.

Here is an example of evaluating the `text-ada-001` model on the AsocialBox environment:
```
python -m scripts.LLM_test  --episodes 10 --max-steps 15 --model text-ada-001 --env-args size 7 --env-name SocialAI-AsocialBoxInformationSeekingParamEnv-v1 --in-context-path llm_data/in_context_examples/in_context_asocialbox_SocialAI-AsocialBoxInformationSeekingParamEnv-v1_2023_07_19_19_28_48/episodes.pkl
```

If you want to control the agent yourself, set the model to `interactive`.
The `dummy` agent always executes the move-forward action, and the `random` agent executes random actions. These agents are useful for testing.
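
For example, the evaluation command above can be re-run with the random baseline by swapping only the `--model` value (a sketch; whether `--in-context-path` is still required for these baselines is an assumption):
```
python -m scripts.LLM_test --episodes 10 --max-steps 15 --model random --env-args size 7 --env-name SocialAI-AsocialBoxInformationSeekingParamEnv-v1 --in-context-path llm_data/in_context_examples/in_context_asocialbox_SocialAI-AsocialBoxInformationSeekingParamEnv-v1_2023_07_19_19_28_48/episodes.pkl
```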