Metadata-Version: 2.1
Name: crazyneuraluser
Version: 0.0.post1.dev55+g3c295fb.d20220606
Summary: Add a short description here!
Home-page: https://github.com/pyscaffold/pyscaffold/
Author: Extended by Alistair McLeay, original code by Alexandru Coca
Author-email: am@alistairmcleay.com and alexcoca23@yahoo.co.uk
License: MIT
Project-URL: Documentation, https://pyscaffold.org/
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Provides-Extra: testing
License-File: LICENSE.txt
License-File: AUTHORS.md

# Cambridge Masters Project
Joint Learning of Practical Dialogue Systems and User Simulators

## Environment setup

1. Create an environment `crazyneuraluser` with the help of [conda]:
   ```
   conda env create -f environment.yml
   ```
2. Activate the new environment with:
   ```
   conda activate crazyneuraluser
   ```
3. Install a version of `pytorch` compatible with your hardware (see the [pytorch website](https://pytorch.org/get-started/previous-versions/)). E.g.:
   ```
   pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
   ```

4. Install `spacy` and download its small English pipeline, which provides the tokenizer:
   ```
   pip install spacy
   python -m spacy download en_core_web_sm
   ```

### Generating dialogues through agent-agent interaction

To generate dialogues, first change working directory to the `baselines` directory. Run the command
   ```
   python baselines_setup.py
   ```
to prepare `convlab2` for running the baselines. 

#### Generating dialogues conditioned on randomly sampled goals

Select one of the available configurations in the `configs` directory and run the command
   ```
   python simulate_agent_interaction.py --config /rel/path/to/chosen/config
   ```
to generate dialogues conditioned on randomly sampled goals according to the `convlab2` goal model. The dialogues will be saved automatically in the `models` directory, under a directory whose name depends on the configuration run. The `models` directory is located in the parent directory of the `baselines` directory. The `metadata.json` file saved with the dialogues contains information about the data generation process.
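
As a quick way to inspect what was recorded for each run, the sketch below collects every `metadata.json` found under a given directory. It is only an illustration: the helper name `find_metadata` is hypothetical and not part of this repository, and nothing is assumed about the contents of the file beyond it being valid JSON.

```python
import json
from pathlib import Path


def find_metadata(models_dir):
    """Return {path: parsed JSON} for every metadata.json below models_dir.

    Hypothetical helper, not part of the repository.
    """
    found = {}
    for path in sorted(Path(models_dir).rglob("metadata.json")):
        with path.open(encoding="utf-8") as f:
            found[path] = json.load(f)
    return found
```

When run from the `baselines` directory, `find_metadata("../models")` would return one entry per generated dialogue set, keyed by the path of its metadata file.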

#### Generating dialogues conditioned on `MultiWOZ2.1` goals

To generate dialogues for the entire corpus, pass the `--goals-path /path/to/multiwoz2.1/data.json/file` flag to `simulate_agent_interaction.py`. To generate only the test or validation split, additionally pass the `--filter-path /path/to/multiwoz2.1/test-or-valListFile` argument. To generate dialogues conditioned on the `MultiWOZ2.1` training goals, use the `generate_multiwoz21_train_id_file` function in `baselines/utils.py` to create a `trainListFile`, then pass that file via the `--filter-path` argument.
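
The MultiWOZ 2.1 distribution lists only the validation and test dialogue IDs, so the training IDs are whatever remains in `data.json`. The sketch below illustrates that idea under two assumptions: `data.json` is a JSON object keyed by dialogue ID, and each list file holds one ID per line. The function names are hypothetical, and the repository's own `generate_multiwoz21_train_id_file` may be implemented differently.

```python
import json
from pathlib import Path


def read_id_list(path):
    """Read one dialogue ID per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def train_ids(data_json, val_list, test_list):
    """Return the sorted IDs in data.json that are in neither list file."""
    with open(data_json, encoding="utf-8") as f:
        all_ids = set(json.load(f))  # iterating a dict yields its keys
    return sorted(all_ids - read_id_list(val_list) - read_id_list(test_list))


def write_train_list(data_json, val_list, test_list, out_path):
    """Write the training IDs to out_path, one per line, and return them."""
    ids = train_ids(data_json, val_list, test_list)
    Path(out_path).write_text("\n".join(ids) + "\n", encoding="utf-8")
    return ids
```

The file written by `write_train_list` has the same one-ID-per-line layout assumed for the validation and test lists, so it could be passed to `--filter-path` in the same way.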

### Converting the generated dialogues to SGD-like format

The `create_data_from_multiwoz.py` script can be used to convert the generated dialogues to SGD format, necessary for evaluation. It is based on the script provided by Google for DSTC8, but with additional functionality such as:

   - conversion of slot names as annotated in the MultiWOZ 2.1 dialogue acts to different slot names, specified through the `--slots_convention` argument. The `multiwoz22` option converts the slots to those defined in the MultiWOZ 2.2 dataset, whereas `multiwoz_goals` converts them to the names used in the dialogue goal and state tracking annotations.

   - addition of system and user `nlu` fields for every turn

   - option to perform cleaning operations on the goals to ensure a standard format is received by the evaluator.

The conversion is done according to the `schema.json` file in the `baselines` directory, which is the same as the one used by the `DSTC8` conversion except for the addition of the `police` domain. Type ``python create_data_from_multiwoz.py --helpfull`` to see a full list of flags and usage.
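
To make the slot-name conversion concrete, here is a minimal sketch of a convention-dependent lookup. The mapping entries are illustrative only and are not taken from `create_data_from_multiwoz.py`; the names `SLOT_CONVENTIONS` and `convert_slot` are hypothetical.

```python
# Illustrative entries only: the real mappings live in the conversion script.
SLOT_CONVENTIONS = {
    "multiwoz22": {
        ("hotel", "Price"): "hotel-pricerange",
        ("train", "Dest"): "train-destination",
    },
    "multiwoz_goals": {
        ("hotel", "Price"): "pricerange",
        ("train", "Dest"): "destination",
    },
}


def convert_slot(domain, act_slot, convention):
    """Map a dialogue-act slot name to the chosen convention.

    Slots without an entry are returned unchanged; an unknown
    convention raises KeyError.
    """
    table = SLOT_CONVENTIONS[convention]
    return table.get((domain.lower(), act_slot), act_slot)
```

With these example tables, `convert_slot("hotel", "Price", "multiwoz22")` returns `"hotel-pricerange"`, while the same slot under `multiwoz_goals` becomes `"pricerange"`.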

## Installation

The recommended way to use this repository is to develop the core code under `src/crazyneuraluser`. Experiments and exploratory analyses that make use of the core package code should be placed outside the library and import it. See more guidance under the [Project Organization](#project-organization) section below.

To create an environment for the package, make sure you have deactivated all `conda` environments. Then:

1. Create an environment `crazyneuraluser` with the help of [conda]:
   ```
   conda env create -f environment.yml
   ```
2. Add the developer dependencies to this environment with the help of [conda]:
   ```
   conda env update -f dev_environment.yml
   ```
   
Optional and needed only once after `git clone`:

3. Install several [pre-commit] git hooks with:
   ```bash
   pre-commit install
   # You _are encouraged_ to run `pre-commit autoupdate`
   ```
   and check out the configuration under `.pre-commit-config.yaml`.
   The `-n, --no-verify` flag of `git commit` can be used to deactivate pre-commit hooks temporarily.

4. Install [nbstripout] git hooks to remove the output cells of committed notebooks with:
   ```bash
   nbstripout --install --attributes notebooks/.gitattributes
   ```
   This is useful to avoid large diffs due to plots in your notebooks.
   A simple `nbstripout --uninstall` will revert these changes.

Then take a look into the `scripts` and `notebooks` folders.

## Dependency Management & Reproducibility

1. Always keep your abstract (unpinned) dependencies updated in `environment.yml` and eventually
   in `setup.cfg` if you want to ship and install your package via `pip` later on.
2. Create concrete dependencies as `environment.lock.yml` for the exact reproduction of your
   environment with:
   ```bash
   conda env export -n crazyneuraluser -f environment.lock.yml
   ```
   For multi-OS development, consider using `--no-builds` during the export.
3. Update your current environment with respect to a new `environment.lock.yml` using:
   ```bash
   conda env update -f environment.lock.yml --prune
   ```
## Project Organization

```
├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml          <- Build system configuration. Do not change!
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual Python package, e.g. train_model.py.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- Use `pip install -e .` to install for development or
│                              create a distribution with `tox -e build`.
├── src
│   └── crazyneuraluser     <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `py.test`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```

<!-- pyscaffold-notes -->

## Note

This project has been set up using [PyScaffold] 4.0.1 and the [dsproject extension] 0.6.1.

[conda]: https://docs.conda.io/
[pre-commit]: https://pre-commit.com/
[Jupyter]: https://jupyter.org/
[nbstripout]: https://github.com/kynan/nbstripout
[Google style]: http://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
[PyScaffold]: https://pyscaffold.org/
[dsproject extension]: https://github.com/pyscaffold/pyscaffoldext-dsproject