Metadata-Version: 2.1
Name: crazyneuraluser
Version: 0.0.post1.dev55+g3c295fb.d20220606
Summary: Add a short description here!
Home-page: https://github.com/pyscaffold/pyscaffold/
Author: Extended by Alistair McLeay, original code by Alexandru Coca
Author-email: am@alistairmcleay.com and alexcoca23@yahoo.co.uk
License: MIT
Project-URL: Documentation, https://pyscaffold.org/
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Provides-Extra: testing
License-File: LICENSE.txt
License-File: AUTHORS.md
# Cambridge Masters Project
Joint Learning of Practical Dialogue Systems and User Simulators
## Environment setup
1. Create an environment `crazyneuraluser` with the help of [conda]:
```
conda env create -f environment.yml
```
2. Activate the new environment with:
```
conda activate crazyneuraluser
```
3. Install a version of `pytorch` compatible with your hardware (see the [pytorch website](https://pytorch.org/get-started/previous-versions/)). E.g.:
```
pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
```
4. Install `spacy` and download the tokenization tool in spacy:
```
pip install spacy
python -m spacy download en_core_web_sm
```
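Before moving on, it can help to confirm that the packages installed in steps 3 and 4 are visible to the active interpreter. The following is a minimal sketch rather than part of the repository: the helper name `check_environment` is hypothetical, and it only reports whether each package can be found, not whether the installed versions are compatible with your hardware.

```python
from importlib.util import find_spec

# Packages installed in the steps above; the spaCy model is installed
# as a regular Python package named `en_core_web_sm`.
REQUIRED = ("torch", "spacy", "en_core_web_sm")


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `crazyneuraluser` environment; any package reported as missing points back to the corresponding step.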
### Generating dialogues through agent-agent interaction
To generate dialogues, first change working directory to the `baselines` directory. Run the command
```
python baselines_setup.py
```
to prepare `convlab2` for running the baselines.
#### Generating dialogues conditioned on randomly sampled goals
Select one of the available configurations in the `configs` directory and run the command
```
python simulate_agent_interaction.py --config /rel/path/to/chosen/config
```
to generate dialogues conditioned on randomly sampled goals according to the `convlab2` goal model. The dialogues will be saved automatically in the `models` directory, under a directory whose name depends on the configuration run. The `models` directory is located in the parent directory of the `baselines` directory. The `metadata.json` file saved with the dialogues contains information about the data generation process.
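To compare several generation runs, the `metadata.json` files can be collected in one pass. This is a sketch under assumptions: the function name `load_generation_metadata` is hypothetical, the contents of `metadata.json` are treated as opaque JSON, and the search is recursive because the exact nesting of the run directories depends on the configuration.

```python
import json
from pathlib import Path


def load_generation_metadata(models_dir):
    """Map each run directory (relative to models_dir) to its parsed metadata.json."""
    models_dir = Path(models_dir)
    metadata = {}
    # rglob avoids assuming how deeply the run directories are nested.
    for path in sorted(models_dir.rglob("metadata.json")):
        with path.open(encoding="utf-8") as f:
            run_name = path.parent.relative_to(models_dir).as_posix()
            metadata[run_name] = json.load(f)
    return metadata
```

From the `baselines` directory, `load_generation_metadata("../models")` would return one entry per run that has a `metadata.json` file.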
#### Generating dialogues conditioned on `MultiWOZ2.1` goals
To generate the entire corpus, pass the `--goals-path /path/to/multiwoz2.1/data.json/file` flag to `simulate_agent_interaction.py`. To generate only the `test` or `val` split, additionally pass the `--filter-path /path/to/multiwoz2.1/test-or-valListFile` argument. To generate dialogues conditioned on the `MultiWOZ2.1` training goals, use the `generate_multiwoz21_train_id_file` function in `baselines/utils.py` to create a `trainListFile`, then pass that file via the `--filter-path` argument.
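The idea behind a training ID file is that the training dialogues are the ones listed in neither the validation nor the test list. The sketch below illustrates that set difference; it is not the repository's `generate_multiwoz21_train_id_file`, whose signature is not shown here. It assumes that `data.json` is a JSON object keyed by dialogue ID and that the list files contain one dialogue ID per line; all function names and paths are hypothetical.

```python
import json
from pathlib import Path


def read_id_list(path):
    """Read a list file containing one dialogue ID per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def train_ids(all_ids, val_ids, test_ids):
    """Return the sorted IDs that appear in neither the val nor the test list."""
    return sorted(set(all_ids) - set(val_ids) - set(test_ids))


def write_train_list(data_json, val_list, test_list, out_path):
    """Write a trainListFile-style file; all four paths are caller-supplied."""
    with open(data_json, encoding="utf-8") as f:
        all_ids = json.load(f).keys()  # assumed: top-level keys are dialogue IDs
    ids = train_ids(all_ids, read_id_list(val_list), read_id_list(test_list))
    Path(out_path).write_text("\n".join(ids) + "\n", encoding="utf-8")
    return ids
```

The resulting file can then be supplied through `--filter-path` in the same way as the validation and test lists.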
### Converting the generated dialogues to SGD-like format
The `create_data_from_multiwoz.py` script can be used to convert the generated dialogues to SGD format, necessary for evaluation. It is based on the script provided by Google for DSTC8, but with additional functionality such as:
- conversion of slot names as annotated in the MultiWOZ 2.1 dialogue acts to different slot names, specified through the `--slots_convention` argument. The options are `multiwoz22`, which converts the slots to the slots defined in the MultiWOZ 2.2 dataset, and `multiwoz_goals`, which converts the slot names to the names used in the dialogue goal and state tracking annotations.
- addition of system and user `nlu` fields for every turn
- option to perform cleaning operations on the goals to ensure a standard format is received by the evaluator.
The conversion is done according to the `schema.json` file in the `baselines` directory, which is the same as the one used by the `DSTC8` conversion except for the addition of the `police` domain. Run ``python create_data_from_multiwoz.py --helpfull`` to see a full list of flags and usage.
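To see which slots the conversion can emit, the schema can be inspected directly. This sketch assumes that `schema.json` follows the usual SGD schema layout, i.e. a JSON list of services, each with a `service_name` and a list of `slots` that carry a `name`. The helper names are hypothetical and the tiny schema in the usage example is made up for illustration.

```python
import json


def load_schema(path):
    """Load a schema file; the path is caller-supplied."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def slots_by_service(schema):
    """Map each service name to the list of slot names it defines."""
    return {
        service["service_name"]: [slot["name"] for slot in service["slots"]]
        for service in schema
    }


if __name__ == "__main__":
    # Illustrative schema fragment, not copied from the repository.
    example = [{"service_name": "demo", "slots": [{"name": "demo-slot"}]}]
    print(slots_by_service(example))
```

With the real file, `slots_by_service(load_schema("schema.json"))` run from the `baselines` directory would list the slots per domain, assuming the layout described above.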
## Installation
The recommended way to use this repository is to develop the core code under `src/crazyneuraluser`. Experiments and exploratory analyses that make use of the core package code should be placed outside the library and import it. See more guidance under the [Project Organisation](#project-organization) section below.
To create an environment for the package, make sure you have deactivated all `conda` environments. Then:
1. Create an environment `crazyneuraluser` with the help of [conda]:
```
conda env create -f environment.yml
```
2. Add the developer dependencies to this environment with the help of [conda]:
```
conda env update -f dev_environment.yml
```
Optional and needed only once after `git clone`:
3. install several [pre-commit] git hooks with:
```bash
pre-commit install
# You _are encouraged_ to run `pre-commit autoupdate`
```
and check out the configuration under `.pre-commit-config.yaml`.
The `-n, --no-verify` flag of `git commit` can be used to deactivate pre-commit hooks temporarily.
4. install [nbstripout] git hooks to remove the output cells of committed notebooks with:
```bash
nbstripout --install --attributes notebooks/.gitattributes
```
This is useful to avoid large diffs due to plots in your notebooks.
A simple `nbstripout --uninstall` will revert these changes.
Then take a look into the `scripts` and `notebooks` folders.
## Dependency Management & Reproducibility
1. Always keep your abstract (unpinned) dependencies updated in `environment.yml` and eventually
in `setup.cfg` if you want to ship and install your package via `pip` later on.
2. Create concrete dependencies as `environment.lock.yml` for the exact reproduction of your
environment with:
```bash
conda env export -n crazyneuraluser -f environment.lock.yml
```
For multi-OS development, consider using `--no-builds` during the export.
3. Update your current environment with respect to a new `environment.lock.yml` using:
```bash
conda env update -f environment.lock.yml --prune
```
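The effect of `--no-builds` can be pictured as dropping the platform-specific build string from each exported dependency. The sketch below illustrates this on individual spec strings; it is not how `conda` implements the flag, the helper name `strip_build` is hypothetical, and the package specs in the usage note are made-up examples.

```python
def strip_build(spec):
    """Drop the build string from a `name=version=build` conda spec.

    Specs in any other form (e.g. `name`, `name=version`, or a pip-style
    `name==version`) are returned unchanged.
    """
    parts = spec.split("=")
    # An exported conda spec with a build string has exactly three
    # non-empty `=`-separated fields.
    if len(parts) == 3 and all(parts):
        return "=".join(parts[:2])
    return spec
```

For example, `strip_build("somepkg=1.2.3=py39_0")` gives `"somepkg=1.2.3"`, which is more likely to resolve on a different operating system than the fully pinned build.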
## Project Organization
```
├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml          <- Build system configuration. Do not change!
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual Python package, e.g. train_model.py.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- Use `pip install -e .` to install for development
│                              or create a distribution with `tox -e build`.
├── src
│   └── crazyneuraluser     <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `py.test`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```
<!-- pyscaffold-notes -->
## Note
This project has been set up using [PyScaffold] 4.0.1 and the [dsproject extension] 0.6.1.
[conda]: https://docs.conda.io/
[pre-commit]: https://pre-commit.com/
[Jupyter]: https://jupyter.org/
[nbstripout]: https://github.com/kynan/nbstripout
[Google style]: http://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
[PyScaffold]: https://pyscaffold.org/
[dsproject extension]: https://github.com/pyscaffold/pyscaffoldext-dsproject