---
language:
- en
license: apache-2.0
tags:
- dialogue policy
- task-oriented dialog
---

# lava-policy-multiwoz

This is the best-performing LAVA_kl model from the [LAVA paper](https://aclanthology.org/2020.coling-main.41/). It can be used as a word-level policy module in the [ConvLab-3](https://github.com/ConvLab/ConvLab-3) pipeline; refer to ConvLab-3 for the model description and authoritative usage (a hedged loading sketch is also given at the end of this card).

## Training procedure

The model was trained on MultiWOZ 2.0 data using the [LAVA codebase](https://gitlab.cs.uni-duesseldorf.de/general/dsml/lava-public). Training started with VAE pre-training, continued with supervised fine-tuning using an informative-prior KL loss, and finished with corpus-based RL using REINFORCE (schematic sketches of both stages follow the hyperparameter lists below).

### Training hyperparameters

The following hyperparameters were used during SL training:

- y_size: 10
- k_size: 20
- beta: 0.1
- simple_posterior: true
- contextual_posterior: false
- learning_rate: 1e-03
- max_vocab_size: 1000
- max_utt_len: 50
- max_dec_len: 30
- backward_size: 2
- train_batch_size: 128
- seed: 58
- optimizer: Adam
- num_epoch: 100, with early stopping based on the validation set

The following hyperparameters were used during RL training:

- tune_pi_only: false
- max_words: 100
- temperature: 1.0
- episode_repeat: 1.0
- rl_lr: 0.01
- momentum: 0.0
- nesterov: false
- gamma: 0.99
- rl_clip: 5.0
- random_seed: 38
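
For orientation, the SL fine-tuning objective can be read schematically as a beta-weighted ELBO over a discrete latent action; the exact formulation, including how the informative prior is parameterized, is given in the LAVA paper, so treat this as a sketch of how the hyperparameters above fit together rather than the paper's exact loss:

$$
\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid c)}\big[\log p_\theta(x \mid z, c)\big] - \beta \, \mathrm{KL}\big(q_\phi(z \mid c) \,\|\, p(z \mid c)\big)
$$

Here $c$ is the dialogue context, $x$ the system response, and $z$ the discrete latent action (`y_size: 10` categorical variables with `k_size: 20` values each); `beta: 0.1` weights the KL term, and the informative prior comes from the VAE pre-training stage. With `simple_posterior: true`, the recognition network conditions on the context rather than on the gold response.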
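
The corpus-based RL stage uses REINFORCE. Below is a minimal, self-contained PyTorch sketch of that update rule, wired up with the RL hyperparameters above (`gamma`, `rl_lr`, `rl_clip`, plain SGD since `momentum: 0.0` and `nesterov: false`); names such as `policy`, `log_probs`, and `rewards` are illustrative and not identifiers from the LAVA codebase:

```python
import torch

GAMMA = 0.99   # discount factor (gamma above)
RL_LR = 0.01   # rl_lr above
RL_CLIP = 5.0  # gradient-norm clip (rl_clip above)

def reinforce_step(policy, optimizer, log_probs, rewards):
    """One REINFORCE update on a single sampled episode.

    log_probs: list of scalar tensors, log pi(a_t | s_t) per step
    rewards:   list of scalar rewards, one per step
    """
    # Discounted returns G_t, accumulated backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    # REINFORCE loss: -sum_t G_t * log pi(a_t | s_t)
    loss = -(torch.stack(log_probs) * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    # Clip gradients at rl_clip, as in the hyperparameters above.
    torch.nn.utils.clip_grad_norm_(policy.parameters(), RL_CLIP)
    optimizer.step()

# With momentum 0.0 and nesterov false, the optimizer is plain SGD:
# optimizer = torch.optim.SGD(policy.parameters(), lr=RL_LR)
```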
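
Finally, a minimal loading sketch. `snapshot_download` is the standard `huggingface_hub` call; the repo id and the ConvLab-3 import path are assumptions, so check the ConvLab-3 repository (linked above) for the authoritative usage:

```python
from huggingface_hub import snapshot_download

# Assumption: this card lives at ConvLab/lava-policy-multiwoz on the Hub.
model_dir = snapshot_download("ConvLab/lava-policy-multiwoz")

# Assumption: ConvLab-3 exposes the LAVA word-level policy roughly like this;
# verify the actual module path and constructor in the ConvLab-3 repo.
from convlab.policy.lava.multiwoz import LAVA

policy = LAVA(model_dir)
# The policy can then be plugged into a ConvLab-3 dialogue pipeline as the
# word-level policy module.
```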