lava-policy-multiwoz

This is the best-performing LAVA_kl model from the LAVA paper. It can be used as a word-level policy module in a ConvLab-3 pipeline.

Refer to ConvLab-3 for model description and usage.

Training procedure

The model was trained on MultiWOZ 2.0 data using the LAVA codebase. Training began with VAE pre-training, continued with fine-tuning using an informative-prior KL loss, and ended with corpus-based RL using REINFORCE.

Training hyperparameters

The following hyperparameters were used during supervised learning (SL) training:

y_size: 10
k_size: 20
beta: 0.1
simple_posterior: true
contextual_posterior: false
learning_rate: 1e-03
max_vocab_size: 1000
max_utt_len: 50
max_dec_len: 30
backward_size: 2
train_batch_size: 128
seed: 58
optimizer: Adam
num_epoch: 100 (with early stopping based on the validation set)
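
As a rough illustration of how y_size, k_size, and beta might interact, the sketch below computes a beta-weighted KL term over a set of categorical latent variables. It assumes that y_size is the number of categorical latent variables, that k_size is the number of categories per variable, and that beta scales the KL term in the loss. The function names (categorical_kl, weighted_kl) are hypothetical and are not taken from the LAVA codebase.

```python
import math

Y_SIZE = 10  # assumed: number of categorical latent variables (y_size)
K_SIZE = 20  # assumed: number of categories per latent variable (k_size)
BETA = 0.1   # assumed: weight on the KL term (beta)


def categorical_kl(q, p, eps=1e-15):
    """KL(q || p) for two categorical distributions given as probability lists."""
    return sum(
        qi * (math.log(qi + eps) - math.log(pi + eps))
        for qi, pi in zip(q, p)
        if qi > 0  # terms with q_i = 0 contribute nothing
    )


def weighted_kl(posteriors, priors, beta=BETA):
    """Sum the KL over all latent variables, then scale by beta."""
    return beta * sum(categorical_kl(q, p) for q, p in zip(posteriors, priors))


# Toy distributions: a uniform prior and a fully peaked posterior.
uniform = [1.0 / K_SIZE] * K_SIZE
one_hot = [1.0] + [0.0] * (K_SIZE - 1)

kl_same = weighted_kl([uniform] * Y_SIZE, [uniform] * Y_SIZE)
kl_peaked = weighted_kl([one_hot] * Y_SIZE, [uniform] * Y_SIZE)
```

Here kl_same is zero because posterior and prior coincide, while kl_peaked grows with the log of k_size for each latent variable. In the actual model the prior would be the informative prior described under the training procedure, not a uniform distribution.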

The following hyperparameters were used during RL training:

tune_pi_only: false
max_words: 100
temperature: 1.0
episode_repeat: 1.0
rl_lr: 0.01
momentum: 0.0
nesterov: false
gamma: 0.99
rl_clip: 5.0
random_seed: 38
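
To make the roles of gamma and rl_clip more concrete, here is a minimal sketch of two ingredients commonly found in a REINFORCE update: discounted returns and gradient clipping by global norm. It assumes that gamma is the reward discount factor and that rl_clip is a gradient-norm threshold. The helper names (discounted_returns, clip_by_norm) are hypothetical, and the LAVA codebase may implement these steps differently.

```python
import math

GAMMA = 0.99   # assumed: reward discount factor (gamma)
RL_CLIP = 5.0  # assumed: maximum gradient norm (rl_clip)


def discounted_returns(rewards, gamma=GAMMA):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clip_by_norm(grads, max_norm=RL_CLIP):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# Toy episode with a single reward at the final step.
episode_returns = discounted_returns([0.0, 0.0, 1.0])

# Toy gradient vectors: one within the threshold, one above it.
unchanged = clip_by_norm([3.0, 4.0])
clipped = clip_by_norm([6.0, 8.0])
```

With a single terminal reward, earlier steps receive geometrically smaller returns. The second gradient vector has norm 10, so it is scaled down to norm 5, while the first is left as is.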