Inquiry about the implementation of TCPGen in Transformer-based ASR model

#2
by woshinibaba - opened

Hi, many thanks for your great work on the contextual ASR decoder!

I am very interested in the biasing decoding technique illustrated in your paper (https://arxiv.org/abs/2109.00627).

Have you tried implementing TCPGen in a Transformer-based ASR model?

As illustrated in the paper, the query combines the context vector $\boldsymbol{c}_{i}$ and the previously decoded token $\boldsymbol{y}_{i-1}$. However, for a Transformer-based ASR model, I am a little confused about which context vector should be used, since the decoder usually has 6 attention blocks, and each block attends over the encoder output to produce its own context matrix.
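To make my reading concrete, here is a minimal sketch of how I currently understand the query formation (the module and parameter names below are my own, hypothetical, not from your released code):

```python
import torch
import torch.nn as nn

class TCPGenQuery(nn.Module):
    """Sketch of the TCPGen query formation as I read the paper.

    The query combines the embedding of the previously decoded token
    y_{i-1} with a decoder context vector c_i. All names are hypothetical.
    """
    def __init__(self, embed_dim: int, context_dim: int, attn_dim: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim + context_dim, attn_dim)

    def forward(self, y_prev_emb: torch.Tensor, c_i: torch.Tensor) -> torch.Tensor:
        # q_i = W_Q [Emb(y_{i-1}); c_i] -- concatenate, then project
        return self.proj(torch.cat([y_prev_emb, c_i], dim=-1))
```

My confusion is which of the 6 per-block context matrices plays the role of `c_i` here.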

Could you give me more clues about it? Thanks!

Hi. Thanks for your interest! Please check our latest work on integrating TCPGen into Whisper: https://arxiv.org/abs/2306.01942. It is an example of applying TCPGen to a Transformer-based ASR decoder, and of how it performs without changing any of the original model parameters. In that setting, TCPGen works as a distribution-level adaptor.
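To make "distribution-level adaptor" concrete, here is an illustrative sketch of the final interpolation step (the function name and tensor shapes are my own, and it omits details such as out-of-list probability handling; it is not the released implementation):

```python
import torch

def tcpgen_interpolate(p_model: torch.Tensor,
                       p_ptr: torch.Tensor,
                       p_gen: torch.Tensor) -> torch.Tensor:
    """Illustrative TCPGen distribution-level interpolation.

    p_model: (batch, vocab) output distribution of the frozen ASR model
    p_ptr:   (batch, vocab) TCPGen pointer distribution (mass only on
             tokens reachable in the biasing prefix tree)
    p_gen:   (batch, 1)    generation probability predicted by TCPGen
    """
    # Final distribution interpolates model and pointer distributions:
    # P(y_i) = P_mdl(y_i) * (1 - P_gen_i) + P_ptr(y_i) * P_gen_i
    return p_model * (1.0 - p_gen) + p_ptr * p_gen
```

Because the adaptation happens on the output distribution rather than inside the attention blocks, the frozen Transformer decoder is left untouched.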
