Inquiry about the implementation of TCPGen in Transformer-based ASR model

#2
by woshinibaba - opened

Hi, many thanks for your great work on the contextual ASR decoder!

I am very interested in the biasing decoding technique illustrated in your paper (https://arxiv.org/abs/2109.00627).

Have you tried implementing TCPGen in a Transformer-based ASR model?

As illustrated in the paper, the query combines the context vector $\boldsymbol{c}_{i}$ and the previously decoded token $\boldsymbol{y}_{i-1}$. However, for a Transformer-based ASR model, I am a little confused about which context vector should be used, since the decoder usually has 6 attention blocks, and each block attends over the encoder output to produce its own context matrix.
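To make my reading concrete, here is a minimal sketch of how I currently understand the query formation (the module and parameter names below are my own, hypothetical, not from your released code):

```python
import torch
import torch.nn as nn

class TCPGenQuery(nn.Module):
    """Sketch of the TCPGen query formation as I read the paper.

    The query combines the embedding of the previously decoded token
    y_{i-1} with a decoder context vector c_i. All names are hypothetical.
    """
    def __init__(self, embed_dim: int, context_dim: int, attn_dim: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim + context_dim, attn_dim)

    def forward(self, y_prev_emb: torch.Tensor, c_i: torch.Tensor) -> torch.Tensor:
        # q_i = W_Q [Emb(y_{i-1}); c_i] -- concatenate, then project
        return self.proj(torch.cat([y_prev_emb, c_i], dim=-1))
```

My confusion is which of the 6 per-block context matrices plays the role of `c_i` here.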

Could you give me more clues about it? Thanks!

Hi. Thanks for your interest! Please check our latest work on integrating TCPGen into Whisper: https://arxiv.org/abs/2306.01942. It is an example of applying TCPGen to a Transformer-based ASR decoder, and of how it performs without changing any of the original model parameters. In that setting, TCPGen works as a distribution-level adaptor.
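To make "distribution-level adaptor" concrete, here is an illustrative sketch of the final interpolation step (the function name and tensor shapes are my own, and it omits details such as out-of-list probability handling; it is not the released implementation):

```python
import torch

def tcpgen_interpolate(p_model: torch.Tensor,
                       p_ptr: torch.Tensor,
                       p_gen: torch.Tensor) -> torch.Tensor:
    """Illustrative TCPGen distribution-level interpolation.

    p_model: (batch, vocab) output distribution of the frozen ASR model
    p_ptr:   (batch, vocab) TCPGen pointer distribution (mass only on
             tokens reachable in the biasing prefix tree)
    p_gen:   (batch, 1)    generation probability predicted by TCPGen
    """
    # Final distribution interpolates model and pointer distributions:
    # P(y_i) = P_mdl(y_i) * (1 - P_gen_i) + P_ptr(y_i) * P_gen_i
    return p_model * (1.0 - p_gen) + p_ptr * p_gen
```

Because the adaptation happens on the output distribution rather than inside the attention blocks, the frozen Transformer decoder is left untouched.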
