What is the main difference between a decoder-only transformer and an encoder-only transformer?
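The main difference lies in how self-attention is masked. An encoder-only transformer (for example, BERT) uses bidirectional self-attention: every token can attend to every other token in the sequence, both before and after it. Each token's representation therefore draws on the whole input, which makes encoder-only models well suited to understanding tasks such as text classification and named-entity recognition. A decoder-only transformer (for example, GPT) uses causal, or masked, self-attention: each token can attend only to itself and the tokens before it. Because no position can see later tokens, the model can be trained to predict the next token and then generate text one token at a time, which makes it well suited to autoregressive tasks such as text generation.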
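To make the masking difference concrete, here is a minimal sketch of single-head self-attention in plain Python. It assumes identity query/key/value projections and omits everything else a real transformer has (learned weights, multiple heads, feed-forward layers, positional information), so treat it as an illustration of the mask only. The function names `attention_mask` and `self_attention` are hypothetical, chosen for this example. The usage at the end changes only the last token and checks whether the first token's output changes.

```python
import math


def attention_mask(n, causal):
    # mask[i][j] is True when query position i may attend to key position j.
    # Encoder-only (bidirectional): every position sees every position.
    # Decoder-only (causal): position i sees only positions 0..i.
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def self_attention(x, causal):
    """Single-head scaled dot-product self-attention over x (n vectors of size d).

    Assumption for brevity: queries, keys, and values are the inputs themselves
    (identity projections), unlike a real model with learned weight matrices.
    """
    n, d = len(x), len(x[0])
    mask = attention_mask(n, causal)
    out = []
    for i in range(n):
        # Score query i against every key; masked keys get -inf.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(n)
        ]
        # Numerically stable softmax; exp(-inf) is 0.0, so masked keys get zero weight.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(weights[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_changed = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]  # only the last token differs

enc_a, enc_b = self_attention(x, causal=False), self_attention(x_changed, causal=False)
dec_a, dec_b = self_attention(x, causal=True), self_attention(x_changed, causal=True)

print(enc_a[0] != enc_b[0])  # True: with a bidirectional mask, token 0 sees token 2
print(dec_a[0] == dec_b[0])  # True: with a causal mask, token 0 cannot see token 2
```

In the bidirectional case, editing a later token changes the representation of an earlier one. In the causal case it cannot, which is exactly the property that lets a decoder-only model be trained on next-token prediction without "seeing the answer."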