How does this work? Thanks!

#1
by YaTharThShaRma999 - opened

It seems really interesting, but how does it work? You did give a description, but I didn't really understand it. Could you explain it a bit further, but a bit more simply? Thanks anyway!

Basically, it's an encoder-decoder model made from a pretrained GPT-N (here, Mistral 7B). In the first phase, it is trained to encode a 64-token sentence into an embedding and decode it back. In the next phase, I train a main "router head" that reconstructs the 64 tokens from an embedding and then predicts the next 64 tokens. You can then use the resulting model to guide sampling, encode text for later retrieval, etc.
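
To make the two phases a bit more concrete, here is a rough sketch of how the training pairs for each phase might be shaped. This is only an illustration under assumptions, not the actual training code: the function names (`split_windows`, `phase1_examples`, `phase2_examples`) are hypothetical, dropping the leftover tokens at the end is an assumption, and the embedding that sits between the encoder and decoder is not modeled here at all. Only the 64-token window size comes from the description above.

```python
from typing import List, Tuple

WINDOW = 64  # span length taken from the description above


def split_windows(token_ids: List[int], window: int = WINDOW) -> List[List[int]]:
    """Cut a token sequence into consecutive full windows.

    Assumption: a ragged tail shorter than `window` is simply dropped.
    """
    usable = len(token_ids) - len(token_ids) % window
    return [token_ids[i:i + window] for i in range(0, usable, window)]


def phase1_examples(windows: List[List[int]]) -> List[Tuple[List[int], List[int]]]:
    """Phase 1 (autoencoding): the input window is also the decoding target."""
    return [(w, w) for w in windows]


def phase2_examples(windows: List[List[int]]) -> List[Tuple[List[int], List[int]]]:
    """Phase 2: the target is the window itself followed by the next window."""
    return [(cur, cur + nxt) for cur, nxt in zip(windows, windows[1:])]


tokens = list(range(200))  # stand-in for real token ids
windows = split_windows(tokens)
p1 = phase1_examples(windows)
p2 = phase2_examples(windows)
print(len(windows), len(p1), len(p2), len(p2[0][1]))  # prints: 3 3 2 128
```

So in phase 1 each 64-token window is paired with itself, and in phase 2 each window is paired with a 128-token target: the same 64 tokens plus the 64 that follow.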

Ah, ok. Thanks for explaining it now!
