How does this work? Thanks!

by YaTharThShaRma999 - opened Oct 9, 2023

Oct 9, 2023

It seems really interesting, but how does it work? You did give a description, but I didn't really understand it. Could you explain it a bit further, but a bit more simply? Thanks anyway!

jdpressman

Owner Oct 11, 2023

Basically it's an encoder-decoder model made from a pretrained GPT-N (here, Mistral 7B). In the first phase it is trained to encode a 64-token sentence into an embedding and decode it back. In the next phase I train a main "router head" which reconstructs those 64 tokens from the embedding and then predicts the next 64 tokens. You can then use the resulting model to guide sampling, encode text for later retrieval, etc.
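To make the two phases a bit more concrete, here is a rough sketch of how the training examples for each phase might be laid out. It only shows the data layout, not the model or the training loop, and it is not taken from the actual training code: the function names (`split_spans`, `phase_one_pairs`, `phase_two_pairs`) and the choice to drop a short trailing span are assumptions made for illustration.

```python
from typing import List, Tuple

SPAN = 64  # span length in tokens, as described in the reply above


def split_spans(token_ids: List[int], span: int = SPAN) -> List[List[int]]:
    """Cut a token sequence into consecutive full spans.

    Assumption: a trailing span shorter than `span` is simply dropped.
    """
    n_full = len(token_ids) // span
    return [token_ids[i * span:(i + 1) * span] for i in range(n_full)]


def phase_one_pairs(spans: List[List[int]]) -> List[Tuple[List[int], List[int]]]:
    """Phase 1 (autoencoding): the span to encode is also the decode target."""
    return [(s, s) for s in spans]


def phase_two_pairs(spans: List[List[int]]) -> List[Tuple[List[int], List[int]]]:
    """Phase 2: encode span i; the target is span i followed by span i + 1,
    i.e. reconstruct 64 tokens and then predict the next 64."""
    return [(spans[i], spans[i] + spans[i + 1]) for i in range(len(spans) - 1)]


# Hypothetical token ids standing in for a tokenized document.
tokens = list(range(200))
spans = split_spans(tokens)
p1 = phase_one_pairs(spans)
p2 = phase_two_pairs(spans)
```

In this layout the phase 2 target is 128 tokens long: the first half checks that the embedding still carries the original span, and the second half is the continuation.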

YaTharThShaRma999

Oct 11, 2023

Ah, ok. Thanks for explaining it!
