Embedding handling question

#3
by Helios888

Hello,

I am not sure I understand this:

```python
# extract residue embeddings for the first ([0,:]) sequence in the batch
# and remove padded & special tokens, incl. prefix ([0,1:8])
emb_0 = embedding_repr_train.last_hidden_state[0,1:8]  # shape (7 x 1024)
```

Does it mean that when skipping the first token, both the CLS token and the added prefix embeddings are skipped?
If so, I still don't understand: I believed each letter was a token, and the prefix ("" in my case) consists of multiple letters.
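To make the question concrete, here is a minimal sketch of how one could inspect the token layout directly. I am assuming a ProstT5-style setup here (the `Rostlab/ProstT5` checkpoint and its `<AA2fold>` prefix are assumptions on my part; substitute whatever checkpoint and prefix actually apply):

```python
from transformers import T5Tokenizer

# Assumed checkpoint; replace with whichever model is actually in use.
tokenizer = T5Tokenizer.from_pretrained("Rostlab/ProstT5", do_lower_case=False)

# One 7-residue sequence, space-separated as usual for ProtT5-family models,
# with the prefix prepended ("<AA2fold>" is an assumption here).
seq = "<AA2fold> P R T E I N S"
ids = tokenizer(seq).input_ids

# Print each position next to its token: if the prefix is registered as a
# single special token, index 0 is the prefix, indices 1..7 are the seven
# residues, and the last index is the </s> (EOS) token, which is exactly
# what the [0,1:8] slice keeps and drops.
for i, tok in enumerate(tokenizer.convert_ids_to_tokens(ids)):
    print(i, tok)
```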

To extract all relevant embeddings of the first sequence in the batch, skipping unwanted tokens, is it enough to remove the padding tokens after running this line?

```python
emb_0 = embedding_repr_train.last_hidden_state[0,1:]
```
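If it helps, here is a hedged sketch of how I imagine padding could be removed per sequence using the attention mask. It reuses the `ids` and `embedding_repr_train` variables from the snippets above, and assumes the prefix is a single token and each sequence ends with one `</s>` token:

```python
# Sketch only: `ids` is the batched tokenizer output (with attention_mask)
# and `embedding_repr_train` the matching encoder output.
embeddings = []
for i in range(embedding_repr_train.last_hidden_state.shape[0]):
    # Number of real (non-padding) tokens for sequence i,
    # which includes the prefix token and the trailing </s>.
    seq_len = int(ids.attention_mask[i].sum())
    # Drop the prefix at position 0 and the </s> at position seq_len-1,
    # keeping only the per-residue embeddings.
    emb_i = embedding_repr_train.last_hidden_state[i, 1:seq_len - 1]
    embeddings.append(emb_i)
```

Is that the intended way to handle it, or is there a cleaner approach?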
