PoetschLab/GROVER · I would like to confirm some information in the paper about genome annotation in the embeddings

Hello. I am going to explore GROVER in my graduation thesis and I just wanted to make sure I understand the part where the genome annotation was added to the embeddings. I would like to explain this part in the document.

From what I gathered, after pretraining, the tokens are annotated with features such as GC content, strand information, repetitive elements and gene coordinates.
What I am not sure about is how exactly the annotations are added to the embeddings. Are they also transformed into numerical vectors? If so, are they appended after each token's embedding, or once after the whole sequence?
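To make the two options I am asking about concrete, here is a minimal sketch. The shapes, values and variable names are all made up for illustration and are not taken from the GROVER paper or code; it may well be that neither option matches what was actually done.

```python
# Hypothetical toy sizes, NOT taken from the GROVER paper or code.
seq_len, emb_dim = 4, 3

# One embedding vector per token: seq_len x emb_dim.
token_embeddings = [[0.0] * emb_dim for _ in range(seq_len)]

# Assumed numeric encoding of annotations, e.g. [GC content, strand as +1/-1].
token_annotations = [[0.5, 1.0] for _ in range(seq_len)]  # one vector per token
sequence_annotation = [0.5, 1.0]                          # one vector per sequence

# Option A: append the annotation vector to each token's embedding.
per_token = [emb + ann for emb, ann in zip(token_embeddings, token_annotations)]

# Option B: flatten all token embeddings, then append the annotation once at the end.
flat = [x for emb in token_embeddings for x in emb]
per_sequence = flat + sequence_annotation

print(len(per_token), len(per_token[0]))  # 4 rows, each of length 3 + 2
print(len(per_sequence))                  # 4 * 3 + 2 values
```

In option A every token ends up with a longer vector, while in option B the annotation appears only once per sequence.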

Is that roughly correct? I found this part the most interesting. Thank you so much.