description for special tokens

#21
by Iamexperimenting - opened

Hi team,

Could you please describe the special tokens below, which are used by the sqlcoder model? Knowing what each one refers to would help us understand them.

['<s>', '</s>', '<unk>', '▁<PRE>', '▁<MID>', '▁<SUF>', '▁<EOT>', '▁<PRE>', '▁<MID>', '▁<SUF>', '▁<EOT>']

@jp-defog @rishdotblog

Defog.ai org

Hi @Iamexperimenting, these are exactly the same special tokens that Code Llama uses. <s> and </s> are the beginning- and end-of-sequence tokens, <unk> stands for unknown tokens, and the rest (<PRE>, <MID>, <SUF>, <EOT>) are for infilling. We do not support infilling, but we kept those tokens for backwards compatibility in case you want to test the model's code infilling abilities. You can check out the Code Llama documentation here:
https://huggingface.co/docs/transformers/model_doc/code_llama#transformers.CodeLlamaTokenizer
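
To make the roles above concrete, here is a minimal sketch in plain Python. It is illustrative only: `SPECIAL_TOKEN_ROLES` and `build_infill_prompt` are hypothetical names, the `<FILL_ME>` marker follows the Code Llama tokenizer's fill token, and the exact spacing of the prefix-suffix-middle layout is an assumption based on the Code Llama infilling format. In practice the tokenizer assembles this for you, and sqlcoder itself is not trained for infilling.

```python
# Roles of the special tokens, summarizing the reply above.
# The infilling descriptions are an interpretation of the Code Llama docs.
SPECIAL_TOKEN_ROLES = {
    "<s>": "beginning of sequence",
    "</s>": "end of sequence",
    "<unk>": "unknown token",
    "▁<PRE>": "infilling: introduces the code before the gap (prefix)",
    "▁<SUF>": "infilling: introduces the code after the gap (suffix)",
    "▁<MID>": "infilling: asks the model to generate the missing middle",
    "▁<EOT>": "infilling: marks the end of the generated middle",
}


def build_infill_prompt(text: str, fill_marker: str = "<FILL_ME>") -> str:
    """Split `text` at the fill marker and lay it out prefix-suffix-middle.

    Hypothetical helper: the spacing around the tokens is an assumption,
    not a guaranteed match for what the real tokenizer produces.
    """
    prefix, marker, suffix = text.partition(fill_marker)
    if not marker:
        raise ValueError(f"text must contain the fill marker {fill_marker!r}")
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


if __name__ == "__main__":
    for token, role in SPECIAL_TOKEN_ROLES.items():
        print(f"{token!r}: {role}")
    print(build_infill_prompt("SELECT name FROM users WHERE <FILL_ME> ORDER BY name;"))
```

A model trained for infilling would then generate the missing middle after <MID> and stop at <EOT>.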
