Stop words

#9
by chirag11 - opened

How do I specify stop words?

BigCode org

Here is some sample code that does stop tokens:

https://github.com/arjunguha/BigCode-demos/blob/main/bigcode.ipynb

(See Cell 4, the stop_at_stop_tokens function.)
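For readers who cannot open the notebook, here is a minimal sketch of what a function like that might look like. It is not copied from the notebook: the body, the `STOP_TOKENS` list, and the sample completion are assumptions for illustration. The idea is to post-process the generated text and cut it at the earliest stop sequence.

```python
def stop_at_stop_tokens(text, stop_tokens):
    """Return `text` truncated at the earliest occurrence of any stop token.

    If no stop token occurs, the text is returned unchanged.
    """
    cut = len(text)
    for token in stop_tokens:
        idx = text.find(token)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]


# Hypothetical stop sequences for Python completions (an assumption,
# not the list used in the notebook).
STOP_TOKENS = ["\ndef ", "\nclass ", "\nprint(", "<|endoftext|>"]

# Stand-in for text produced by the model after the prompt.
completion = "    return a + b\n\ndef unrelated():\n    pass\n"

print(stop_at_stop_tokens(completion, STOP_TOKENS))
```

Because this works on the decoded string, it does not depend on how the tokenizer splits the stop sequences; the trade-off is that the model may still spend time generating text that is later discarded.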

BigCode org

SantaCoder has an <|endoftext|> token that marks the end/beginning of a file. It is now included in the tokenizer by default, along with the FIM special tokens.

loubnabnl changed discussion status to closed
