Tokenizer format needed

#12
by Danny5050 - opened

I'm designing a vector database to complement Falcon 180B for our knowledge-domain data. Could you please clarify which tokenizer the model uses: BPE, WordPiece, or SentencePiece? Matching the tokenizer is crucial for our integration.
