---
license: gpl-2.0
language:
- en
- ja
tags:
- tokenizer
- novelai
- sentencepiece
---

# NovelAI Tokenizer v1

This repository is exactly the same as [NovelAI/nerdstash-tokenizer-v1](https://huggingface.co/NovelAI/nerdstash-tokenizer-v1), but the config has been changed to address the following points (the SentencePiece model itself is unchanged):

- Load as `T5Tokenizer`.
- Enable decoding of digits. (In the original, digits are registered as `additional_special_tokens`, so they are also skipped when decoding with `skip_special_tokens=True`.)

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mkshing/novelai-tokenizer-v1", use_fast=False)

text = "1+1=3"
tokenizer.decode(tokenizer.encode(text), skip_special_tokens=True)
# '1+1=3'
```
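The digit-skipping behavior can be illustrated without downloading either tokenizer. This is a minimal sketch in plain Python (not the actual `transformers` implementation): `skip_special_tokens=True` simply drops every token in the special-token set during decoding, so registering digits there makes them disappear from the output.

```python
# Sketch of skip_special_tokens behavior (illustrative, not the real
# transformers code). The special-token sets below are hypothetical.
special_tokens = {"<pad>", "</s>"}                        # this repo: digits are normal tokens
special_tokens_orig = special_tokens | set("0123456789")  # original repo: digits are special

def decode(tokens, special, skip_special_tokens=True):
    """Join tokens into a string, optionally dropping special tokens."""
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in special]
    return "".join(tokens)

tokens = ["1", "+", "1", "=", "3"]
print(decode(tokens, special_tokens))       # '1+1=3'  (digits survive)
print(decode(tokens, special_tokens_orig))  # '+='     (digits skipped)
```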