---
license: gpl-2.0
language:
- en
- ja
tags:
- tokenizer
- novelai
- sentencepiece
---

# NovelAI Tokenizer v1
This repository uses the same SentencePiece model as [NovelAI/nerdstash-tokenizer-v1](https://huggingface.co/NovelAI/nerdstash-tokenizer-v1).
Only the config has been changed, to address the following points:

- Load the tokenizer as `T5Tokenizer`.
- Make digits decodable. In the original, digits are registered as `additional_special_tokens`, so decoding with `skip_special_tokens=True` also skips the digits.
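
To see why registering digits as special tokens matters, here is a minimal toy sketch of the filtering step. It is not the `transformers` implementation: `toy_decode`, the token list, and both special-token sets are hypothetical names and values chosen only for illustration.

```python
# Toy model of decoding with skip_special_tokens.
# This is an illustration only, not the transformers implementation.
def toy_decode(tokens, special_tokens, skip_special_tokens=False):
    """Join tokens, optionally dropping any that are registered as special."""
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in special_tokens]
    return "".join(tokens)


# Hypothetical token sequence for "1+1=3" followed by an end-of-sequence marker.
tokens = ["1", "+", "1", "=", "3", "</s>"]

# Assumed original-style config: digits are registered as special tokens.
original_special = {"</s>"} | set("0123456789")
# Assumed modified-style config: only the end-of-sequence marker is special.
modified_special = {"</s>"}

original_out = toy_decode(tokens, original_special, skip_special_tokens=True)
modified_out = toy_decode(tokens, modified_special, skip_special_tokens=True)

print(repr(original_out))  # '+='
print(repr(modified_out))  # '1+1=3'
```

In this sketch, the digits disappear as soon as they belong to the special-token set, which mirrors the behavior described for the original config.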

```python
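from transformers import AutoTokenizer

# use_fast=False requests the slow (Python, SentencePiece-backed) tokenizer
tokenizer = AutoTokenizer.from_pretrained("mkshing/novelai-tokenizer-v1", use_fast=False)

text = "1+1=3"
# Round-trip the text; digits are kept even with skip_special_tokens=True
tokenizer.decode(tokenizer.encode(text), skip_special_tokens=True)
# '1+1=3'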
```