mkshing committed on
Commit
77e084e
1 Parent(s): 66107eb

Update README.md

  1. README.md +25 -0
README.md CHANGED
@@ -1,3 +1,28 @@
  ---
  license: gpl-2.0
+ language:
+ - en
+ - ja
+ tags:
+ - tokenizer
+ - novelai
+ - sentencepiece
  ---
+
+ # NovelAI Tokenizer v1
+ This repository is identical to [NovelAI/nerdstash-tokenizer-v1](https://huggingface.co/NovelAI/nerdstash-tokenizer-v1),
+ except that the config has been changed to address the following points (the SentencePiece model itself is unchanged).
+
+ - Load as `T5Tokenizer`
+ - Enable decoding of digits. (In the original, digits are registered as `additional_special_tokens`, so decoding with `skip_special_tokens=True` also skips the digits.)
+
+ ```python
+
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("mkshing/novelai-tokenizer-v1", use_fast=False)
+
+ text = "1+1=3"
+ tokenizer.decode(tokenizer.encode(text), skip_special_tokens=True)
+ # '1+1=3'
+ ```
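
The digit-skipping behaviour described in the README can be illustrated with a toy decoder. This is only a sketch of the general idea and not the `transformers` implementation: `toy_decode`, the four-entry vocabulary, and the token ids are all hypothetical names and values chosen for illustration.

```python
from typing import Dict, List, Set


def toy_decode(
    ids: List[int],
    vocab: Dict[int, str],
    special_ids: Set[int],
    skip_special_tokens: bool = False,
) -> str:
    """Join token pieces, optionally dropping ids marked as special."""
    pieces = []
    for token_id in ids:
        if skip_special_tokens and token_id in special_ids:
            continue  # special tokens are omitted from the decoded text
        pieces.append(vocab[token_id])
    return "".join(pieces)


# Hypothetical vocabulary and encoding of "1+1=3" (assumption, not the real ids).
vocab = {0: "1", 1: "+", 2: "=", 3: "3"}
ids = [0, 1, 0, 2, 3]

# Digits registered as special tokens: they vanish when skipping specials.
original_like = toy_decode(ids, vocab, special_ids={0, 3}, skip_special_tokens=True)

# Digits treated as ordinary tokens: the text round-trips.
patched_like = toy_decode(ids, vocab, special_ids=set(), skip_special_tokens=True)

print(original_like)  # '+='
print(patched_like)   # '1+1=3'
```

Under these assumptions, the first call loses both digits while the second returns the input unchanged, which is the difference the config change is meant to produce.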