DWDMaiMai committed on
Commit 2e78705
1 Parent(s): fe9a84b

Update README.md

Files changed (1)
  1. README.md +36 -0
README.md CHANGED
@@ -1,3 +1,39 @@
  ---
  license: mit
+ library_name: transformers
+ tags:
+ - tokenizers
  ---
+
+ # Tiktoken cl100k_base/gpt4 Tokenizer
+
+ ## Convert script
+ Modified from https://gist.github.com/xenova/a452a6474428de0182b17605a98631ee
+
+ ## Example usage:
+ ```py
+ import transformers
+
+ tokenizer = transformers.AutoTokenizer.from_pretrained("DWDMaiMai/tiktoken_cl100k_base")
+ assert [15339, 1917, 0] == tokenizer.encode("hello world!")
+
+ messages = [
+     {"role": "user", "content": "Hello, how are you?"},
+     {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
+     {"role": "user", "content": "I'd like to show off how chat templating works!"},
+ ]
+ assert """<|im_start|>user
+ Hello, how are you?<|im_end|>
+ <|im_start|>assistant
+ I'm doing great. How can I help you today?<|im_end|>
+ <|im_start|>user
+ I'd like to show off how chat templating works!<|im_end|>
+ <|im_start|>assistant
+ """ == tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ ```
+
+ ## Relevant
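The README's "Convert script" section only links to a gist. For background: a tiktoken encoding stores its BPE as a rank table (token bytes to rank) instead of an explicit merge list, but a Hugging Face BPE tokenizer needs merges. The sketch below shows one general way to recover them. It assumes each multi-byte token is created by exactly one merge of two lower-ranked tokens. If you re-run BPE on that token and only allow merges ranked below it, the two pieces that remain are its merge pair. This illustrates the technique; it is not the gist's code. The names `bpe_split` and `recover_merges` and the toy rank table are hypothetical. Real conversion code would read cl100k_base's ranks, which tiktoken keeps on the private `Encoding._mergeable_ranks` attribute. It would also map raw bytes to the GPT-2 byte-level unicode alphabet, a step omitted here.

```python
from typing import Dict, List, Optional, Tuple


def bpe_split(ranks: Dict[bytes, int], token: bytes, max_rank: Optional[int]) -> List[bytes]:
    """Greedily merge adjacent pieces of `token`, lowest rank first.

    Stops when no adjacent pair is in `ranks`, or when the best available
    pair has rank >= max_rank (if max_rank is not None).
    """
    parts = [bytes([b]) for b in token]
    while True:
        best_idx: Optional[int] = None
        best_rank: Optional[int] = None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_rank is None or (max_rank is not None and best_rank >= max_rank):
            break
        assert best_idx is not None
        # Replace the best adjacent pair with its concatenation.
        parts = parts[:best_idx] + [parts[best_idx] + parts[best_idx + 1]] + parts[best_idx + 2:]
    return parts


def recover_merges(ranks: Dict[bytes, int]) -> List[Tuple[bytes, bytes]]:
    """Recover an ordered merge list from a tiktoken-style rank table (sketch)."""
    merges: List[Tuple[bytes, bytes]] = []
    for token, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
        if len(token) == 1:
            continue  # single bytes are base vocabulary, not merges
        # Only merges ranked below this token may fire, so it should end up
        # as exactly two pieces; unpacking raises ValueError otherwise.
        left, right = bpe_split(ranks, token, max_rank=rank)
        merges.append((left, right))
    return merges


# Toy rank table, made up for illustration (not cl100k_base data).
toy_ranks = {b"h": 0, b"e": 1, b"l": 2, b"o": 3, b"he": 4, b"ll": 5, b"hell": 6, b"hello": 7}
print(recover_merges(toy_ranks))
# [(b'h', b'e'), (b'l', b'l'), (b'he', b'll'), (b'hell', b'o')]
```

The merges come out in rank order, which matches the priority order a BPE merge list encodes.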
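The chat-template assertion in the README's example expects ChatML-style output. Each message is wrapped as `<|im_start|>{role}`, a newline, the content, `<|im_end|>`, and a newline. Passing `add_generation_prompt=True` appends an open `<|im_start|>assistant` turn. The sketch below reproduces that layout in plain Python, working backward from the asserted string alone. `render_chatml` is a hypothetical helper, not the Jinja chat template actually shipped with the tokenizer. It does not model anything the real template may do with system messages or other roles.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Hypothetical re-implementation of the ChatML layout asserted in the README."""
    out = []
    for m in messages:
        # One turn: role header line, content, end marker, newline.
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        out.append("<|im_start|>assistant\n")
    return "".join(out)


messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(render_chatml(messages, add_generation_prompt=True))
```

The trailing newline after `<|im_start|>assistant` is part of the expected string, so an exact-match comparison fails without it.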