killawhale2 committed on
Commit
e977525
0 Parent(s):

Upload tokenizer.json, README with huggingface_hub

Files changed (3)
  1. .gitattributes +35 -0
  2. README.md +29 -0
  3. tokenizer.json +0 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
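As a rough illustration of what these rules do, the sketch below checks whether a path would be routed through Git LFS. It uses Python's `fnmatch` only as an approximation of gitattributes pattern matching (the real rules differ in details such as `**` handling). `LFS_PATTERNS` is a hand-copied subset of the list above, and `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Subset of the patterns added in .gitattributes above (copied by hand).
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*.tar.*",
    "saved_model/**/*",
    "*tfevents*",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does `path` match one of the LFS patterns?"""
    for pattern in patterns:
        # Patterns containing a slash are compared against the whole path;
        # all others are compared against the file name only.
        target = path if "/" in pattern else basename(path)
        if fnmatchcase(target, pattern):
            return True
    return False


print(is_lfs_tracked("model.safetensors"))
print(is_lfs_tracked("tokenizer.json"))
```

Under this approximation `tokenizer.json` matches none of the listed patterns, which would mean it is stored as a regular Git file rather than an LFS pointer.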
README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ license: apache-2.0
+ ---
+
+ Upstage `solar-pro-preview` tokenizer
+ - Vocab size: 32,128
+ - Language support: mainly English
+
+ Use this tokenizer to tokenize inputs for the Upstage [solar-pro-preview](https://developers.upstage.ai/docs/apis/chat) model.
+
+ You can load it with the `tokenizers` library like this:
+
+ ```python
+ from tokenizers import Tokenizer
+
+ tokenizer = Tokenizer.from_pretrained("upstage/solar-pro-preview-tokenizer")
+ text = "Hi, how are you?"
+ enc = tokenizer.encode(text)
+ print("Encoded input:")
+ print(enc)
+
+ inv_vocab = {v: k for k, v in tokenizer.get_vocab().items()}
+ tokens = [inv_vocab[token_id] for token_id in enc.ids]
+ print("Tokens:")
+ print(tokens)
+
+ number_of_tokens = len(enc.ids)
+ print("Number of tokens:", number_of_tokens)
+ ```
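The `Encoding` object returned by `encode` in the `tokenizers` library also exposes the token strings directly through its `tokens` attribute, so the inverse-vocabulary lookup in the README snippet is optional. A minimal sketch, assuming the `tokenizers` package is installed and the repository is reachable. `summarize_encoding` and `load_and_summarize` are hypothetical helper names, not part of the README.

```python
def summarize_encoding(enc):
    """Return (tokens, count) for an object exposing `.ids` and `.tokens`."""
    tokens = list(enc.tokens)
    return tokens, len(enc.ids)


def load_and_summarize(text, repo_id="upstage/solar-pro-preview-tokenizer"):
    """Download the tokenizer from the Hub and summarize `text`."""
    # Imported here so summarize_encoding stays usable without the package.
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_pretrained(repo_id)
    return summarize_encoding(tokenizer.encode(text))
```

Calling `load_and_summarize("Hi, how are you?")` needs network access on first use. The exact tokens and count depend on the published `tokenizer.json`, so no expected output is shown here.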
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff