Update README.md
README.md
CHANGED
@@ -27,9 +27,8 @@ print(f"Tokens:\n\t{output.input_ids}")
 
 ## Notes
 
-1. the default tokenizer (on branch `main`) has a vocab size of
-
-
+1. the default tokenizer (on branch `main`) has a vocab size of 32000
+2. based on the `SentencePieceBPETokenizer` class
 
 <details>
 <summary>How to Tokenize Text and Retrieve Offsets</summary>