Update README.md
README.md
CHANGED
@@ -13,6 +13,18 @@ library_name: transformers
After minutes of hard work, it is now available.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/BeeTokenizer")

test_string = "When dealing with Varroa destructor mites, it's crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination."

# tokenize the test string and inspect the resulting token IDs
output = tokenizer(test_string)
print(f"Test string: {test_string}")
print(f"Tokens:\n\t{output.input_ids}")
```

## Notes

- the default tokenizer (on branch `main`) has a vocab size of 32128
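
If you want to sanity-check that figure yourself, the minimal sketch below loads the tokenizer and prints its vocabulary size. The explicit `revision="main"` argument is only there to make the branch explicit (it is already the default), and whether the 32128 figure appears in `tokenizer.vocab_size` or `len(tokenizer)` depends on how added/special tokens are counted.

```python
from transformers import AutoTokenizer

# load the default tokenizer from the `main` branch (revision="main" is already the default)
tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/BeeTokenizer", revision="main")

# base vocabulary size, excluding any tokens added on top of it
print(f"vocab_size:     {tokenizer.vocab_size}")

# total size including any added/special tokens; per the note above, one of these should be 32128
print(f"len(tokenizer): {len(tokenizer)}")
```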