Update README.md
README.md
CHANGED
@@ -13,6 +13,18 @@ library_name: transformers
After minutes of hard work, it is now available.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/BeeTokenizer")

test_string = "When dealing with Varroa destructor mites, it's crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination."

# tokenize the test string and inspect the resulting token IDs
output = tokenizer(test_string)
print(f"Test string: {test_string}")
print(f"Tokens:\n\t{output.input_ids}")
```

## Notes

- the default tokenizer (on branch `main`) has a vocab size of 32128
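
If you want to sanity-check that figure yourself, the minimal sketch below loads the tokenizer and prints its vocabulary size. The explicit `revision="main"` argument is only there to make the branch explicit (it is already the default), and whether the 32128 figure appears in `tokenizer.vocab_size` or `len(tokenizer)` depends on how added/special tokens are counted.

```python
from transformers import AutoTokenizer

# load the default tokenizer from the `main` branch (revision="main" is already the default)
tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/BeeTokenizer", revision="main")

# base vocabulary size, excluding any tokens added on top of it
print(f"vocab_size:     {tokenizer.vocab_size}")

# total size including any added/special tokens; per the note above, one of these should be 32128
print(f"len(tokenizer): {len(tokenizer)}")
```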