---
library_name: transformers
license: apache-2.0
---

# claude3 tokenizer for autoregressive/causal language modeling

```python
from transformers import AutoTokenizer

tk = AutoTokenizer.from_pretrained("BEE-spoke-data/claude-tokenizer")
tk
```

```
GPT2TokenizerFast(name_or_path='BEE-spoke-data/claude-tokenizer', vocab_size=65000, model_max_length=200000, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': ''}, clean_up_tokenization_spaces=True), added_tokens_decoder={
	0: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	3: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	4: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

In [4]: tk.eos_token_id
Out[4]: 0

In [5]: tk.pad_token_id

In [6]: tk.unk_token_id
Out[6]: 0

In [7]: tk.bos_token_id
Out[7]: 0
```
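A quick usage sketch, assuming network access to the Hugging Face Hub: this is a standard `transformers` fast tokenizer, so encode/decode round-trips work as usual. Note from the session above that `pad_token_id` is unset (`In [5]` prints nothing), so a pad token must be assigned before batched padding; reusing the eos token for that purpose is a common convention, not something the model card prescribes.

```python
from transformers import AutoTokenizer

tk = AutoTokenizer.from_pretrained("BEE-spoke-data/claude-tokenizer")

# Encode without special tokens; returns a plain list of ints
ids = tk.encode("hello world", add_special_tokens=False)
print(ids)

# Decoding recovers the original text
print(tk.decode(ids))

# pad_token is None out of the box, so set one before padding batches
tk.pad_token = tk.eos_token
batch = tk(["hello world", "hi"], padding=True)
print(batch["input_ids"])
print(batch["attention_mask"])
```

The attention mask distinguishes real tokens from padding, so downstream models ignore the padded positions regardless of which token id is used for padding.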