---
library_name: transformers
tags:
- tokenizer
- mlm
license: mit
---

# claude tokenizer: mlm

A variant of [Xenova/claude-tokenizer](https://huggingface.co/Xenova/claude-tokenizer) with some small changes to support its use as a masked language modeling (MLM) tokenizer.

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('pszemraj/claude-tokenizer-mlm')

text = "Hello, this is a test input."
ids = tokenizer(text)
print(tokenizer.decode(ids['input_ids'], skip_special_tokens=False))
# Hello, this is a test input.

print(len(tokenizer))
# 65004
```
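For context on what an MLM tokenizer is used for, here is a minimal, dependency-free sketch of BERT-style dynamic masking (of the selected positions, 80% become the mask token, 10% a random token, 10% stay unchanged). The function `mask_tokens` and the demo values are hypothetical illustrations, not part of this repository; in particular, the mask token id `7` below is a placeholder. With the real tokenizer you would pass `tokenizer.mask_token_id`, `len(tokenizer)` and `set(tokenizer.all_special_ids)` instead, assuming the mask token is set.

```py
import random
from typing import List, Set, Tuple


def mask_tokens(
    input_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: BERT-style 80/10/10 masking of one sequence.

    Returns (masked_ids, labels). Labels are -100 at positions that were
    not selected, which PyTorch's cross-entropy ignores by default.
    """
    rng = random.Random(seed)
    masked = list(input_ids)
    labels = [-100] * len(input_ids)
    for i, tok in enumerate(input_ids):
        # Never mask special tokens; select each other token with mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            masked[i] = mask_token_id  # 80%: replace with the mask token
        elif r < 0.9:
            masked[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


# Demo with dummy ids; 7 is a placeholder mask id, not this tokenizer's.
ids = list(range(100, 120))
masked, labels = mask_tokens(ids, mask_token_id=7, vocab_size=65004, special_ids=set())
print(masked)
print(labels)
```

In practice, `transformers.DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)` applies the same 80/10/10 scheme to batches, so you would normally use that rather than a hand-written helper.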