An XLM-RoBERTa tokenizer with a 32k vocabulary, trained on 162M tokens of Khmer text.
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("seanghay/xlm-roberta-khmer-32k-tokenizer")

# "Hello, Cambodia!" in Khmer
tokenizer.tokenize("សួស្តីកម្ពុជា!")
```
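The model card does not describe how the tokenizer was built. As a rough sketch (not the actual training recipe), a comparable 32k-vocab Unigram tokenizer can be trained from a text corpus with the Hugging Face `tokenizers` library; the corpus, vocabulary size, and special tokens below are illustrative assumptions:

```python
from tokenizers import Tokenizer
from tokenizers.models import Unigram
from tokenizers.trainers import UnigramTrainer
from tokenizers.pre_tokenizers import Metaspace

# Empty Unigram model to be filled in by training.
tok = Tokenizer(Unigram())
# Metaspace splits on whitespace and marks word boundaries with "▁",
# as SentencePiece-style tokenizers (including XLM-R's) do.
tok.pre_tokenizer = Metaspace()

# Tiny illustrative corpus and vocab size; a real run would stream
# the full Khmer corpus and use vocab_size=32000.
corpus = ["hello world", "hello tokenizer", "tokenizers train subwords"]
trainer = UnigramTrainer(
    vocab_size=200,
    special_tokens=["<s>", "</s>", "<unk>", "<pad>", "<mask>"],
    unk_token="<unk>",
    show_progress=False,
)
tok.train_from_iterator(corpus, trainer)

enc = tok.encode("hello world")
print(enc.tokens)
```

The trained tokenizer can then be saved with `tok.save("tokenizer.json")` and loaded through `transformers` as a fast tokenizer.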