Zlatorog tokenizer (fixed fast decode)
Fast tokenizer for Zlatorog CPT, derived from Qwen3-30B-A3B-Base and extended with an added Slovenian/Croatian vocabulary.
What changed
Fast (Rust) decoding corrupts some added tokens that contain a code point whose low byte is ≤ 32. For example, č (U+010D) decodes as \r (0x0D). See tokenizers#1996 and the upstream fix in tokenizers#1995.
This repo ships ZlatorogTokenizerFast (tokenization_zlatorog.py), which decodes added tokens the same way as the Transformers 4.x slow Zlatorog tokenizer. Token ids and vocabulary strings are unchanged.
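The failure mode can be illustrated in plain Python. This is a minimal sketch that assumes the corruption comes from running the byte-level inverse mapping over added-token text. `bytes_to_unicode` reimplements the standard GPT-2 byte table. `decode_tokens` and the token split `["Začni", "mo"]` are hypothetical and are not code or ids from tokenizers or this repository.

```python
# Sketch: why a byte-level decoder can turn "č" into "\r", and how decoding
# added tokens verbatim avoids it. Not code from tokenizers or this repo.

def bytes_to_unicode() -> dict[int, str]:
    """Standard GPT-2 byte -> printable-character table used by byte-level BPE."""
    # Printable bytes map to the character with the same code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (0-32, 127-160, 173) are shifted to U+0100 and up, in order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def byte_level_to_text(pieces: list[str]) -> str:
    """Undo the byte mapping for ordinary BPE pieces, then decode as UTF-8.

    Real BPE pieces only contain characters from the byte alphabet, so a
    character outside it (e.g. "š") raises KeyError here.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for piece in pieces for ch in piece)
    return raw.decode("utf-8", errors="replace")


def decode_tokens(tokens: list[str], added: set[str]) -> str:
    """Hypothetical helper: keep added tokens verbatim, byte-decode the rest."""
    out, run = [], []
    for tok in tokens:
        if tok in added:
            # Flush pending BPE pieces, then copy the added token unchanged.
            out.append(byte_level_to_text(run))
            run = []
            out.append(tok)
        else:
            run.append(tok)
    out.append(byte_level_to_text(run))
    return "".join(out)


# Treating the added token "Začni" as byte-level text maps č back to 0x0D.
print(repr(byte_level_to_text(["Začni", "mo"])))
# Keeping it verbatim restores the word.
print(decode_tokens(["Začni", "mo"], added={"Začni"}))
```

Under this assumption, only some letters are affected. č, ć, đ and their capitals fall in U+0100–U+0120, the range that stands in for bytes 0–32, while š and ž lie outside the byte alphabet.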
Requirements
- transformers>=4.45 or >=5.0
- tokenizers>=0.22
- trust_remote_code=True (loads ZlatorogTokenizerFast)
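A pre-flight check against these version floors can be written with the standard library alone. `MINIMUMS`, `parse_release` and `check_environment` are hypothetical helpers and are not part of this repo. The check compares numeric release components only and ignores pre-release tags, so it is looser than pip's specifier matching. It also cannot verify `trust_remote_code`, because that is a loading flag rather than a package.

```python
from importlib.metadata import PackageNotFoundError, version

# Floors from the list above; the 4.x and 5.x options reduce to ">=4.45".
MINIMUMS = {"transformers": (4, 45), "tokenizers": (0, 22)}


def parse_release(v: str) -> tuple[int, ...]:
    """Numeric release prefix of a version string, e.g. '4.46.0.dev0' -> (4, 46, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # e.g. "dev0": stop at the first non-numeric component
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. "0rc1": keep the 0, drop the pre-release tag
    return tuple(parts)


def check_environment(installed: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the floors are met."""
    problems = []
    for pkg, floor in MINIMUMS.items():
        v = installed.get(pkg)
        if v is None:
            problems.append(f"{pkg} is not installed")
        elif parse_release(v) < floor:
            problems.append(f"{pkg} {v} is older than {'.'.join(map(str, floor))}")
    return problems


def installed_versions() -> dict[str, str]:
    """Look up installed versions, skipping packages that are missing."""
    found = {}
    for pkg in MINIMUMS:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    for problem in check_environment(installed_versions()):
        print("WARNING:", problem)
```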
Usage
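```python
from transformers import AutoTokenizer

# trust_remote_code=True loads ZlatorogTokenizerFast from tokenization_zlatorog.py
tok = AutoTokenizer.from_pretrained(
    "zidsi/Zlatorog-30B-MoE-tokenizer",
    trust_remote_code=True,
)

# "Začnimo" contains č, one of the characters the unpatched fast decode corrupts
word = "Začnimo"
ids = tok.encode(word, add_special_tokens=False)
assert tok.decode(ids) == word
```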
Use this tokenizer with zidsi/Zlatorog-30B-MoE-CPT_Long (or any checkpoint trained with the same vocabulary).
Audit
321 of 25 893 added tokens were affected on the Hub revision audited for the parent model (a2759ee7565dc7c55c9c93c3f9e72190dcf5def4). See the companion repo’s artifacts/affected_added_tokens.json for the full checklist.
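You can run a comparable audit against any loaded tokenizer by decoding each added token on its own and comparing the result with its stored string. This sketch uses only `get_added_vocab()` and `decode()` from the Transformers tokenizer API. `audit_added_tokens` is a hypothetical helper, and it does not reproduce the format of `affected_added_tokens.json`. Tokens with stripping or normalization flags can also produce mismatches, so treat the output as a list to review rather than a re-derivation of the 321 count.

```python
from typing import Protocol


class TokenizerLike(Protocol):
    """The two tokenizer methods this audit relies on."""

    def get_added_vocab(self) -> dict[str, int]: ...
    def decode(self, token_ids: list[int]) -> str: ...


def audit_added_tokens(tok: TokenizerLike) -> list[dict]:
    """List added tokens whose single-token decode differs from their stored string."""
    affected = []
    # Walk added tokens in id order so the report is stable across runs.
    for text, token_id in sorted(tok.get_added_vocab().items(), key=lambda kv: kv[1]):
        decoded = tok.decode([token_id])
        if decoded != text:
            affected.append({"id": token_id, "token": text, "decoded": decoded})
    return affected


# Usage with the real tokenizer (downloads from the Hub):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("zidsi/Zlatorog-30B-MoE-tokenizer", trust_remote_code=True)
# report = audit_added_tokens(tok)
# print(len(report), report[:5])
```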
Base model: zidsi/Zlatorog-30B-MoE-CPT_Long