ARCHON Tokenizer v2
BPE tokenizer used by ARCHON ASI. Custom 32K vocabulary + 6 ChatML special tokens.
Vocab info
- Base vocab: 32,000 BPE tokens (custom ARCHON corpus)
- Added: 6 ChatML/tool-calling specials
- Total: 32,006
Special tokens
| Token | ID | Use |
|---|---|---|
| `<pad>` | 0 | padding |
| `<bos>` | 1 | beginning of sequence |
| `<eos>` | 2 | end of sequence |
| `<\|im_start\|>` | | |
| `<\|im_end\|>` | | |
| `<\|system\|>` | | |
| `<\|user\|>` | | |
| `<\|assistant\|>` | | |
| `<\|tool_call\|>` | | |
| `<\|tool_result\|>` | | |
| `<\|task_type\|>` | | |
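The documented IDs can be sanity-checked with a small helper, sketched below. It covers only the three tokens whose IDs are listed in the table. `check_special_ids` and `EXPECTED_IDS` are illustrative names, not part of the tokenizer, and `tok` is assumed to be a Hugging Face tokenizer object exposing `convert_tokens_to_ids`.

```python
# IDs as documented in the special-tokens table (only the three listed there).
EXPECTED_IDS = {"<pad>": 0, "<bos>": 1, "<eos>": 2}


def check_special_ids(tok, expected=EXPECTED_IDS):
    """Return tokens whose actual ID differs from the documented one.

    `tok` is assumed to expose `convert_tokens_to_ids`, as Hugging Face
    tokenizers do. The result maps token -> (documented_id, actual_id);
    an empty dict means every documented ID matched.
    """
    mismatches = {}
    for token, want in expected.items():
        got = tok.convert_tokens_to_ids(token)
        if got != want:
            mismatches[token] = (want, got)
    return mismatches
```

If the table is accurate, `check_special_ids(tok)` on the loaded tokenizer should return an empty dict.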
Chat template (ChatML)
The template is available via `tokenizer.apply_chat_template(messages)` and renders the messages in ChatML format.
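For orientation, here is a minimal pure-Python sketch of the conventional ChatML layout. It assumes each turn is rendered as `<|im_start|>{role}\n{content}<|im_end|>\n`; the template bundled with the tokenizer is the authority and may differ (for example, in how it uses the role tokens). `render_chatml` is a hypothetical helper, not part of this repository.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render messages in the conventional ChatML layout.

    Assumption: each turn is `<|im_start|>{role}\\n{content}<|im_end|>\\n`.
    The tokenizer's own chat template may render turns differently.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml(
        [{"role": "user", "content": "Hello ARCHON"}],
        add_generation_prompt=True,
    ))
```

Comparing this output with `tok.apply_chat_template(..., tokenize=False)` is a quick way to see where the shipped template departs from the plain ChatML convention.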
Usage
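```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("jescy525/archon-tokenizer-v2")

# Render the conversation as a ChatML string, ending with an open assistant turn.
text = tok.apply_chat_template(
    [{"role": "user", "content": "Hello ARCHON"}],
    tokenize=False, add_generation_prompt=True,
)

# encode() adds <bos> and <eos> by default; see "Roundtrip safety".
ids = tok.encode(text)
```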
Roundtrip safety
Encoding adds `<bos>` and `<eos>` by default. Set `add_special_tokens=False` to skip them.
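The effect of the flag can be checked with two small helpers, sketched here under the assumption that `tok` follows the usual Hugging Face `encode`/`decode` interface. `added_special_count` and `roundtrips` are illustrative names, not functions shipped with the tokenizer.

```python
def added_special_count(tok, text):
    """Count how many IDs the default encoding adds on top of the plain text."""
    with_specials = tok.encode(text)
    without_specials = tok.encode(text, add_special_tokens=False)
    return len(with_specials) - len(without_specials)


def roundtrips(tok, text):
    """Return True if decode(encode(text)) reproduces `text` exactly.

    Encoding with add_special_tokens=False keeps <bos>/<eos> out of the IDs,
    so the comparison is on the text alone.
    """
    ids = tok.encode(text, add_special_tokens=False)
    return tok.decode(ids) == text
```

Given the default described above, `added_special_count(tok, "Hello ARCHON")` would be expected to return 2 (one `<bos>`, one `<eos>`). Whether `roundtrips` holds for a given string also depends on the tokenizer's normalization and decoding settings, so it is worth checking on representative inputs.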