metadata
datasets:
  - flexitok/mod-arithmetic

Super Vocabulary

A merged super-vocabulary built from the 9 tokenizers listed below.

Vocab size: 100007

Tokenizers

  • flexitok/mod-tokenizers-individual
  • flexitok/mod-tokenizers-ltr_3digit
  • flexitok/mod-tokenizers-ltr_2digit
  • flexitok/mod-tokenizers-ltr_4digit
  • flexitok/mod-tokenizers-ltr_5digit
  • flexitok/mod-tokenizers-rtl_2digit
  • flexitok/mod-tokenizers-rtl_3digit
  • flexitok/mod-tokenizers-rtl_4digit
  • flexitok/mod-tokenizers-rtl_5digit

Files

  • super_vocab.json — merged vocabulary mapping token string → super index
  • config.yaml — model config with vocab_size
  • participating_tokenizers.json — list of tokenizer names included
  • <tokenizer>_super_mapping.json — per-tokenizer index → super index mapping
  • <tokenizer>_vocab.json — per-tokenizer vocabulary
  • <tokenizer>_info.json / .yaml — tokenizer metadata
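
The files above can be combined to translate token ids from one participating tokenizer into super-vocabulary indices. The following is a minimal sketch, not the authors' loading code. It assumes that `super_vocab.json` is a flat JSON object of token string to super index, and that each `<tokenizer>_super_mapping.json` is either a JSON object keyed by the per-tokenizer index (as a string, since JSON keys are strings) or a JSON array indexed by it. The helper names `load_json`, `to_super_ids`, and `invert_vocab` are hypothetical and used only for illustration.

```python
import json


def load_json(path):
    """Read a JSON file and return the parsed object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_super_ids(local_ids, super_mapping):
    """Translate per-tokenizer token ids into super-vocabulary indices.

    Assumption: super_mapping is either a dict keyed by the local index
    (string keys after JSON parsing) or a list indexed by the local index.
    """
    if isinstance(super_mapping, list):
        return [super_mapping[i] for i in local_ids]
    return [super_mapping[str(i)] for i in local_ids]


def invert_vocab(super_vocab):
    """Build the reverse lookup: super index -> token string."""
    return {index: token for token, index in super_vocab.items()}


# Usage sketch (file names follow the patterns listed under "Files";
# replace <tokenizer> with one of the participating tokenizer names):
#   super_vocab = load_json("super_vocab.json")
#   mapping = load_json("<tokenizer>_super_mapping.json")
#   super_ids = to_super_ids([0, 1, 2], mapping)
#   tokens = [invert_vocab(super_vocab)[i] for i in super_ids]
```

If the mapping files turn out to use a different layout, only `to_super_ids` needs to change; the rest of the sketch relies solely on the token string to super index mapping described for `super_vocab.json`.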