posterior_KaTeMaTa_llama_llama.model
- This is an SP-format tokenizer obtained by merging the Kannada, Telugu, Malayalam, Tamil, and Llama-2 tokenizers.
posterior_dr_llama_15_32k_balanced.model posterior_dr_llama_15_32k_balanced.vocab
- These are the SP-format tokenizer files (model and vocabulary) obtained by training an SP tokenizer on data from the four languages (Kannada, Telugu, Malayalam, and Tamil).
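
A minimal sketch for inspecting a `.vocab` file, assuming "SP" here means SentencePiece and that the file follows the usual SentencePiece text layout of one piece and its score per line, separated by a tab. The helper name `load_sp_vocab` and the three-line sample are hypothetical and only illustrate the layout; they are not taken from the files listed above.

```python
import os
import tempfile


def load_sp_vocab(path):
    """Read a SentencePiece-style .vocab file into a list of (piece, score) pairs."""
    entries = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Split on the last tab so the piece text itself is kept intact.
            piece, sep, score = line.rpartition("\t")
            if not sep:
                continue  # skip blank or malformed lines
            entries.append((piece, float(score)))
    return entries


# Hypothetical three-line sample standing in for a real .vocab file.
sample_text = "<unk>\t0\n▁the\t-3.2\n▁ಕನ್ನಡ\t-8.5\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".vocab", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

entries = load_sp_vocab(sample_path)
os.remove(sample_path)
print(len(entries), entries[0])
```

To inspect the trained tokenizer, the same helper could be pointed at `posterior_dr_llama_15_32k_balanced.vocab`. The `.model` files are binary and would normally be loaded with the `sentencepiece` package instead, for example via `sentencepiece.SentencePieceProcessor(model_file=...)`.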