---
license: mit
datasets:
- BiniyamAjaw/amharic_dataset_v2
language:
- am
---

# Amharic Tokenizer

## Model Details

- **Vocabulary Size:** 100,000
- **Tokenizer Type:** Byte-Pair Encoding (BPE)

### Model Description

- **Developed by:** Biniyam Ajaw
- **Language(s) (NLP):** Amharic and Amharic-driven languages
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/biniyam69/Amharic-LLM-Finetuning/

## Uses

The tokenizer can be loaded with `AutoTokenizer` from the Hugging Face `transformers` package and used to tokenize Amharic text.
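The snippet below is a minimal sketch of that workflow, not an official example from the author. It assumes `transformers` is installed. The card does not state the tokenizer's Hub repository ID, so `TOKENIZER_ID` is a hypothetical placeholder that must be replaced with the real Hub ID or a local directory containing the tokenizer files. The helper names `load_tokenizer` and `tokenize_amharic` are likewise illustrative.

```python
# Minimal sketch (not the author's official example): load the tokenizer with
# Hugging Face transformers' AutoTokenizer and tokenize Amharic text.
from functools import lru_cache
from typing import Any, List, Optional, Tuple

# HYPOTHETICAL placeholder: the model card does not give the Hub repository ID.
# Replace it with the real Hub ID or a local directory holding the tokenizer files.
TOKENIZER_ID = "your-namespace/amharic-tokenizer"


@lru_cache(maxsize=None)
def load_tokenizer(tokenizer_id: str = TOKENIZER_ID) -> Any:
    """Load the tokenizer once and cache it so repeated calls don't reload it."""
    # Imported lazily so tokenize_amharic() can be used with an explicitly
    # passed tokenizer object even where transformers is not installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(tokenizer_id)


def tokenize_amharic(
    text: str, tokenizer: Optional[Any] = None
) -> Tuple[List[str], List[int]]:
    """Split `text` into BPE subword tokens and map them to vocabulary IDs."""
    tok = tokenizer if tokenizer is not None else load_tokenizer()
    tokens = tok.tokenize(text)              # subword strings
    ids = tok.convert_tokens_to_ids(tokens)  # integer IDs in the vocabulary
    return tokens, ids


# Usage (requires `pip install transformers` and a valid TOKENIZER_ID):
# tokens, ids = tokenize_amharic("ሰላም ለዓለም")
# print(tokens, ids)
```

Accepting an optional `tokenizer` argument is only a convenience for reuse and testing. When model-ready input is needed, `tokenizer.encode(text)` is an alternative that also adds any special tokens the tokenizer defines.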