---
license: mit
datasets:
- BiniyamAjaw/amharic_dataset_v2
language:
- am
---
# Amharic Tokenizer
<!-- The tokenizer is trained on a large Amharic corpus to split unseen text into tokens. -->
## Model Details
- **Vocabulary Size:** 100,000
- **Tokenizer Type:** Byte-Pair Encoding (BPE)
### Model Description
<!-- Tokenizer that uses BPE -->
- **Developed by:** Biniyam Ajaw
- **Language(s) (NLP):** Amharic and Amharic-Driven Languages
- **License:** MIT
### Model Sources
- **Repository:** https://github.com/biniyam69/Amharic-LLM-Finetuning/
## Uses
The tokenizer can be loaded with the `AutoTokenizer` class from the `transformers` package and used to tokenize Amharic text.
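
A minimal sketch of loading and using the tokenizer is shown here. The repo id `your-username/amharic-tokenizer` is a hypothetical placeholder, since this card does not state the Hub id; replace it with the actual one. The helper names `load_tokenizer` and `tokenize_text` are also illustrative and are not part of the repository.

```python
# Hypothetical Hub repo id (an assumption): replace with this tokenizer's real id.
REPO_ID = "your-username/amharic-tokenizer"


def load_tokenizer(repo_id=REPO_ID):
    """Load the tokenizer from the Hugging Face Hub (requires `transformers`)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokenize_text(text, tokenizer):
    """Split `text` into subword tokens and map each token to its vocabulary id."""
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return tokens, ids
```

Typical usage, assuming the repo id has been set correctly, would be `tokens, ids = tokenize_text("ሰላም ለዓለም", load_tokenizer())`. The exact tokens depend on the learned BPE merges, so no output is shown here.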