metadata
license: mit
datasets:
  - BiniyamAjaw/amharic_dataset_v2
language:
  - am

Amharic Tokenizer

Model Details

  • Vocabulary Size: 100,000
  • Tokenizer Type: Byte-Pair Encoding (BPE)

Model Description

  • Developed by: Biniyam Ajaw
  • Language(s) (NLP): Amharic and Amharic-Driven Languages
  • License: MIT

Uses

The tokenizer can be loaded with the AutoTokenizer class from the transformers package and used to split Amharic text into subword tokens.
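
A minimal sketch of loading the tokenizer is shown below. It assumes the Hub repository id is BiniyamAjaw/amharic_tokenizer (inferred from the author's username and the repository name, not stated in this card) and that transformers is installed. The helper names load_tokenizer and describe are hypothetical and used only for illustration.

```python
# Assumed Hub repository id; change it if the tokenizer lives elsewhere.
REPO_ID = "BiniyamAjaw/amharic_tokenizer"


def load_tokenizer(repo_id: str = REPO_ID):
    """Download the tokenizer files from the Hub and return the tokenizer.

    Requires the `transformers` package and network access on first use.
    """
    # Imported inside the function so this module can be imported without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def describe(tokenizer, text: str) -> dict:
    """Return the subword tokens and their vocabulary ids for `text`."""
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return {"tokens": tokens, "ids": ids, "n_tokens": len(tokens)}


# Usage (downloads the tokenizer on the first call):
#   tokenizer = load_tokenizer()
#   print(describe(tokenizer, "ሰላም ለዓለም"))
```

The token count returned by describe gives a quick way to check how compactly a given Amharic sentence is encoded.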