BiniyamAjaw/amharic_tokenizer


license: mit
datasets:
  - BiniyamAjaw/amharic_dataset_v2
language:
  - am

Amharic Tokenizer

Model Details

Vocabulary Size: 100,000
Tokenizer Type: Byte-Pair Encoding (BPE)

Model Description

Developed by: Biniyam Ajaw
Language(s) (NLP): Amharic and Amharic-Driven Languages
License: MIT

Model Sources

Repository: https://github.com/biniyam69/Amharic-LLM-Finetuning/

Uses

The tokenizer can be loaded with the AutoTokenizer class from the transformers package and used to tokenize Amharic text.
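A minimal sketch of that usage follows. It assumes the tokenizer is published on the Hugging Face Hub under the repository id BiniyamAjaw/amharic_tokenizer (taken from this page's header) and that the transformers package is installed. The helper names `load_tokenizer` and `encode_and_decode` and the sample sentence are illustrative, not part of the original card.

```python
# Assumption: the Hub repository id matches this model card's header.
REPO_ID = "BiniyamAjaw/amharic_tokenizer"


def load_tokenizer(repo_id=REPO_ID):
    """Download (or load from cache) the tokenizer files for repo_id."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def encode_and_decode(tokenizer, text):
    """Return the subword tokens, their ids, and the text decoded back from the ids."""
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.encode(text)
    decoded = tokenizer.decode(ids, skip_special_tokens=True)
    return tokens, ids, decoded


if __name__ == "__main__":
    tokenizer = load_tokenizer()
    # Illustrative sample sentence ("hello to the world"); any Amharic text works.
    tokens, ids, decoded = encode_and_decode(tokenizer, "ሰላም ለዓለም")
    print(tokens)
    print(ids)
    print(decoded)
    # Total number of entries, including any added special tokens;
    # the card lists a vocabulary size of 100,000.
    print(len(tokenizer))
```

Comparing the decoded string with the input is a quick way to check how faithfully a given text round-trips through the tokenizer.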