---
license: mit
datasets:
- BiniyamAjaw/amharic_dataset_v2
language:
- am
---
# Amharic Tokenizer

<!-- The tokenizer is trained on a large Amharic corpus to split unseen text into tokens. -->


## Model Details
- **Vocabulary Size:** 100,000
- **Tokenizer Type:** Byte-Pair Encoding (BPE)
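
For readers unfamiliar with BPE, the following is a minimal, illustrative sketch of a single merge step. It is not the training code used for this tokenizer. The function names `most_frequent_pair` and `merge_pair` and the toy word counts are assumptions made for the example.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the adjacent symbol pair with the highest total count, or None."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


# Toy corpus (hypothetical): each word is a tuple of symbols mapped to its count.
toy_words = {("ሰ", "ላ", "ም"): 2, ("ሰ", "ላ"): 3}
best = most_frequent_pair(toy_words)
toy_words = merge_pair(toy_words, best)
```

Training repeats this step until the vocabulary reaches its target size, which for this tokenizer is 100,000 entries.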

### Model Description

<!-- Tokenizer that uses BPE -->



- **Developed by:** Biniyam Ajaw
- **Language(s) (NLP):** Amharic and Amharic-Driven Languages
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/biniyam69/Amharic-LLM-Finetuning/


## Uses

The tokenizer can be loaded with the `AutoTokenizer` class from the `transformers` package and used to tokenize Amharic text.
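
The loading step described above can be sketched as follows. This is a minimal example, not an official snippet: the helper names `load_tokenizer` and `tokenize_text` are illustrative, and the Hub repository id is not stated in this card, so it must be supplied by the caller.

```python
def load_tokenizer(repo_id: str):
    """Load a tokenizer from the Hugging Face Hub.

    Requires the `transformers` package and network access.
    `repo_id` is the Hub id of this tokenizer (not given in this card).
    """
    # Imported lazily so tokenize_text below has no hard dependency.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokenize_text(tokenizer, text: str) -> dict:
    """Return the subword tokens and their vocabulary ids for one string."""
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return {"tokens": tokens, "ids": ids}
```

For example, `tokenize_text(load_tokenizer("<repo-id>"), "ሰላም ለዓለም")` returns a dictionary with the token strings and their ids, where `<repo-id>` stands for this tokenizer's repository on the Hub.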