Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ language:
 library_name: transformers
 ---
 # Amharic BPE Tokenizer
-This repo contains a **Byte-Pair Encoding** tokenizer trained on the **Amharic** subset of the [oscar](https://huggingface.co/datasets/oscar) dataset. It's the same as the GPT-2 tokenizer but trained from scratch on an amharic dataset with a
+This repo contains a **Byte-Pair Encoding** tokenizer trained on the **Amharic** subset of the [oscar](https://huggingface.co/datasets/oscar) dataset. It's the same as the GPT-2 tokenizer but trained from scratch on an amharic dataset with a vocabulary size of `24000`.
 
 # How to use
 You can load the tokenizer from huggingface hub as follows.