rasyosef/gpt2-oscar-amharic-tokenizer

rasyosef committed on Jan 31

Commit 52c136b • 1 Parent(s): 5b389e4

Update README.md

Files changed (1)

README.md +1 -1

@@ -7,7 +7,7 @@ language:
 library_name: transformers
 ---
 # Amharic BPE Tokenizer
-This repo contains a **Byte-Pair Encoding** tokenizer trained on the **Amharic** subset of the [oscar](https://huggingface.co/datasets/oscar) dataset. It's the same as the GPT-2 tokenizer but trained from scratch on an amharic dataset with a **vocabulary size** of `24000`.
+This repo contains a **Byte-Pair Encoding** tokenizer trained on the **Amharic** subset of the [oscar](https://huggingface.co/datasets/oscar) dataset. It's the same as the GPT-2 tokenizer but trained from scratch on an amharic dataset with a vocabulary size of `24000`.
 # How to use
 You can load the tokenizer from huggingface hub as follows.
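
The hunk ends before the README's own loading snippet, so the following is only a minimal sketch of what loading might look like, not the author's code. It assumes the `transformers` library is installed and that the repo id is `rasyosef/gpt2-oscar-amharic-tokenizer`, as the page title suggests. The helper name `load_tokenizer` and the sample text are hypothetical and used purely for illustration.

```python
# Repo id taken from the page title; treat it as an assumption.
REPO_ID = "rasyosef/gpt2-oscar-amharic-tokenizer"


def load_tokenizer(repo_id: str = REPO_ID):
    """Download (or read from the local cache) the tokenizer for `repo_id`."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


if __name__ == "__main__":
    tokenizer = load_tokenizer()  # needs network access on first use

    # Illustrative sample text ("Addis Ababa"); not taken from the README.
    sample = "አዲስ አበባ"
    print(tokenizer.tokenize(sample))

    # Total vocabulary size, including any added tokens.
    # The README states a vocabulary size of 24000.
    print(len(tokenizer))
```

`AutoTokenizer` is used here because the README metadata lists `library_name: transformers`; the exact tokens printed depend on the trained merges, so no expected output is shown.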