rasyosef committed on
Commit
6bbb3d0
1 Parent(s): 068ddf1

Create README.md

Files changed (1)
  1. README.md +18 -0
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ license: mit
+ datasets:
+ - oscar
+ language:
+ - am
+ library_name: transformers
+ ---
+ # Amharic BPE Tokenizer
+ This repo contains a **Byte-Pair Encoding** tokenizer trained on the **Amharic** subset of the [oscar](https://huggingface.co/datasets/oscar) dataset. It uses the same tokenization algorithm as the GPT-2 tokenizer, but was trained from scratch on Amharic text with a **vocabulary size** of `24000`.
+
+ # How to use
13
+ You can load the tokenizer from huggingface hub as follows.
14
+ ```python
15
+ from transformers import AutoTokenizer
16
+
17
+ tokenizer = AutoTokenizer.from_pretrained("rasyosef/gpt2-oscar-amharic-tokenizer")
18
+ ```
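For readers unfamiliar with the Byte-Pair Encoding algorithm the README refers to, the following is a minimal, illustrative sketch of how BPE merge rules can be learned and applied. It is not the code used to train this tokenizer: the function names `learn_bpe_merges`, `merge_pair`, and `apply_merges` are hypothetical, and the sketch works on Unicode characters for readability, whereas GPT-2-style tokenizers operate on bytes and learn far more merges.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return `symbols` with every adjacent occurrence of `pair` fused into one symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy, character-level)."""
    # Each distinct word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # nothing left to merge
        # Most frequent pair wins; ties resolve to the pair encountered first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_merges(word, merges):
    """Tokenize `word` by replaying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Calling `learn_bpe_merges` on a small list of words and passing the result to `apply_merges` shows how frequent character sequences collapse into single tokens. A real tokenizer repeats this process until the target vocabulary size is reached (`24000` in this repo).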