nilq committed on
Commit 7357b8a
1 Parent(s): 11d0588

Update README.md

  1. README.md +18 -1
README.md CHANGED
@@ -5,13 +5,30 @@ language:
 tags:
 - babylm
 - tokenizer
-library_name: transformers
 ---

 ## Baby Tokenizer

 Compact sentencepiece tokenizer for sample-efficient English language modeling, simply tokenizing natural language.

+### Usage
+
+#### Transformers
+
+```py
+from transformers import AutoTokenizer
+
+tokenizer_baby = AutoTokenizer.from_pretrained("nilq/baby-tokenizer")
+```
+
+#### Tokenizers
+
+```py
+from tokenizers import Tokenizer
+
+tokenizer_baby = Tokenizer.from_pretrained("nilq/baby-tokenizer")
+```
+
 ### Data

 This tokeniser is derived from the BabyLM 100M dataset of mixed domain data, consisting of the following sources: