Commit 92a4e96 by sangjeedondrub
1 Parent(s): f8c585f

Create README.md

  1. README.md +19 -0
README.md ADDED
@@ -0,0 +1,19 @@
+ ---
+ license: mit
+ language:
+ - bo
+ tags:
+ - tibetan,tokenization,sentencepiece
+ ---
+
+ # marpa-tokenizer
+
+ > A LlamaTokenizer that supports tokenizing Tibetan text.
+
+
+ Example tokenizer output:
+
+ ```python
+ ['▁རྒྱལ་ཡོངས་', '▁ཀྱི་', '▁དོ་ཁུར་', '▁དང་', '▁འཛམ་གླིང་', '▁ཡོངས་', '▁ཀྱི་', '▁དོ་སྣང་', '▁ཁྲོད', '▁།']
+ ['▁我们', '认为', '下面', '这些', '真理', '是不', '言', '而', '喻', '的']
+ ```