Commit 92a4e96 (1 parent: f8c585f), committed by sangjeedondrub
Create README.md
README.md ADDED
@@ -0,0 +1,19 @@
+---
+license: mit
+language:
+- bo
+tags:
+- tibetan,tokenization,sentencepiece
+---
+
+# marpa-tokenizer
+
+> A LlamaTokenizer with support to tokenize Tibetan text.
+
+
+Example:
+
+```sh
+['▁རྒྱལ་ཡོངས་', '▁ཀྱི་', '▁དོ་ཁུར་', '▁དང་', '▁འཛམ་གླིང་', '▁ཡོངས་', '▁ཀྱི་', '▁དོ་སྣང་', '▁ཁྲོད', '▁།']
+['▁我们', '认为', '下面', '这些', '真理', '是不', '言', '而', '喻', '的']
+```
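
The two lists in the README example are SentencePiece-style pieces, in which `▁` (U+2581) marks a position where the input had a space. As a minimal sketch of how to read them, the snippet below joins pieces back into text under that convention. `pieces_to_text`, `tokenize`, and `model_dir` are hypothetical names introduced here for illustration; they are not part of this repository. The `tokenize` helper assumes the `transformers` package is installed and that the tokenizer files have been downloaded to a local directory, and it is defined but not called.

```python
WORD_BOUNDARY = "\u2581"  # "▁", the SentencePiece whitespace marker


def pieces_to_text(pieces):
    """Join SentencePiece pieces, turning each "▁" marker back into a space."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


def tokenize(text, model_dir):
    """Tokenize `text` with a LlamaTokenizer stored in `model_dir`.

    Assumption: `model_dir` is a local directory holding this repository's
    tokenizer files. The import is done lazily so that the rest of this
    sketch runs without `transformers` installed.
    """
    from transformers import LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained(model_dir)
    return tokenizer.tokenize(text)


# Pieces copied verbatim from the README example.
tibetan_pieces = ['▁རྒྱལ་ཡོངས་', '▁ཀྱི་', '▁དོ་ཁུར་', '▁དང་', '▁འཛམ་གླིང་',
                  '▁ཡོངས་', '▁ཀྱི་', '▁དོ་སྣང་', '▁ཁྲོད', '▁།']
chinese_pieces = ['▁我们', '认为', '下面', '这些', '真理', '是不', '言', '而', '喻', '的']

print(pieces_to_text(tibetan_pieces))
print(pieces_to_text(chinese_pieces))  # 我们认为下面这些真理是不言而喻的
```

Every Tibetan piece begins with `▁`, so the joined text has ten space-separated units, while the Chinese pieces carry a single leading marker and join into one unbroken sentence.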