Update README.md
README.md
CHANGED
@@ -4,4 +4,6 @@ language:
 - km
 - en
 ---
-The tokenizer is trained on Khmer and English only. The training corpus was drawn from approximately 3,000 links, and the tokenizer was trained using SentencePiece with a configuration similar to Llama3's.
+The tokenizer is trained on Khmer and English only. The training corpus was drawn from approximately 3,000 links, and the tokenizer was trained using SentencePiece with a configuration similar to Llama3's.
+
+The tokenizer model has a vocabulary size of 7152 and its type is Byte Pair Encoding (BPE).
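For readers who want to reproduce a comparable setup, here is a minimal sketch of how a SentencePiece BPE tokenizer with this vocabulary size might be trained. Only the vocabulary size (7152) and the model type (BPE) come from the README. The corpus path, the model prefix, and the `character_coverage` and `byte_fallback` settings are assumptions for illustration, not the author's actual configuration.

```python
from typing import Any, Dict


def build_trainer_args(corpus_path: str, model_prefix: str) -> Dict[str, Any]:
    """Collect keyword arguments for sentencepiece.SentencePieceTrainer.train."""
    return {
        "input": corpus_path,          # hypothetical path to a plain-text corpus
        "model_prefix": model_prefix,  # hypothetical output name (.model / .vocab)
        "vocab_size": 7152,            # vocabulary size stated in the README
        "model_type": "bpe",           # Byte Pair Encoding, as stated in the README
        "character_coverage": 1.0,     # assumption: keep every Khmer character
        "byte_fallback": True,         # assumption: encode unseen characters as bytes
    }


def train_tokenizer(corpus_path: str, model_prefix: str) -> None:
    """Train the tokenizer; requires `pip install sentencepiece`."""
    # Imported lazily so build_trainer_args stays usable without the package.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**build_trainer_args(corpus_path, model_prefix))
```

Calling `train_tokenizer("corpus.txt", "km_en_bpe")` (both names hypothetical) would write `km_en_bpe.model` and `km_en_bpe.vocab` to the working directory, provided the corpus is large enough to support 7152 pieces.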