orofido committed on
Commit
a0393a4
1 Parent(s): 51bfa4c

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -4,4 +4,6 @@ language:
  - km
  - en
  ---
- The tokenizer is trained on Khmer and English only. The training corpus was built from approximately 3,000 links, and the tokenizer was trained using SentencePiece with a configuration similar to that of Llama 3.
+ The tokenizer is trained on Khmer and English only. The training corpus was built from approximately 3,000 links, and the tokenizer was trained using SentencePiece with a configuration similar to that of Llama 3.
+
+ The tokenizer has a vocabulary size of 7152, and its type is Byte Pair Encoding (BPE).
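
For readers unfamiliar with Byte Pair Encoding, the following is a minimal pure-Python sketch of how BPE merge rules are learned and applied. It is a toy illustration only: it is not the SentencePiece implementation, it does not reproduce this tokenizer's configuration or corpus, and the function names (`learn_bpe`, `merge_pair`, `encode`) and the sample words are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Start from single characters; keep a frequency per distinct word.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "lower", "lowest"], 2)
    print(rules)
    print(encode("lower", rules))
```

In a scheme like this, the final vocabulary is roughly the set of base symbols plus one new entry per merge (plus any special tokens), so a target vocabulary size such as 7152 effectively bounds how many merges are learned.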