orofido/tok7152.model


orofido committed on Jul 29

Commit a0393a4 (1 parent: 51bfa4c)

Update README.md

Files changed (1)

README.md +3 -1

README.md CHANGED

@@ -4,4 +4,6 @@ language:
 - km
 - en
 ---
-The tokenizer is trained on Khmer and English only. The corpus was collected from approximately 3,000 links, and training used SentencePiece with a configuration similar to that of Llama 3.

+The tokenizer is trained on Khmer and English only. The corpus was collected from approximately 3,000 links, and training used SentencePiece with a configuration similar to that of Llama 3.
+The tokenizer has a vocabulary size of 7152 and its model type is Byte Pair Encoding (BPE).