SeaLLMs/SeaLLM-13B-Chat

nxphi47 committed on Oct 27, 2023

Commit 11bc472 (1 parent: 7762fdc)

Update README.md

Files changed (1)

README.md +7 -7

@@ -55,13 +55,13 @@ Our goal for vocabulary expansion is threefold: (1) the number of newly-added to
 As seen in the table below, our new vocabulary reduces the compression ratio from 4.29 to 1.57 for Thai - meaning it can now encode 2.7x longer Thai text given the same context length. Meanwhile, English is only compressed by 0.3%, thus preserving its integrity.
-|Language | Llama's ratio | Our ratio | # New tokens
-| --- | --- | --- | --- |
-| Vi | 2.91 | 1.2488 | 2304
-| Zh | 1.99 | 1.1806 | 3456
-| Th | 4.29 | 1.5739 | 1536
-| Id | 1.76 | 1.1408 | 3840
-| En | 1.00 | 0.9976
+|Language | ChatGPT's ratio | Llama's ratio | Our ratio | # New tokens
+| --- | --- | --- | --- | --- |
+| Vi | 4.41 | 2.91 | 1.2488 | 2304
+| Zh | 2.80 | 1.99 | 1.1806 | 3456
+| Th | 9.09 | 4.29 | 1.5739 | 1536
+| Id | 2.00 | 1.76 | 1.1408 | 3840
+| En | 1.00 | 1.00 | 0.9976
 ### Pre-training Data
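
The "2.7x" and "0.3%" figures in the context paragraph can be checked against the table with a few lines of arithmetic. This is a minimal sketch, assuming the context gain is simply the old ratio divided by the new one; the dictionary and function names are hypothetical and not part of the README.

```python
# Ratios copied from the updated table ("Llama's ratio" and "Our ratio").
llama_ratio = {"Vi": 2.91, "Zh": 1.99, "Th": 4.29, "Id": 1.76, "En": 1.00}
our_ratio = {"Vi": 1.2488, "Zh": 1.1806, "Th": 1.5739, "Id": 1.1408, "En": 0.9976}


def context_gain(old, new):
    """Assumed definition: how many times more text fits in the same context."""
    return old / new


# Per-language gain of the expanded vocabulary over the original Llama one.
gains = {lang: context_gain(llama_ratio[lang], our_ratio[lang]) for lang in llama_ratio}

# Relative reduction in the English ratio, in percent.
en_change_pct = (1.0 - our_ratio["En"] / llama_ratio["En"]) * 100

for lang, gain in gains.items():
    print(f"{lang}: {gain:.2f}x")
print(f"En reduction: {en_change_pct:.2f}%")
```

Under that assumption the Thai gain comes out at about 2.73x, consistent with the stated 2.7x, and the English reduction is about 0.24%, which the paragraph reports as 0.3%.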