Commit 11bc472 by nxphi47
1 Parent(s): 7762fdc

Update README.md

  1. README.md +7 -7
README.md CHANGED
@@ -55,13 +55,13 @@ Our goal for vocabulary expansion is threefold: (1) the number of newly-added to
 
 As seen in the table below, our new vocabulary reduces the compression ratio from 4.29 to 1.57 for Thai - meaning it can now encode 2.7x longer Thai text given the same context length. Meanwhile, English is only compressed by 0.3%, thus preserving its integrity.
 
-|Language | Llama's ratio | Our ratio | # New tokens
-| --- | --- | --- | --- |
-| Vi | 2.91 | 1.2488 | 2304
-| Zh | 1.99 | 1.1806 | 3456
-| Th | 4.29 | 1.5739 | 1536
-| Id | 1.76 | 1.1408 | 3840
-| En | 1.00 | 0.9976
+|Language | ChatGPT's ratio | Llama's ratio | Our ratio | # New tokens
+| --- | --- | --- | --- | --- |
+| Vi | 4.41 | 2.91 | 1.2488 | 2304
+| Zh | 2.80 | 1.99 | 1.1806 | 3456
+| Th | 9.09 | 4.29 | 1.5739 | 1536
+| Id | 2.00 | 1.76 | 1.1408 | 3840
+| En | 1.00 | 1.00 | 0.9976
 
 
 ### Pre-training Data
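
Review note on the changed paragraph: the "2.7x longer Thai text" claim can be checked against the table values in this diff. The sketch below is illustrative only. It assumes the compression ratio is proportional to the number of tokens needed per unit of text, so that the text fitting into a fixed context length scales with the inverse of the ratio. The names `RATIOS`, `context_gain`, and `relative_reduction` are hypothetical and not part of the repository.

```python
# Compression ratios copied from the "+" side of the table in this diff.
# The "# New tokens" column is left out because it is not needed here.
RATIOS = {
    "Vi": {"chatgpt": 4.41, "llama": 2.91, "ours": 1.2488},
    "Zh": {"chatgpt": 2.80, "llama": 1.99, "ours": 1.1806},
    "Th": {"chatgpt": 9.09, "llama": 4.29, "ours": 1.5739},
    "Id": {"chatgpt": 2.00, "llama": 1.76, "ours": 1.1408},
    "En": {"chatgpt": 1.00, "llama": 1.00, "ours": 0.9976},
}


def context_gain(old_ratio: float, new_ratio: float) -> float:
    """Factor by which more text fits in the same context length.

    Assumption: token count per unit of text is proportional to the ratio.
    """
    return old_ratio / new_ratio


def relative_reduction(old_ratio: float, new_ratio: float) -> float:
    """Fractional drop in the ratio when moving from old to new vocabulary."""
    return (old_ratio - new_ratio) / old_ratio


if __name__ == "__main__":
    for lang, r in RATIOS.items():
        gain = context_gain(r["llama"], r["ours"])
        drop = relative_reduction(r["llama"], r["ours"])
        print(f"{lang}: {gain:.2f}x more text vs Llama, ratio down {drop:.2%}")
```

Under that assumption, the Thai row gives 4.29 / 1.5739, which rounds to the 2.7x quoted in the paragraph. The English row gives a reduction of about 0.24%, which the paragraph states as 0.3%.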