dotw committed on
Commit
3fd7468
1 Parent(s): ba839cc

fix Burmese tokens

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -44,7 +44,7 @@ SEA-LION was trained on 980B tokens of the following data:
  | mC4 - Indonesian | 14.7B | 1.50% |
  | mC4 - Malay | 2.9B | 0.29% |
  | mC4 - Filipino | 5.3B | 0.54% |
- | mC4 - Burmese | 1.2B | 0.49% |
+ | mC4 - Burmese | 4.9B | 0.49% |
  | mC4 - Vietnamese | 63.4B | 6.46% |
  | mC4 - Thai | 21.6B | 2.20% |
  | mC4 - Lao | 1.1B | 0.12% |
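
The fix can be sanity-checked with a little arithmetic: with the 980B-token total from the hunk header, 1.2B tokens would be about 0.12% of the corpus, while 4.9B is about 0.50%, which agrees with the stated 0.49% up to rounding. The sketch below applies that check to every row in the hunk. The function name `inconsistent_rows` and the 0.02 percentage-point tolerance are assumptions made for illustration; they are not part of the SEA-LION repository.

```python
# Total training tokens in billions, taken from the hunk header.
TOTAL_B = 980.0

# (dataset, tokens in billions, stated percentage) as shown in the diff.
CONTEXT_ROWS = [
    ("mC4 - Indonesian", 14.7, 1.50),
    ("mC4 - Malay", 2.9, 0.29),
    ("mC4 - Filipino", 5.3, 0.54),
    ("mC4 - Vietnamese", 63.4, 6.46),
    ("mC4 - Thai", 21.6, 2.20),
    ("mC4 - Lao", 1.1, 0.12),
]
OLD_ROWS = CONTEXT_ROWS + [("mC4 - Burmese", 1.2, 0.49)]  # before the commit
NEW_ROWS = CONTEXT_ROWS + [("mC4 - Burmese", 4.9, 0.49)]  # after the commit


def inconsistent_rows(rows, total_b=TOTAL_B, tol=0.02):
    """Return (dataset, computed percentage) for rows whose stated share
    differs from tokens / total by more than `tol` percentage points.

    The tolerance is an assumed allowance for the rounding of both the
    token counts and the percentages in the table.
    """
    flagged = []
    for name, tokens_b, stated_pct in rows:
        computed_pct = tokens_b / total_b * 100
        if abs(computed_pct - stated_pct) > tol:
            flagged.append((name, round(computed_pct, 2)))
    return flagged


if __name__ == "__main__":
    print("before:", inconsistent_rows(OLD_ROWS))
    print("after: ", inconsistent_rows(NEW_ROWS))
```

Under these assumptions, only the old Burmese row is flagged, and no row is flagged after the commit.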