fix Burmese tokens
README.md
CHANGED
@@ -44,7 +44,7 @@ SEA-LION was trained on 980B tokens of the following data:
 | mC4 - Indonesian | 14.7B | 1.50% |
 | mC4 - Malay | 2.9B | 0.29% |
 | mC4 - Filipino | 5.3B | 0.54% |
-| mC4 - Burmese |
+| mC4 - Burmese | 4.9B | 0.49% |
 | mC4 - Vietnamese | 63.4B | 6.46% |
 | mC4 - Thai | 21.6B | 2.20% |
 | mC4 - Lao | 1.1B | 0.12% |