selective_pre_translation / utils /languages_by_word_count.csv
Anonymous
add utils
32eac11
Unnamed: 0,Language,number of words,percentage of total words
0,English,181014683608,92.64708%
1,French,3553061536,1.81853%
2,German,2870869396,1.46937%
3,Spanish,1510070974,0.77289%
4,Italian,1187784217,0.60793%
5,Portuguese,1025413869,0.52483%
6,Dutch,669055061,0.34244%
7,Russian,368157074,0.18843%
8,Romanian,308182352,0.15773%
9,Polish,303812362,0.15550%
10,Finnish,221644679,0.11344%
11,Danish,221551540,0.11339%
12,Swedish,220920577,0.11307%
13,Japanese,217047918,0.11109%
14,Norwegian,212193299,0.10860%
15,Chinese,193517396,0.09905%
16,Czech,139918438,0.07161%
17,Hungarian,127224375,0.06512%
18,Indonesian,116930321,0.05985%
19,Turkish,116141938,0.05944%
20,Croatian,101613675,0.05201%
21,Vietnamese,83077650,0.04252%
22,Greek,61607673,0.03153%
23,Arabic,60839973,0.03114%
24,Serbian,52875283,0.02706%
25,Chinese (Traditional),38583893,0.01975%
26,Catalan,35126650,0.01798%
27,Korean,33147663,0.01697%
28,Slovak,27957963,0.01431%∆
29,Thai,26806557,0.01372%
30,Slovenian,26037337,0.01333%
31,Estonian,20718080,0.01060%
32,Persian,16731301,0.00856%
33,Hebrew,15027640,0.00769%
34,Ukrainian,14905898,0.00763%
35,Malay,13389340,0.00685%
36,Latvian,13290098,0.00680%
37,Bosnian,13160941,0.00674%
38,Lithuanian,12921255,0.00661%
39,Icelandic,12792837,0.00655%
40,Hindi,9434632,0.00483%
41,Albanian,9253803,0.00474%
42,Filipino,8650331,0.00443%
43,Galician,6947527,0.00356%
44,Javanese,6604056,0.00338%
45,Bulgarian,5919807,0.00303%
46,Afrikaans,5461216,0.00280%
47,Tamil,5163171,0.00264%
48,Marathi,3660217,0.00187%
49,Welsh,3459671,0.00177%
50,Malayalam,3227746,0.00165%
51,Bangla,3003033,0.00154%
52,Irish,2878943,0.00147%
53,Azerbaijani,2496202,0.00128%
54,Kannada,1913389,0.00098%
55,Burmese,1853421,0.00095%
56,Telugu,1638366,0.00084%
57,Uzbek,1458861,0.00075%
58,Kinyarwanda,1430208,0.00073%
59,Cebuano,1329456,0.00068%
60,Nepali,1120450,0.00057%
61,Kurdish,1091032,0.00056%
62,Basque,1048905,0.00054%
63,Khmer,1041164,0.00053%
64,Georgian,924256,0.00047%
65,Scottish Gaelic,841970,0.00043%
66,Armenian,840171,0.00043%
67,Maltese,748610,0.00038%
68,Sinhala,708343,0.00036%
69,Punjabi,703086,0.00036%
70,Urdu,689768,0.00035%
71,Kazakh,670231,0.00034%
72,Swahili,585858,0.00030%
73,Southern Sotho,538257,0.00028%
74,Belarusian,533405,0.00027%
75,Macedonian,529413,0.00027%
76,Malagasy,507043,0.00026%
77,Gujarati,494798,0.00025%
78,Lao,449476,0.00023%
79,Haitian Creole,430911,0.00022%
80,Ganda,261217,0.00013%
81,Yiddish,227609,0.00012%
82,Tajik,210167,0.00011%
83,Sundanese,208819,0.00011%
84,Hmong,175972,0.00009%
85,Nyanja,161994,0.00008%
86,Odia,131688,0.00007%
87,Divehi,112819,0.00006%
88,Kyrgyz,91289,0.00005%
89,Bihari languages,48094,0.00002%
90,Unknown language [xx] (Gothic),48025,0.00002%
91,Unknown language [xx] (Runic),37558,0.00002%
92,Inuktitut,31142,0.00002%
93,Syriac,21482,0.00001%
94,Mongolian,7779,0.00000%
95,Unknown language [xx] (Phoenician),4343,0.00000%
96,Unknown language [xx] (Unknown Script [Qaai]),4185,0.00000%
97,Unknown language [xx] (Egyptian hieroglyphs),3395,0.00000%
98,Unknown language [xx] (N’Ko),3338,0.00000%
99,Unknown language [xx] (Tifinagh),3277,0.00000%
100,Unknown language [xx] (Chakma),2608,0.00000%
101,Unknown language [xx] (Yi),2357,0.00000%
102,Cherokee,2315,0.00000%
103,Unknown language [xx] (Phags-pa),1750,0.00000%
104,Unknown language [xx] (Tai Viet),1622,0.00000%
105,Unknown language [xx] (Deseret),1504,0.00000%
106,Unknown language [xx] (Javanese),1448,0.00000%
107,Unknown language [xx] (Sundanese),780,0.00000%
108,Unknown language [xx] (Coptic),707,0.00000%
109,Unknown language [xx] (Glagolitic),673,0.00000%
110,Unknown language [xx] (Ol Chiki),573,0.00000%
111,Unknown language [xx] (Shavian),542,0.00000%
112,Unknown language [xx] (Samaritan),313,0.00000%
113,Unknown language [xx] (Avestan),213,0.00000%
114,Unknown language [xx] (Bopomofo),188,0.00000%
115,Unknown language [xx] (Linear B),156,0.00000%
116,Unknown language [xx] (Ogham),84,0.00000%
117,Unknown language [xx] (Cham),49,0.00000%