Alp commited on
Commit
87f167e
·
1 Parent(s): 1928598
Files changed (4)
  1. README.md +44 -2
  2. app.py +1303 -0
  3. app_content.md +43 -0
  4. language-codes-full.csv +488 -0
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: Speech Resource Finder
3
- emoji: 🦀
+ emoji: 🧭
4
  colorFrom: gray
5
  colorTo: pink
6
  sdk: gradio
@@ -10,4 +10,46 @@ pinned: false
10
  short_description: 'Discover ASR and TTS support and resources for any language '
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
13
+ # Speech Resource Finder
14
+
15
+ ## Description
16
+
17
+ Almost 4 billion people speak languages with little or no speech technology support. This tool makes visible which languages have resources available and which communities are being left behind in the speech AI revolution.
18
+
19
+ Built by CLEAR Global to support language inclusion and help close the digital language divide.
20
+
21
+ ## Data Sources
22
+
23
+ ### Commercial Speech Services
24
+
25
+ Commercial service support is automatically pulled from the language support page of each service provider; a minimal scraping sketch follows the list below.
26
+
27
+ - **Azure Speech Services** - [Speech-to-Text](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt) | [Text-to-Speech](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts)
28
+ - **Google Cloud Speech** - [Speech-to-Text](https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages) | [Text-to-Speech](https://cloud.google.com/text-to-speech/docs/voices)
29
+ - **AWS** - [Transcribe](https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html) | [Polly](https://docs.aws.amazon.com/polly/latest/dg/supported-languages.html)
30
+ - **ElevenLabs** - [Multilingual v2](https://elevenlabs.io/docs/models#multilingual-v2) | [Turbo v3](https://elevenlabs.io/docs/models#eleven-v3-alpha)
31
+
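+ As a rough sketch (assuming each provider keeps its current HTML table layout), the support tables are scraped with `requests` and BeautifulSoup:
+
+ ```python
+ import requests
+ from bs4 import BeautifulSoup
+
+ # Minimal, hypothetical example using the Azure Speech-to-Text support page
+ URL = "https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt"
+ soup = BeautifulSoup(requests.get(URL, timeout=10).content, "html.parser")
+ locales = {}
+ for row in soup.find("table").find_all("tr")[1:]:  # skip the header row
+     cells = row.find_all("td")
+     if len(cells) >= 2:
+         locales[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)
+ ```
+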
32
+ ### Open Source Resources
33
+ - **HuggingFace Models** - Pre-trained speech models sorted by downloads
34
+ - [ASR Models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition)
35
+ - [TTS Models](https://huggingface.co/models?pipeline_tag=text-to-speech)
36
+ - **HuggingFace Datasets** - Speech corpora for training and evaluation
37
+ - [ASR Datasets](https://huggingface.co/datasets?task_categories=task_categories:automatic-speech-recognition)
38
+ - [TTS Datasets](https://huggingface.co/datasets?task_categories=task_categories:text-to-speech)
39
+
40
+ ## How to Use
41
+ 1. Select a language from the dropdown (type to search by name or ISO code)
42
+ 2. Toggle model deduplication if desired (enabled by default)
43
+ 3. Review results: commercial availability, models, and datasets
44
+ 4. Click model/dataset names to open on HuggingFace
45
+
46
+ ## Disclaimer
47
+
48
+ - The language list currently covers only 487 languages, taken from this [GitHub repository](https://github.com/datasets/language-codes).
49
+ - Data is fetched in real time and can change between searches.
50
+ - This is not an exhaustive list; other commercial voice technology providers and dataset/model resources are not covered.
51
+ - Deduplication discards models that share a base name but were uploaded by different users, keeping only the most downloaded copy.
52
+
53
+ ## Feedback
54
+
55
+ We would love to hear your feedback and suggestions. Please write us at tech@clearglobal.org.
app.py ADDED
@@ -0,0 +1,1303 @@
1
+ import gradio as gr
2
+ import pandas as pd
3
+ import requests
4
+ from bs4 import BeautifulSoup
5
+ from functools import lru_cache
6
+ import csv
7
+ from io import StringIO
8
+ import re
9
+
10
+ # Configuration
11
+ LANGUAGE_CODES_FILE = "language-codes-full.csv"
12
+ APP_CONTENT_FILE = "app_content.md"
13
+
14
+ # Language list will be loaded from CSV
15
+ # Structure: {primary alpha-3 code: {"name": str, "alpha3_b": str, "alpha3_t": str, "alpha2": str}}
16
+ LANGUAGES = {}
17
+
18
+ # App content will be loaded from markdown file
19
+ APP_CONTENT = {
20
+ "title": "Speech Resource Finder",
21
+ "description": "Search for speech resources",
22
+ "full_content": ""
23
+ }
24
+
25
+ def load_app_content(content_path=None):
26
+ """Load app content from markdown file"""
27
+ global APP_CONTENT
28
+ if content_path is None:
29
+ content_path = APP_CONTENT_FILE
30
+
31
+ try:
32
+ with open(content_path, 'r', encoding='utf-8') as f:
33
+ content = f.read()
34
+
35
+ # Parse markdown content
36
+ lines = content.split('\n')
37
+
38
+ # Extract title (first # heading)
39
+ title = "Speech Resource Finder"
40
+ for line in lines:
41
+ if line.startswith('# '):
42
+ title = line[2:].strip()
43
+ break
44
+
45
+ # Extract description (text after ## Description until next ##)
46
+ description = ""
47
+ in_description = False
48
+ for line in lines:
49
+ if line.startswith('## Description'):
50
+ in_description = True
51
+ continue
52
+ elif in_description and line.startswith('##'):
53
+ break
54
+ elif in_description and line.strip():
55
+ description += line.strip() + " "
56
+
57
+ APP_CONTENT = {
58
+ "title": title,
59
+ "description": description.strip(),
60
+ "full_content": content
61
+ }
62
+ print(f"Loaded app content from {content_path}")
63
+ except Exception as e:
64
+ print(f"Error loading app content: {e}")
65
+ print("Using default content")
66
+
67
+ def load_language_list(csv_path=None):
68
+ """Load ISO 639 language codes from CSV file"""
69
+ global LANGUAGES
70
+ if csv_path is None:
71
+ csv_path = LANGUAGE_CODES_FILE
72
+
73
+ try:
74
+ with open(csv_path, 'r', encoding='utf-8') as f:
75
+ reader = csv.DictReader(f)
76
+ for row in reader:
77
+ # Use alpha3-b as primary key, fallback to alpha3-t if empty
78
+ code_b = row['alpha3-b'].strip()
79
+ code_t = row['alpha3-t'].strip()
80
+ code_2 = row['alpha2'].strip()
81
+ name = row['English'].strip()
82
+
83
+ primary_code = code_b if code_b else code_t
84
+
85
+ if primary_code and name:
86
+ LANGUAGES[primary_code] = {
87
+ "name": name,
88
+ "alpha3_b": code_b,
89
+ "alpha3_t": code_t,
90
+ "alpha2": code_2
91
+ }
92
+ print(f"Loaded {len(LANGUAGES)} languages from {csv_path}")
93
+ except Exception as e:
94
+ print(f"Error loading language list: {e}")
95
+ # Fallback to a minimal set
96
+ LANGUAGES = {
97
+ "eng": {"name": "English", "alpha3_b": "eng", "alpha3_t": "", "alpha2": "en"},
98
+ "spa": {"name": "Spanish", "alpha3_b": "spa", "alpha3_t": "", "alpha2": "es"},
99
+ "fra": {"name": "French", "alpha3_b": "fra", "alpha3_t": "", "alpha2": "fr"},
100
+ "deu": {"name": "German", "alpha3_b": "ger", "alpha3_t": "deu", "alpha2": "de"},
101
+ }
102
+ print(f"Using fallback with {len(LANGUAGES)} languages")
103
+
104
+ @lru_cache(maxsize=1)
105
+ def fetch_azure_asr_languages():
106
+ """Scrape Azure Speech-to-Text supported languages"""
107
+ url = "https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt"
108
+
109
+ try:
110
+ response = requests.get(url, timeout=10)
111
+ response.raise_for_status()
112
+ soup = BeautifulSoup(response.content, 'html.parser')
113
+
114
+ # Find the table with locale data
115
+ # The table has columns: Locale (BCP-47) | Language | Fast transcription support | Custom speech support
116
+ tables = soup.find_all('table')
117
+
118
+ azure_asr = {}
119
+ for table in tables:
120
+ rows = table.find_all('tr')
121
+ if not rows:
122
+ continue
123
+
124
+ # Check if this is the right table by looking at headers
125
+ headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
126
+ if 'Locale' in ' '.join(headers) or 'Language' in ' '.join(headers):
127
+ for row in rows[1:]: # Skip header
128
+ cols = row.find_all('td')
129
+ if len(cols) >= 2:
130
+ locale = cols[0].get_text(strip=True)
131
+ language = cols[1].get_text(strip=True)
132
+ if locale and language:
133
+ azure_asr[locale] = language
134
+ break
135
+
136
+ return azure_asr
137
+ except Exception as e:
138
+ print(f"Error fetching Azure ASR data: {e}")
139
+ return {}
140
+
141
+ @lru_cache(maxsize=1)
142
+ def fetch_azure_tts_languages():
143
+ """Scrape Azure Text-to-Speech supported languages with voice counts"""
144
+ url = "https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts"
145
+
146
+ try:
147
+ response = requests.get(url, timeout=10)
148
+ response.raise_for_status()
149
+ soup = BeautifulSoup(response.content, 'html.parser')
150
+
151
+ # Find the TTS table
152
+ # Columns: Locale (BCP-47) | Language | Text to speech voices
153
+ tables = soup.find_all('table')
154
+
155
+ azure_tts = {}
156
+ for table in tables:
157
+ rows = table.find_all('tr')
158
+ if not rows:
159
+ continue
160
+
161
+ headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
162
+ if 'Text to speech' in ' '.join(headers) or 'voices' in ' '.join(headers).lower():
163
+ for row in rows[1:]:
164
+ cols = row.find_all('td')
165
+ if len(cols) >= 3:
166
+ locale = cols[0].get_text(strip=True)
167
+ language = cols[1].get_text(strip=True)
168
+ voices_text = cols[2].get_text(strip=True)
169
+ # Count number of voices (look for "Neural" in the text)
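+ # e.g. a cell listing "en-US-AriaNeural, en-US-GuyNeural" gives voice_count == 2;
+ # this assumes every Azure voice name contains the substring "Neural"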
170
+ voice_count = voices_text.count('Neural')
171
+ if locale and language:
172
+ azure_tts[locale] = {
173
+ 'language': language,
174
+ 'voice_count': voice_count
175
+ }
176
+ break
177
+
178
+ return azure_tts
179
+ except Exception as e:
180
+ print(f"Error fetching Azure TTS data: {e}")
181
+ return {}
182
+
183
+ @lru_cache(maxsize=1)
184
+ def fetch_google_stt_languages():
185
+ """Scrape Google Cloud Speech-to-Text supported languages"""
186
+ url = "https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages"
187
+
188
+ try:
189
+ response = requests.get(url, timeout=10)
190
+ response.raise_for_status()
191
+ soup = BeautifulSoup(response.content, 'html.parser')
192
+
193
+ # Find tables with BCP-47 language codes
194
+ tables = soup.find_all('table')
195
+
196
+ google_stt = {}
197
+ for table in tables:
198
+ rows = table.find_all('tr')
199
+ if not rows:
200
+ continue
201
+
202
+ # Check if this table has BCP-47 column
203
+ headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
204
+
205
+ # Find BCP-47 column index
206
+ bcp47_idx = None
207
+ name_idx = None
208
+ for idx, header in enumerate(headers):
209
+ if 'BCP-47' in header or 'BCP47' in header:
210
+ bcp47_idx = idx
211
+ if 'Name' in header and name_idx is None:
212
+ name_idx = idx
213
+
214
+ if bcp47_idx is not None:
215
+ for row in rows[1:]: # Skip header
216
+ cols = row.find_all('td')
217
+ if len(cols) > bcp47_idx:
218
+ locale = cols[bcp47_idx].get_text(strip=True)
219
+ language = cols[name_idx].get_text(strip=True) if name_idx is not None and len(cols) > name_idx else ''
220
+ if locale and locale not in ['—', '-', '']:
221
+ google_stt[locale] = language
222
+
223
+ return google_stt
224
+ except Exception as e:
225
+ print(f"Error fetching Google STT data: {e}")
226
+ return {}
227
+
228
+ @lru_cache(maxsize=1)
229
+ def fetch_google_tts_languages():
230
+ """Scrape Google Cloud Text-to-Speech supported languages with voice counts"""
231
+ url = "https://cloud.google.com/text-to-speech/docs/voices"
232
+
233
+ try:
234
+ response = requests.get(url, timeout=10)
235
+ response.raise_for_status()
236
+ soup = BeautifulSoup(response.content, 'html.parser')
237
+
238
+ # Find the voices table
239
+ # Columns: Language | Voice type | Language code | Voice name | SSML Gender | Sample
240
+ tables = soup.find_all('table')
241
+
242
+ google_tts = {}
243
+ for table in tables:
244
+ rows = table.find_all('tr')
245
+ if not rows:
246
+ continue
247
+
248
+ headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
249
+
250
+ # Find Language code column index
251
+ lang_code_idx = None
252
+ for idx, header in enumerate(headers):
253
+ if 'language code' in header.lower():
254
+ lang_code_idx = idx
255
+ break
256
+
257
+ if lang_code_idx is not None:
258
+ for row in rows[1:]:
259
+ cols = row.find_all('td')
260
+ if len(cols) > lang_code_idx:
261
+ locale = cols[lang_code_idx].get_text(strip=True)
262
+ if locale and locale not in ['—', '-', '']:
263
+ # Count voices per locale
264
+ if locale in google_tts:
265
+ google_tts[locale]['voice_count'] += 1
266
+ else:
267
+ language = cols[0].get_text(strip=True) if len(cols) > 0 else ''
268
+ google_tts[locale] = {
269
+ 'language': language,
270
+ 'voice_count': 1
271
+ }
272
+
273
+ return google_tts
274
+ except Exception as e:
275
+ print(f"Error fetching Google TTS data: {e}")
276
+ return {}
277
+
278
+ @lru_cache(maxsize=1)
279
+ def fetch_elevenlabs_multilingual_v2():
280
+ """Get ElevenLabs Multilingual v2 supported languages"""
281
+ # Based on https://elevenlabs.io/docs/models#multilingual-v2
282
+ # These are ISO 639-1 (2-letter) codes
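+ # NOTE: unlike the scraped providers above, this list is a hard-coded snapshot
+ # and must be updated by hand when ElevenLabs adds or removes languages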
283
+ supported_codes = {
284
+ 'en', 'ja', 'zh', 'de', 'hi', 'fr', 'ko', 'pt', 'it', 'es',
285
+ 'id', 'nl', 'tr', 'fil', 'pl', 'sv', 'bg', 'ro', 'ar', 'cs',
286
+ 'el', 'fi', 'hr', 'ms', 'sk', 'da', 'ta', 'uk', 'ru'
287
+ }
288
+ return supported_codes
289
+
290
+ @lru_cache(maxsize=1)
291
+ def fetch_elevenlabs_turbo_v3():
292
+ """Get ElevenLabs Eleven Turbo v3 (formerly v3 Alpha) supported languages"""
293
+ # Based on https://elevenlabs.io/docs/models#eleven-v3-alpha
294
+ # These are ISO 639-3 (3-letter) codes
295
+ supported_codes = {
296
+ 'afr', 'ara', 'hye', 'asm', 'aze', 'bel', 'ben', 'bos', 'bul', 'cat',
297
+ 'ceb', 'nya', 'hrv', 'ces', 'dan', 'nld', 'eng', 'est', 'fil', 'fin',
298
+ 'fra', 'glg', 'kat', 'deu', 'ell', 'guj', 'hau', 'heb', 'hin', 'hun',
299
+ 'isl', 'ind', 'gle', 'ita', 'jpn', 'jav', 'kan', 'kaz', 'kir', 'kor',
300
+ 'lav', 'lin', 'lit', 'ltz', 'mkd', 'msa', 'mal', 'cmn', 'mar', 'nep',
301
+ 'nor', 'pus', 'fas', 'pol', 'por', 'pan', 'ron', 'rus', 'srp', 'snd',
302
+ 'slk', 'slv', 'som', 'spa', 'swa', 'swe', 'tam', 'tel', 'tha', 'tur',
303
+ 'ukr', 'urd', 'vie', 'cym'
304
+ }
305
+ return supported_codes
306
+
307
+ @lru_cache(maxsize=1)
308
+ def fetch_aws_transcribe_languages():
309
+ """Scrape AWS Transcribe (ASR) supported languages"""
310
+ url = "https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html"
311
+
312
+ try:
313
+ response = requests.get(url, timeout=10)
314
+ response.raise_for_status()
315
+ soup = BeautifulSoup(response.content, 'html.parser')
316
+
317
+ # Find tables with language codes
318
+ tables = soup.find_all('table')
319
+
320
+ aws_transcribe = {}
321
+ for table in tables:
322
+ rows = table.find_all('tr')
323
+ if not rows:
324
+ continue
325
+
326
+ # Check if this table has language code column
327
+ headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
328
+
329
+ # Find language code column index
330
+ lang_code_idx = None
331
+ lang_name_idx = None
332
+ for idx, header in enumerate(headers):
333
+ if 'language code' in header.lower():
334
+ lang_code_idx = idx
335
+ if header.startswith('Language') and 'code' not in header.lower():
336
+ lang_name_idx = idx
337
+
338
+ if lang_code_idx is not None:
339
+ for row in rows[1:]: # Skip header
340
+ cols = row.find_all('td')
341
+ if len(cols) > lang_code_idx:
342
+ locale = cols[lang_code_idx].get_text(strip=True)
343
+ language = cols[lang_name_idx].get_text(strip=True) if lang_name_idx is not None and len(cols) > lang_name_idx else ''
344
+ if locale and locale not in ['—', '-', '']:
345
+ aws_transcribe[locale] = language
346
+
347
+ return aws_transcribe
348
+ except Exception as e:
349
+ print(f"Error fetching AWS Transcribe data: {e}")
350
+ return {}
351
+
352
+ @lru_cache(maxsize=1)
353
+ def fetch_aws_polly_languages():
354
+ """Scrape AWS Polly (TTS) supported languages"""
355
+ url = "https://docs.aws.amazon.com/polly/latest/dg/supported-languages.html"
356
+
357
+ try:
358
+ response = requests.get(url, timeout=10)
359
+ response.raise_for_status()
360
+ soup = BeautifulSoup(response.content, 'html.parser')
361
+
362
+ # Find tables with language codes
363
+ tables = soup.find_all('table')
364
+
365
+ aws_polly = {}
366
+ for table in tables:
367
+ rows = table.find_all('tr')
368
+ if not rows:
369
+ continue
370
+
371
+ # Check if this table has language code column
372
+ headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
373
+
374
+ # Find language code column index
375
+ lang_code_idx = None
376
+ lang_name_idx = None
377
+ for idx, header in enumerate(headers):
378
+ if 'language code' in header.lower():
379
+ lang_code_idx = idx
380
+ if header.startswith('Language') and 'code' not in header.lower():
381
+ lang_name_idx = idx
382
+
383
+ if lang_code_idx is not None:
384
+ for row in rows[1:]: # Skip header
385
+ cols = row.find_all('td')
386
+ if len(cols) > lang_code_idx:
387
+ locale = cols[lang_code_idx].get_text(strip=True)
388
+ language = cols[lang_name_idx].get_text(strip=True) if lang_name_idx is not None and len(cols) > lang_name_idx else ''
389
+ if locale and locale not in ['—', '-', '']:
390
+ # Count voices per locale (each row is a different voice/locale combo)
391
+ if locale in aws_polly:
392
+ aws_polly[locale]['voice_count'] += 1
393
+ else:
394
+ aws_polly[locale] = {
395
+ 'language': language,
396
+ 'voice_count': 1
397
+ }
398
+
399
+ return aws_polly
400
+ except Exception as e:
401
+ print(f"Error fetching AWS Polly data: {e}")
402
+ return {}
403
+
404
+ def get_azure_locales_for_language(language_code):
405
+ """
406
+ Get Azure BCP-47 locales for a language using its alpha2 code
407
+ Returns list of matching locales from Azure
408
+ """
409
+ lang_info = LANGUAGES.get(language_code)
410
+ if not lang_info or not lang_info['alpha2']:
411
+ return []
412
+
413
+ alpha2 = lang_info['alpha2']
414
+ azure_asr = fetch_azure_asr_languages()
415
+ azure_tts = fetch_azure_tts_languages()
416
+
417
+ # Find all locales that start with the alpha2 code
418
+ matching_locales = set()
419
+
420
+ for locale in azure_asr.keys():
421
+ if locale.startswith(alpha2 + '-') or locale == alpha2:
422
+ matching_locales.add(locale)
423
+
424
+ for locale in azure_tts.keys():
425
+ if locale.startswith(alpha2 + '-') or locale == alpha2:
426
+ matching_locales.add(locale)
427
+
428
+ return sorted(matching_locales)
429
+
430
+ def get_google_locales_for_language(language_code):
431
+ """
432
+ Get Google Cloud BCP-47 locales for a language using its alpha2 code
433
+ Returns list of matching locales from Google Cloud
434
+ """
435
+ lang_info = LANGUAGES.get(language_code)
436
+ if not lang_info or not lang_info['alpha2']:
437
+ return []
438
+
439
+ alpha2 = lang_info['alpha2']
440
+ google_stt = fetch_google_stt_languages()
441
+ google_tts = fetch_google_tts_languages()
442
+
443
+ # Find all locales that start with the alpha2 code
444
+ matching_locales = set()
445
+
446
+ for locale in google_stt.keys():
447
+ if locale.startswith(alpha2 + '-') or locale == alpha2:
448
+ matching_locales.add(locale)
449
+
450
+ for locale in google_tts.keys():
451
+ if locale.startswith(alpha2 + '-') or locale == alpha2:
452
+ matching_locales.add(locale)
453
+
454
+ return sorted(matching_locales)
455
+
456
+ def check_elevenlabs_multilingual_v2_support(language_code):
457
+ """
458
+ Check if ElevenLabs Multilingual v2 supports a language using ISO 639-1 (alpha2) codes
459
+ Returns True if supported, False otherwise
460
+ """
461
+ lang_info = LANGUAGES.get(language_code)
462
+ if not lang_info:
463
+ return False
464
+
465
+ supported_codes = fetch_elevenlabs_multilingual_v2()
466
+
467
+ # Check alpha2 code (2-letter code)
468
+ if lang_info['alpha2'] and lang_info['alpha2'] in supported_codes:
469
+ return True
470
+
471
+ return False
472
+
473
+ def check_elevenlabs_turbo_v3_support(language_code):
474
+ """
475
+ Check if ElevenLabs Turbo v3 supports a language using ISO 639-3 (alpha3) codes
476
+ Returns True if supported, False otherwise
477
+ """
478
+ lang_info = LANGUAGES.get(language_code)
479
+ if not lang_info:
480
+ return False
481
+
482
+ supported_codes = fetch_elevenlabs_turbo_v3()
483
+
484
+ # Check alpha3_b code first (3-letter code, bibliographic)
485
+ if lang_info['alpha3_b'] and lang_info['alpha3_b'] in supported_codes:
486
+ return True
487
+
488
+ # Check alpha3_t code (3-letter code, terminological)
489
+ if lang_info['alpha3_t'] and lang_info['alpha3_t'] in supported_codes:
490
+ return True
491
+
492
+ return False
493
+
494
+ def get_aws_locales_for_language(language_code):
495
+ """
496
+ Get AWS locales for a language using its alpha2 code
497
+ Returns list of matching locales from AWS Transcribe and Polly
498
+ """
499
+ lang_info = LANGUAGES.get(language_code)
500
+ if not lang_info or not lang_info['alpha2']:
501
+ return []
502
+
503
+ alpha2 = lang_info['alpha2']
504
+ aws_transcribe = fetch_aws_transcribe_languages()
505
+ aws_polly = fetch_aws_polly_languages()
506
+
507
+ # Find all locales that start with the alpha2 code
508
+ matching_locales = set()
509
+
510
+ for locale in aws_transcribe.keys():
511
+ if locale.startswith(alpha2 + '-') or locale == alpha2:
512
+ matching_locales.add(locale)
513
+
514
+ for locale in aws_polly.keys():
515
+ if locale.startswith(alpha2 + '-') or locale == alpha2:
516
+ matching_locales.add(locale)
517
+
518
+ return sorted(matching_locales)
519
+
520
+ def search_huggingface_models(language_code, pipeline_tag, max_results=100, max_pages=3):
521
+ """
522
+ Search HuggingFace for models supporting a specific language
523
+ pipeline_tag: 'automatic-speech-recognition' or 'text-to-speech'
524
+ max_results: maximum number of models to return
525
+ max_pages: maximum number of pages to search per language code
526
+ Returns tuple: (list of model dictionaries, log messages)
527
+ """
528
+ lang_info = LANGUAGES.get(language_code)
529
+ logs = []
530
+
531
+ if not lang_info:
532
+ logs.append(f"No language info found for code: {language_code}")
533
+ return [], logs
534
+
535
+ # Try multiple language code formats
536
+ codes_to_try = []
537
+ if lang_info['alpha2']:
538
+ codes_to_try.append(lang_info['alpha2']) # 2-letter code
539
+ if lang_info['alpha3_b']:
540
+ codes_to_try.append(lang_info['alpha3_b']) # 3-letter code
541
+ if lang_info['alpha3_t']:
542
+ codes_to_try.append(lang_info['alpha3_t']) # 3-letter terminological
543
+
544
+ logs.append(f"Language codes to search: {set(codes_to_try)}")
545
+
546
+ models = []
547
+ seen_models = set()
548
+
549
+ for code in codes_to_try:
550
+ if len(models) >= max_results:
551
+ break
552
+
553
+ logs.append(f"Searching for language code: {code}")
554
+
555
+ # Try multiple pages for this language code
556
+ for page in range(max_pages):
557
+ if len(models) >= max_results:
558
+ break
559
+
560
+ try:
561
+ # Use HuggingFace model search with pagination
562
+ url = f"https://huggingface.co/models?pipeline_tag={pipeline_tag}&language={code}&sort=trending"
563
+ if page > 0:
564
+ url += f"&p={page}"
565
+
566
+ logs.append(f" Page {page}: {url}")
567
+
568
+ headers = {
569
+ 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
570
+ }
571
+
572
+ response = requests.get(url, headers=headers, timeout=10)
573
+ response.raise_for_status()
574
+
575
+ soup = BeautifulSoup(response.content, 'html.parser')
576
+
577
+ # Parse model cards from the page
578
+ model_cards = soup.find_all('article', class_='overview-card-wrapper')
579
+
580
+ if not model_cards:
581
+ logs.append(f" No model cards found on page {page}")
582
+ break
583
+
584
+ logs.append(f" Found {len(model_cards)} model cards on page {page}")
585
+
586
+ for card in model_cards:
587
+ if len(models) >= max_results:
588
+ break
589
+
590
+ try:
591
+ link = card.find('a', href=True)
592
+ if link:
593
+ href = link.get('href', '')
594
+ model_name = href.lstrip('/')
595
+
596
+ if model_name and model_name != '#' and model_name not in seen_models:
597
+ seen_models.add(model_name)
598
+
599
+ # Parse stats directly from the card HTML by looking at SVG icons
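+ # NOTE: the SVG path strings matched below were copied from HuggingFace's
+ # current page markup; if the site changes its icons these checks will
+ # silently stop matching and downloads/likes/size stay at their defaults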
600
+ downloads = 0
601
+ likes = 0
602
+ size = ""
603
+
604
+ # Find all SVG elements in the card
605
+ svgs = card.find_all('svg')
606
+
607
+ for svg in svgs:
608
+ # Get the next sibling text after the SVG
609
+ # Could be direct text or text within a span/other element
610
+ next_elem = svg.find_next_sibling(string=True)
611
+ stat_text = ""
612
+
613
+ if next_elem and next_elem.strip():
614
+ stat_text = next_elem.strip()
615
+ else:
616
+ # Try to find text in the next sibling element (e.g., <span>)
617
+ next_tag = svg.find_next_sibling()
618
+ if next_tag:
619
+ stat_text = next_tag.get_text(strip=True)
620
+
621
+ if not stat_text or len(stat_text) < 1:
622
+ continue
623
+
624
+ # Identify icon type by viewBox or path content
625
+ svg_str = str(svg)
626
+
627
+ # Download icon: viewBox="0 0 32 32" with download arrow path
628
+ if 'M26 24v4H6v-4H4v4a2 2 0 0 0 2 2h20a2 2 0 0 0 2-2v-4zm0-10l-1.41-1.41L17 20.17V2h-2v18.17l-7.59-7.58L6 14l10 10l10-10z' in svg_str:
629
+ downloads = parse_stat_number(stat_text)
630
+
631
+ # Like/heart icon: heart path
632
+ elif 'M22.45,6a5.47,5.47,0,0,1,3.91,1.64,5.7,5.7,0,0,1,0,8L16,26.13' in svg_str:
633
+ likes = parse_stat_number(stat_text)
634
+
635
+ # Model size icon: small grid icon (viewBox="0 0 12 12") with specific path for parameter count
636
+ elif 'M10 10H8.4V8.4H10V10Zm0-3.2H8.4V5.2H10v1.6ZM6.8 10H5.2V8.4h1.6V10Z' in svg_str:
637
+ # Model parameter count (e.g., "2B", "0.6B")
638
+ # Must be short and contain B for billion params
639
+ if len(stat_text) <= 6 and re.search(r'\d+\.?\d*\s*[Bb]', stat_text):
640
+ size = stat_text
641
+
642
+ models.append({
643
+ 'name': model_name,
644
+ 'url': f"https://huggingface.co/{model_name}",
645
+ 'downloads': downloads,
646
+ 'likes': likes,
647
+ 'size': size
648
+ })
649
+ except Exception as e:
650
+ logs.append(f" Error parsing model card: {e}")
651
+ continue
652
+
653
+ except Exception as e:
654
+ logs.append(f" ERROR searching page {page}: {e}")
655
+ break
656
+
657
+ # Sort by downloads (descending)
658
+ models.sort(key=lambda x: x['downloads'], reverse=True)
659
+
660
+ logs.append(f"Total unique models found: {len(models)}")
661
+ return models, logs
662
+
663
+ def get_huggingface_stats(item_name, item_type='datasets'):
664
+ """
665
+ Get likes and downloads for a HuggingFace dataset or model using API
666
+ item_type: 'datasets' or 'models'
667
+ Returns dict with likes and downloads
668
+
669
+ NOTE: This method is currently NOT USED. We parse stats directly from HTML instead.
670
+ Keeping it here as a fallback in case HTML parsing fails.
671
+ """
672
+ try:
673
+ api_url = f"https://huggingface.co/api/{item_type}/{item_name}"
674
+ response = requests.get(api_url, timeout=5)
675
+
676
+ if response.status_code == 200:
677
+ data = response.json()
678
+ return {
679
+ 'likes': data.get('likes', 0),
680
+ 'downloads': data.get('downloads', 0)
681
+ }
682
+ except Exception:
683
+ pass
684
+
685
+ return {'likes': 0, 'downloads': 0}
686
+
687
+ def parse_stat_number(stat_text):
688
+ """
689
+ Parse HuggingFace stat numbers like '4.07M', '23.4k', '349' into integers
690
+ Returns integer value or 0 if parsing fails
691
+ """
692
+ if not stat_text:
693
+ return 0
694
+
695
+ stat_text = stat_text.strip().upper()
696
+
697
+ try:
698
+ # Handle 'M' (millions)
699
+ if 'M' in stat_text:
700
+ return int(float(stat_text.replace('M', '')) * 1_000_000)
701
+ # Handle 'K' (thousands)
702
+ elif 'K' in stat_text:
703
+ return int(float(stat_text.replace('K', '')) * 1_000)
704
+ # Plain number
705
+ else:
706
+ return int(stat_text.replace(',', ''))
707
+ except (ValueError, AttributeError):
708
+ return 0
709
+
710
+ def deduplicate_models(models):
711
+ """
712
+ Deduplicate models by base name (without user/org prefix)
713
+ Keep the model with most downloads and count duplicates
714
+ Returns list of deduplicated models with duplicate count added
715
+ """
716
+ from collections import defaultdict
717
+
718
+ # Group models by base name
719
+ grouped = defaultdict(list)
720
+ for model in models:
721
+ # Extract base name (everything after last '/')
722
+ name_parts = model['name'].split('/')
723
+ if len(name_parts) > 1:
724
+ base_name = name_parts[-1] # e.g., "whisper-large-v3"
725
+ else:
726
+ base_name = model['name']
727
+
728
+ grouped[base_name].append(model)
729
+
730
+ # For each group, keep the one with most downloads
731
+ deduplicated = []
732
+ for base_name, model_list in grouped.items():
733
+ # Sort by downloads (descending) and keep the first one
734
+ model_list.sort(key=lambda x: x['downloads'], reverse=True)
735
+ best_model = model_list[0]
736
+
737
+ # Record how many duplicate copies were discarded (0 if the base name is unique)
738
+ best_model['duplicates'] = len(model_list) - 1
739
+
740
+ deduplicated.append(best_model)
741
+
742
+ # Sort by downloads again
743
+ deduplicated.sort(key=lambda x: x['downloads'], reverse=True)
744
+
745
+ return deduplicated
746
+
747
+ def search_huggingface_datasets(language_code, task_category, max_results=100, max_pages=3):
748
+ """
749
+ Search HuggingFace for datasets supporting a specific language
750
+ task_category: 'automatic-speech-recognition' or 'text-to-speech'
751
+ max_results: maximum number of datasets to return
752
+ max_pages: maximum number of pages to search per language code
753
+ Returns tuple: (list of dataset dictionaries, log messages)
754
+ """
755
+ lang_info = LANGUAGES.get(language_code)
756
+ logs = []
757
+
758
+ if not lang_info:
759
+ logs.append(f"No language info found for code: {language_code}")
760
+ return [], logs
761
+
762
+ # Collect all unique language codes for this language
763
+ language_codes = set()
764
+ if lang_info['alpha2']:
765
+ language_codes.add(lang_info['alpha2']) # 2-letter code
766
+ if lang_info['alpha3_b']:
767
+ language_codes.add(lang_info['alpha3_b']) # 3-letter code
768
+ if lang_info['alpha3_t']:
769
+ language_codes.add(lang_info['alpha3_t']) # 3-letter terminological
770
+
771
+ logs.append(f"Language codes to search: {language_codes}")
772
+
773
+ datasets = []
774
+ seen_datasets = set()
775
+
776
+ # Search separately for each language code
777
+ for code in language_codes:
778
+ if len(datasets) >= max_results:
779
+ break
780
+
781
+ logs.append(f"Searching for language code: {code}")
782
+
783
+ for page in range(max_pages):
784
+ if len(datasets) >= max_results:
785
+ break
786
+
787
+ try:
788
+ # Use HuggingFace dataset search with correct format
789
+ # Format: task_categories=task_categories:automatic-speech-recognition&language=language:en
790
+ url = f"https://huggingface.co/datasets?task_categories=task_categories:{task_category}&language=language:{code}&sort=trending"
791
+ if page > 0:
792
+ url += f"&p={page}"
793
+
794
+ logs.append(f" Page {page}: {url}")
795
+
796
+ headers = {
797
+ 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
798
+ }
799
+
800
+ response = requests.get(url, headers=headers, timeout=10)
801
+ response.raise_for_status()
802
+
803
+ soup = BeautifulSoup(response.content, 'html.parser')
804
+
805
+ # Parse dataset cards from the page
806
+ dataset_cards = soup.find_all('article', class_='overview-card-wrapper')
807
+
808
+ if not dataset_cards:
809
+ logs.append(f" No dataset cards found on page {page}")
810
+ break
811
+
812
+ logs.append(f" Found {len(dataset_cards)} dataset cards on page {page}")
813
+
814
+ for card in dataset_cards:
815
+ if len(datasets) >= max_results:
816
+ break
817
+
818
+ try:
819
+ link = card.find('a', href=True)
820
+ if link:
821
+ href = link.get('href', '')
822
+ dataset_path = href.lstrip('/')
823
+
824
+ # Remove "datasets/" prefix if present
825
+ if dataset_path.startswith('datasets/'):
826
+ dataset_name = dataset_path[9:] # Remove "datasets/" (9 chars)
827
+ else:
828
+ dataset_name = dataset_path
829
+
830
+ if dataset_name and dataset_name != '#' and dataset_name not in seen_datasets:
831
+ seen_datasets.add(dataset_name)
832
+
833
+ # Parse stats directly from the card HTML by looking at SVG icons
834
+ downloads = 0
835
+ likes = 0
836
+ size = ""
837
+
838
+ # Find all SVG elements in the card
839
+ svgs = card.find_all('svg')
840
+
841
+ for svg in svgs:
842
+ # Get the next sibling text after the SVG
843
+ # Could be direct text or text within a span/other element
844
+ next_elem = svg.find_next_sibling(string=True)
845
+ stat_text = ""
846
+
847
+ if next_elem and next_elem.strip():
848
+ stat_text = next_elem.strip()
849
+ else:
850
+ # Try to find text in the next sibling element (e.g., <span>)
851
+ next_tag = svg.find_next_sibling()
852
+ if next_tag:
853
+ stat_text = next_tag.get_text(strip=True)
854
+
855
+ # Skip non-numeric text like "Viewer", "Updated", etc.
856
+ if not stat_text or len(stat_text) < 1 or stat_text in ['Viewer', 'Updated']:
857
+ continue
858
+
859
+ # Identify icon type by viewBox or path content
860
+ svg_str = str(svg)
861
+
862
+ # Download icon: viewBox="0 0 32 32" with download arrow path
863
+ if 'M26 24v4H6v-4H4v4a2 2 0 0 0 2 2h20a2 2 0 0 0 2-2v-4zm0-10l-1.41-1.41L17 20.17V2h-2v18.17l-7.59-7.58L6 14l10 10l10-10z' in svg_str:
864
+ downloads = parse_stat_number(stat_text)
865
+
866
+ # Like/heart icon: heart path
867
+ elif 'M22.45,6a5.47,5.47,0,0,1,3.91,1.64,5.7,5.7,0,0,1,0,8L16,26.13' in svg_str:
868
+ likes = parse_stat_number(stat_text)
869
+
870
+ # Dataset size icon: table/grid icon with fill-rule="evenodd"
871
+ elif 'fill-rule="evenodd"' in svg_str and 'clip-rule="evenodd"' in svg_str:
872
+ # Dataset size (e.g., "411k", "23.4M", "65.1k")
873
+ # Must look like a number (has k, M, or digits)
874
+ if any(c in stat_text for c in ['k', 'K', 'm', 'M']) or stat_text.replace(',', '').replace('.', '').isdigit():
875
+ size = stat_text
876
+
877
+ datasets.append({
878
+ 'name': dataset_name,
879
+ 'url': f"https://huggingface.co/datasets/{dataset_name}",
880
+ 'downloads': downloads,
881
+ 'likes': likes,
882
+ 'size': size
883
+ })
884
+ except Exception as e:
885
+ logs.append(f" Error parsing dataset card: {e}")
886
+ continue
887
+
888
+ except Exception as e:
889
+ logs.append(f" ERROR searching page {page}: {e}")
890
+ break
891
+
892
+ # Sort by downloads (descending)
893
+ datasets.sort(key=lambda x: x['downloads'], reverse=True)
894
+
895
+ logs.append(f"Total unique datasets found: {len(datasets)}")
896
+ return datasets, logs
897
+
898
+ def search_language_resources(language_code, deduplicate=False):
899
+ """
900
+ Search for ASR/TTS resources for a given language
901
+ Returns results organized by service type
902
+ deduplicate: if True, remove duplicate models (same base name) and keep only the one with most downloads
903
+ """
904
+ all_logs = []
905
+
906
+ if not language_code:
907
+ return None, None, None, 0, 0, None, None, 0, 0, ""
908
+
909
+ lang_info = LANGUAGES.get(language_code)
910
+ if not lang_info:
911
+ return None, None, None, 0, 0, None, None, 0, 0, ""
912
+
913
+ language_name = lang_info['name']
914
+ all_logs.append(f"=== Searching for {language_name} ({language_code}) ===")
915
+ all_logs.append(f"Language codes: alpha2={lang_info['alpha2']}, alpha3_b={lang_info['alpha3_b']}, alpha3_t={lang_info['alpha3_t']}")
916
+
917
+ # Fetch Azure data
918
+ all_logs.append("\n[Azure Speech Services]")
919
+ azure_asr = fetch_azure_asr_languages()
920
+ azure_tts = fetch_azure_tts_languages()
921
+ all_logs.append(f" Fetched {len(azure_asr)} ASR languages and {len(azure_tts)} TTS languages from Azure")
922
+
923
+ # Get matching Azure locales using alpha2 code
924
+ azure_locales = get_azure_locales_for_language(language_code)
925
+ all_logs.append(f" Matching Azure locales: {azure_locales}")
926
+
927
+ # Check Azure ASR support
928
+ azure_asr_locales = [loc for loc in azure_locales if loc in azure_asr]
929
+ azure_asr_available = len(azure_asr_locales) > 0
930
+ all_logs.append(f" Azure ASR: {'✅ Supported' if azure_asr_available else '❌ Not supported'} ({len(azure_asr_locales)} locales)")
931
+
932
+ # Check Azure TTS support and count voices
933
+ azure_tts_locales = [loc for loc in azure_locales if loc in azure_tts]
934
+ azure_tts_available = len(azure_tts_locales) > 0
935
+ azure_total_voices = sum(azure_tts[loc]['voice_count'] for loc in azure_tts_locales)
936
+ all_logs.append(f" Azure TTS: {'✅ Supported' if azure_tts_available else '❌ Not supported'} ({len(azure_tts_locales)} locales, {azure_total_voices} voices)")
937
+
938
+ # Fetch Google Cloud data
939
+ all_logs.append("\n[Google Cloud Speech]")
940
+ google_stt = fetch_google_stt_languages()
941
+ google_tts = fetch_google_tts_languages()
942
+ all_logs.append(f" Fetched {len(google_stt)} STT languages and {len(google_tts)} TTS languages from Google Cloud")
943
+
944
+ # Get matching Google Cloud locales using alpha2 code
945
+ google_locales = get_google_locales_for_language(language_code)
946
+ all_logs.append(f" Matching Google Cloud locales: {google_locales}")
947
+
948
+ # Check Google Cloud STT support
949
+ google_stt_locales = [loc for loc in google_locales if loc in google_stt]
950
+ google_stt_available = len(google_stt_locales) > 0
951
+ all_logs.append(f" Google STT: {'✅ Supported' if google_stt_available else '❌ Not supported'} ({len(google_stt_locales)} locales)")
952
+
953
+ # Check Google Cloud TTS support and count voices
954
+ google_tts_locales = [loc for loc in google_locales if loc in google_tts]
955
+ google_tts_available = len(google_tts_locales) > 0
956
+ google_total_voices = sum(google_tts[loc]['voice_count'] for loc in google_tts_locales)
957
+ all_logs.append(f" Google TTS: {'✅ Supported' if google_tts_available else '❌ Not supported'} ({len(google_tts_locales)} locales, {google_total_voices} voices)")
958
+
959
+ # Fetch AWS data
960
+ all_logs.append("\n[AWS (Transcribe + Polly)]")
961
+ aws_transcribe = fetch_aws_transcribe_languages()
962
+ aws_polly = fetch_aws_polly_languages()
963
+ all_logs.append(f" Fetched {len(aws_transcribe)} Transcribe languages and {len(aws_polly)} Polly languages from AWS")
964
+
965
+ # Get matching AWS locales using alpha2 code
966
+ aws_locales = get_aws_locales_for_language(language_code)
967
+ all_logs.append(f" Matching AWS locales: {aws_locales}")
968
+
969
+ # Check AWS Transcribe support
970
+ aws_transcribe_locales = [loc for loc in aws_locales if loc in aws_transcribe]
971
+ aws_transcribe_available = len(aws_transcribe_locales) > 0
972
+ all_logs.append(f" AWS Transcribe: {'✅ Supported' if aws_transcribe_available else '❌ Not supported'} ({len(aws_transcribe_locales)} locales)")
973
+
974
+ # Check AWS Polly support and count voices
975
+ aws_polly_locales = [loc for loc in aws_locales if loc in aws_polly]
976
+ aws_polly_available = len(aws_polly_locales) > 0
977
+ aws_total_voices = sum(aws_polly[loc]['voice_count'] for loc in aws_polly_locales)
978
+ all_logs.append(f" AWS Polly: {'✅ Supported' if aws_polly_available else '❌ Not supported'} ({len(aws_polly_locales)} locales, {aws_total_voices} voices)")
979
+
980
+ # Commercial Services
981
+ commercial_rows = []
982
+
983
+ # Azure Speech
984
+ if azure_asr_available:
985
+ azure_asr_text = f"✅ {len(azure_asr_locales)} locale(s)"
986
+ else:
987
+ azure_asr_text = "❌ N/A"
988
+
989
+ if azure_tts_available:
990
+ azure_tts_text = f"✅ {len(azure_tts_locales)} locale(s), {azure_total_voices} voice(s)"
991
+ else:
992
+ azure_tts_text = "❌ N/A"
993
+
994
+ commercial_rows.append({
995
+ "Service": "Azure Speech",
996
+ "ASR": azure_asr_text,
997
+ "TTS": azure_tts_text,
998
+ })
999
+
1000
+ # Google Cloud Speech
1001
+ if google_stt_available:
1002
+ google_stt_text = f"✅ {len(google_stt_locales)} locale(s)"
1003
+ else:
1004
+ google_stt_text = "❌ N/A"
1005
+
1006
+ if google_tts_available:
1007
+ google_tts_text = f"✅ {len(google_tts_locales)} locale(s), {google_total_voices} voice(s)"
1008
+ else:
1009
+ google_tts_text = "❌ N/A"
1010
+
1011
+ commercial_rows.append({
1012
+ "Service": "Google Cloud Speech",
1013
+ "ASR": google_stt_text,
1014
+ "TTS": google_tts_text,
1015
+ })
1016
+
1017
+ # AWS (Transcribe + Polly)
1018
+ if aws_transcribe_available:
1019
+ aws_transcribe_text = f"✅ {len(aws_transcribe_locales)} locale(s)"
1020
+ else:
1021
+ aws_transcribe_text = "❌ N/A"
1022
+
1023
+ if aws_polly_available:
1024
+ aws_polly_text = f"✅ {len(aws_polly_locales)} locale(s), {aws_total_voices} voice(s)"
1025
+ else:
1026
+ aws_polly_text = "❌ N/A"
1027
+
1028
+ commercial_rows.append({
1029
+ "Service": "AWS (Transcribe + Polly)",
1030
+ "ASR": aws_transcribe_text,
1031
+ "TTS": aws_polly_text,
1032
+ })
1033
+
1034
+ # ElevenLabs Multilingual v2 (TTS only)
1035
+ all_logs.append("\n[ElevenLabs]")
1036
+ elevenlabs_v2_supported = check_elevenlabs_multilingual_v2_support(language_code)
1037
+ all_logs.append(f" Multilingual v2: {'✅ Supported' if elevenlabs_v2_supported else '❌ Not supported'}")
1038
+
1039
+ if elevenlabs_v2_supported:
1040
+ elevenlabs_v2_tts_text = "✅ Supported"
1041
+ else:
1042
+ elevenlabs_v2_tts_text = "❌ N/A"
1043
+
1044
+ commercial_rows.append({
1045
+ "Service": "ElevenLabs Multilingual v2",
1046
+ "ASR": "N/A", # ElevenLabs doesn't offer ASR
1047
+ "TTS": elevenlabs_v2_tts_text,
1048
+ })
1049
+
1050
+ # ElevenLabs Turbo v3 (TTS only)
1051
+ elevenlabs_v3_supported = check_elevenlabs_turbo_v3_support(language_code)
1052
+ all_logs.append(f" Turbo v3: {'✅ Supported' if elevenlabs_v3_supported else '❌ Not supported'}")
1053
+
1054
+ if elevenlabs_v3_supported:
1055
+ elevenlabs_v3_tts_text = "✅ Supported"
1056
+ else:
1057
+ elevenlabs_v3_tts_text = "❌ N/A"
1058
+
1059
+ commercial_rows.append({
1060
+ "Service": "ElevenLabs Turbo v3",
1061
+ "ASR": "N/A", # ElevenLabs doesn't offer ASR
1062
+ "TTS": elevenlabs_v3_tts_text,
1063
+ })
1064
+
1065
+ commercial_df = pd.DataFrame(commercial_rows)
1066
+
1067
+ # HuggingFace Models - Search for real ASR and TTS models
1068
+ all_logs.append("\n[HuggingFace Models]")
1069
+
1070
+ asr_models, asr_model_logs = search_huggingface_models(language_code, 'automatic-speech-recognition', max_results=100, max_pages=5)
1071
+ all_logs.extend([f" [ASR] {log}" for log in asr_model_logs])
1072
+
1073
+ tts_models, tts_model_logs = search_huggingface_models(language_code, 'text-to-speech', max_results=100, max_pages=5)
1074
+ all_logs.extend([f" [TTS] {log}" for log in tts_model_logs])
1075
+
1076
+ # Apply deduplication if requested
1077
+ if deduplicate:
1078
+ all_logs.append(f"\n[Deduplication]")
1079
+ asr_before = len(asr_models)
1080
+ asr_models = deduplicate_models(asr_models)
1081
+ all_logs.append(f" ASR models: {asr_before} → {len(asr_models)} (removed {asr_before - len(asr_models)} duplicates)")
1082
+
1083
+ tts_before = len(tts_models)
1084
+ tts_models = deduplicate_models(tts_models)
1085
+ all_logs.append(f" TTS models: {tts_before} → {len(tts_models)} (removed {tts_before - len(tts_models)} duplicates)")
1086
+ else:
1087
+ # Add duplicates count of 1 for all models when not deduplicating
1088
+ for model in asr_models:
1089
+ model['duplicates'] = 1
1090
+ for model in tts_models:
1091
+ model['duplicates'] = 1
1092
+
1093
+ # Format ASR models with clickable names
1094
+ asr_models_data = []
1095
+ for model in asr_models:
1096
+ asr_models_data.append({
1097
+ "Model Name": f"[{model['name']}]({model['url']})",
1098
+ "Downloads": model['downloads'],
1099
+ "Likes": model['likes'],
1100
+ "Size": model.get('size', ''),
1101
+ "Duplicates": model.get('duplicates', 1)
1102
+ })
1103
+
1104
+ if asr_models_data:
1105
+ asr_models_df = pd.DataFrame(asr_models_data)
1106
+ else:
1107
+ # Empty dataframe if no models found
1108
+ asr_models_df = pd.DataFrame(columns=["Model Name", "Downloads", "Likes", "Size", "Duplicates"])
1109
+
1110
+ # Format TTS models with clickable names
1111
+ tts_models_data = []
1112
+ for model in tts_models:
1113
+ tts_models_data.append({
1114
+ "Model Name": f"[{model['name']}]({model['url']})",
1115
+ "Downloads": model['downloads'],
1116
+ "Likes": model['likes'],
1117
+ "Size": model.get('size', ''),
1118
+ "Duplicates": model.get('duplicates', 1)
1119
+ })
1120
+
1121
+ if tts_models_data:
1122
+ tts_models_df = pd.DataFrame(tts_models_data)
1123
+ else:
1124
+ # Empty dataframe if no models found
1125
+ tts_models_df = pd.DataFrame(columns=["Model Name", "Downloads", "Likes", "Size", "Duplicates"])
1126
+
1127
+ # HuggingFace Datasets - Search for real ASR and TTS datasets
1128
+ all_logs.append("\n[HuggingFace Datasets]")
1129
+ asr_datasets, asr_dataset_logs = search_huggingface_datasets(language_code, 'automatic-speech-recognition', max_results=100, max_pages=5)
1130
+ all_logs.extend([f" [ASR] {log}" for log in asr_dataset_logs])
1131
+
1132
+ tts_datasets, tts_dataset_logs = search_huggingface_datasets(language_code, 'text-to-speech', max_results=100, max_pages=5)
1133
+ all_logs.extend([f" [TTS] {log}" for log in tts_dataset_logs])
1134
+
1135
+ # Format ASR datasets with clickable names
1136
+ asr_datasets_data = []
1137
+ for dataset in asr_datasets:
1138
+ asr_datasets_data.append({
1139
+ "Dataset Name": f"[{dataset['name']}]({dataset['url']})",
1140
+ "Downloads": dataset['downloads'],
1141
+ "Likes": dataset['likes'],
1142
+ "Size": dataset.get('size', '')
1143
+ })
1144
+
1145
+ if asr_datasets_data:
1146
+ asr_datasets_df = pd.DataFrame(asr_datasets_data)
1147
+ else:
1148
+ # Empty dataframe if no datasets found
1149
+ asr_datasets_df = pd.DataFrame(columns=["Dataset Name", "Downloads", "Likes", "Size"])
1150
+
1151
+ # Format TTS datasets with clickable names
1152
+ tts_datasets_data = []
1153
+ for dataset in tts_datasets:
1154
+ tts_datasets_data.append({
1155
+ "Dataset Name": f"[{dataset['name']}]({dataset['url']})",
1156
+ "Downloads": dataset['downloads'],
1157
+ "Likes": dataset['likes'],
1158
+ "Size": dataset.get('size', '')
1159
+ })
1160
+
1161
+ if tts_datasets_data:
1162
+ tts_datasets_df = pd.DataFrame(tts_datasets_data)
1163
+ else:
1164
+ # Empty dataframe if no datasets found
1165
+ tts_datasets_df = pd.DataFrame(columns=["Dataset Name", "Downloads", "Likes", "Size"])
1166
+
1167
+ # Combine all logs
1168
+ log_text = "\n".join(all_logs)
1169
+
1170
+ # Return separate ASR and TTS dataframes, plus counts for tab labels, plus logs
1171
+ return commercial_df, asr_models_df, tts_models_df, len(asr_models), len(tts_models), asr_datasets_df, tts_datasets_df, len(asr_datasets), len(tts_datasets), log_text
1172
+
1173
+ # Initialize - load language list and app content
1174
+ print("Initializing Speech Resource Finder...")
1175
+ load_app_content()
1176
+ load_language_list()
1177
+
1178
+ # Create language choices for dropdown (code: name format for easy searching)
1179
+ language_choices = [f"{code}: {info['name']}" for code, info in sorted(LANGUAGES.items(), key=lambda x: x[1]['name'])]
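+ # e.g. entries look like "aar: Afar", "eng: English" (sorted by language name)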
1180
+ print(f"Created dropdown with {len(language_choices)} language options")
1181
+
1182
+ with gr.Blocks(title=APP_CONTENT["title"]) as demo:
1183
+ gr.Markdown(f"# {APP_CONTENT['title']}")
1184
+ gr.Markdown(APP_CONTENT["description"])
1185
+
1186
+ with gr.Row():
1187
+ language_dropdown = gr.Dropdown(
1188
+ choices=language_choices,
1189
+ label="Select Language",
1190
+ info="Type to search for a language",
1191
+ allow_custom_value=False,
1192
+ filterable=True,
1193
+ )
1194
+ search_btn = gr.Button("Search", variant="primary")
1195
+
1196
+ with gr.Row():
1197
+ deduplicate_checkbox = gr.Checkbox(
1198
+ label="Deduplicate models",
1199
+ value=True,
1200
+ info="Keep only the model with most downloads for each base name"
1201
+ )
1202
+
1203
+ gr.Markdown("## Commercial Services")
1204
+ commercial_table = gr.Dataframe(
1205
+ headers=["Service", "ASR", "TTS"],
1206
+ interactive=False,
1207
+ wrap=True,
1208
+ )
1209
+
1210
+ gr.Markdown("## HuggingFace Models")
1211
+
1212
+ # Create tabs for ASR and TTS models with count labels
1213
+ with gr.Tabs():
1214
+ with gr.Tab(label="ASR Models") as asr_tab:
1215
+ asr_count_label = gr.Markdown("*Loading...*")
1216
+ asr_models_table = gr.Dataframe(
1217
+ headers=["Model Name", "Downloads", "Likes", "Size", "Duplicates"],
1218
+ interactive=False,
1219
+ wrap=True,
1220
+ datatype=["markdown", "number", "number", "str", "number"],
1221
+ )
1222
+
1223
+ with gr.Tab(label="TTS Models") as tts_tab:
1224
+ tts_count_label = gr.Markdown("*Loading...*")
1225
+ tts_models_table = gr.Dataframe(
1226
+ headers=["Model Name", "Downloads", "Likes", "Size", "Duplicates"],
1227
+ interactive=False,
1228
+ wrap=True,
1229
+ datatype=["markdown", "number", "number", "str", "number"],
1230
+ )
1231
+
1232
+ gr.Markdown("## HuggingFace Datasets")
1233
+
1234
+ # Create tabs for ASR and TTS datasets with count labels
1235
+ with gr.Tabs():
1236
+ with gr.Tab(label="ASR Datasets") as asr_datasets_tab:
1237
+ asr_datasets_count_label = gr.Markdown("*Loading...*")
1238
+ asr_datasets_table = gr.Dataframe(
1239
+ headers=["Dataset Name", "Downloads", "Likes", "Size"],
1240
+ interactive=False,
1241
+ wrap=True,
1242
+ datatype=["markdown", "number", "number", "str"],
1243
+ )
1244
+
1245
+ with gr.Tab(label="TTS Datasets") as tts_datasets_tab:
1246
+ tts_datasets_count_label = gr.Markdown("*Loading...*")
1247
+ tts_datasets_table = gr.Dataframe(
1248
+ headers=["Dataset Name", "Downloads", "Likes", "Size"],
1249
+ interactive=False,
1250
+ wrap=True,
1251
+ datatype=["markdown", "number", "number", "str"],
1252
+ )
1253
+
1254
+ gr.Markdown("## Logs")
1255
+ log_textbox = gr.Textbox(
1256
+ label="Search Logs",
1257
+ lines=10,
1258
+ max_lines=20,
1259
+ interactive=False,
1260
+ placeholder="Logs will appear here...",
1261
+ )
1262
+
1263
+ # About section with full content
1264
+ with gr.Accordion("About this tool", open=False):
1265
+ gr.Markdown(APP_CONTENT["full_content"])
1266
+
1267
+ def on_search(language_selection, deduplicate):
1268
+ if not language_selection:
1269
+ return None, "", None, "", None, "", None, "", None, ""
1270
+ # Extract the language code from "code: name" format
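+ # e.g. "eng: English" -> "eng"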
1271
+ language_code = language_selection.split(":")[0].strip()
1272
+ commercial_df, asr_models_df, tts_models_df, asr_models_count, tts_models_count, asr_datasets_df, tts_datasets_df, asr_datasets_count, tts_datasets_count, logs = search_language_resources(language_code, deduplicate=deduplicate)
1273
+
1274
+ # Create count labels
1275
+ asr_models_label = f"**Found {asr_models_count} ASR model(s)**"
1276
+ tts_models_label = f"**Found {tts_models_count} TTS model(s)**"
1277
+ asr_datasets_label = f"**Found {asr_datasets_count} ASR dataset(s)**"
1278
+ tts_datasets_label = f"**Found {tts_datasets_count} TTS dataset(s)**"
1279
+
1280
+ return commercial_df, asr_models_label, asr_models_df, tts_models_label, tts_models_df, asr_datasets_label, asr_datasets_df, tts_datasets_label, tts_datasets_df, logs
1281
+
1282
+ search_btn.click(
1283
+ fn=on_search,
1284
+ inputs=[language_dropdown, deduplicate_checkbox],
1285
+ outputs=[commercial_table, asr_count_label, asr_models_table, tts_count_label, tts_models_table, asr_datasets_count_label, asr_datasets_table, tts_datasets_count_label, tts_datasets_table, log_textbox],
1286
+ )
1287
+
1288
+ # Also trigger search when language is selected
1289
+ language_dropdown.change(
1290
+ fn=on_search,
1291
+ inputs=[language_dropdown, deduplicate_checkbox],
1292
+ outputs=[commercial_table, asr_count_label, asr_models_table, tts_count_label, tts_models_table, asr_datasets_count_label, asr_datasets_table, tts_datasets_count_label, tts_datasets_table, log_textbox],
1293
+ )
1294
+
1295
+ # Trigger search when deduplicate checkbox is changed
1296
+ deduplicate_checkbox.change(
1297
+ fn=on_search,
1298
+ inputs=[language_dropdown, deduplicate_checkbox],
1299
+ outputs=[commercial_table, asr_count_label, asr_models_table, tts_count_label, tts_models_table, asr_datasets_count_label, asr_datasets_table, tts_datasets_count_label, tts_datasets_table, log_textbox],
1300
+ )
1301
+
1302
+ if __name__ == "__main__":
1303
+ demo.launch(server_name="0.0.0.0", server_port=7860, share=False, show_error=True)
app_content.md ADDED
@@ -0,0 +1,43 @@
1
+ # Speech Resource Finder
2
+
3
+ ## Description
4
+
5
+ Almost 4 billion people speak languages with little or no speech technology support. This tool makes visible which languages have resources available and which communities are being left behind in the speech AI revolution.
6
+
7
+ Built by CLEAR Global to support language inclusion and help close the digital language divide.
8
+
9
+ ## How to Use
10
+ 1. Select a language from the dropdown (type to search by name or ISO code)
11
+ 2. Toggle model deduplication if desired (enabled by default)
12
+ 3. Review results: commercial availability, models, and datasets
13
+ 4. Click model/dataset names to open on HuggingFace
+
+ ## Data Sources
+
+ ### Commercial Speech Services
+
+ Commercial service support is automatically pulled from the language support page of each service provider.
+
+ - **Azure Speech Services** - [Speech-to-Text](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt) | [Text-to-Speech](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts)
+ - **Google Cloud Speech** - [Speech-to-Text](https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages) | [Text-to-Speech](https://cloud.google.com/text-to-speech/docs/voices)
+ - **AWS** - [Transcribe](https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html) | [Polly](https://docs.aws.amazon.com/polly/latest/dg/supported-languages.html)
+ - **ElevenLabs** - [Multilingual v2](https://elevenlabs.io/docs/models#multilingual-v2) | [Turbo v3](https://elevenlabs.io/docs/models#eleven-v3-alpha)
+
+ ### Open Source Resources
+ - **HuggingFace Models** - Pre-trained speech models sorted by downloads
+   - [ASR Models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition)
+   - [TTS Models](https://huggingface.co/models?pipeline_tag=text-to-speech)
+ - **HuggingFace Datasets** - Speech corpora for training and evaluation
+   - [ASR Datasets](https://huggingface.co/datasets?task_categories=task_categories:automatic-speech-recognition)
+   - [TTS Datasets](https://huggingface.co/datasets?task_categories=task_categories:text-to-speech)
+
+ ## Disclaimer
+
+ - The language list currently covers only 487 languages and is taken from this [GitHub repository](https://github.com/datasets/language-codes).
+ - Data is fetched in real time and can change.
+ - This is not an exhaustive list. There are other commercial voice technology providers and dataset/model resources that this app doesn't cover.
+ - Deduplication discards models that share the same name but come from different uploaders, keeping only the most downloaded version in the list (see the sketch below).
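+
+ A minimal sketch of this deduplication rule (illustrative, not this app's actual code):
+
+ ```python
+ # For each bare model name (the part after "owner/"), keep only the
+ # most downloaded repo that carries it, then re-sort by downloads.
+ def deduplicate(models: list[tuple[str, int]]) -> list[tuple[str, int]]:
+     best: dict[str, tuple[str, int]] = {}
+     for repo_id, downloads in models:
+         name = repo_id.split("/")[-1]
+         if name not in best or downloads > best[name][1]:
+             best[name] = (repo_id, downloads)
+     return sorted(best.values(), key=lambda m: m[1], reverse=True)
+
+ print(deduplicate([("openai/whisper-small", 9_000_000),
+                    ("someone/whisper-small", 1_200)]))
+ # -> [("openai/whisper-small", 9000000)]
+ ```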
+
+ ## Feedback
+
+ We would love to hear your feedback and suggestions. Please write to us at tech@clearglobal.org.
language-codes-full.csv ADDED
@@ -0,0 +1,488 @@
+ "alpha3-b","alpha3-t","alpha2","English","French"
+ "aar","","aa","Afar","afar"
+ "abk","","ab","Abkhazian","abkhaze"
+ "ace","","","Achinese","aceh"
+ "ach","","","Acoli","acoli"
+ "ada","","","Adangme","adangme"
+ "ady","","","Adyghe; Adygei","adyghé"
+ "afa","","","Afro-Asiatic languages","afro-asiatiques, langues"
+ "afh","","","Afrihili","afrihili"
+ "afr","","af","Afrikaans","afrikaans"
+ "ain","","","Ainu","aïnou"
+ "aka","","ak","Akan","akan"
+ "akk","","","Akkadian","akkadien"
+ "alb","sqi","sq","Albanian","albanais"
+ "ale","","","Aleut","aléoute"
+ "alg","","","Algonquian languages","algonquines, langues"
+ "alt","","","Southern Altai","altai du Sud"
+ "amh","","am","Amharic","amharique"
+ "ang","","","English, Old (ca.450-1100)","anglo-saxon (ca.450-1100)"
+ "anp","","","Angika","angika"
+ "apa","","","Apache languages","apaches, langues"
+ "ara","","ar","Arabic","arabe"
+ "arc","","","Official Aramaic (700-300 BCE); Imperial Aramaic (700-300 BCE)","araméen d'empire (700-300 BCE)"
+ "arg","","an","Aragonese","aragonais"
+ "arm","hye","hy","Armenian","arménien"
+ "arn","","","Mapudungun; Mapuche","mapudungun; mapuche; mapuce"
+ "arp","","","Arapaho","arapaho"
+ "art","","","Artificial languages","artificielles, langues"
+ "arw","","","Arawak","arawak"
+ "asm","","as","Assamese","assamais"
+ "ast","","","Asturian; Bable; Leonese; Asturleonese","asturien; bable; léonais; asturoléonais"
+ "ath","","","Athapascan languages","athapascanes, langues"
+ "aus","","","Australian languages","australiennes, langues"
+ "ava","","av","Avaric","avar"
+ "ave","","ae","Avestan","avestique"
+ "awa","","","Awadhi","awadhi"
+ "aym","","ay","Aymara","aymara"
+ "aze","","az","Azerbaijani","azéri"
+ "bad","","","Banda languages","banda, langues"
+ "bai","","","Bamileke languages","bamiléké, langues"
+ "bak","","ba","Bashkir","bachkir"
+ "bal","","","Baluchi","baloutchi"
+ "bam","","bm","Bambara","bambara"
+ "ban","","","Balinese","balinais"
+ "baq","eus","eu","Basque","basque"
+ "bas","","","Basa","basa"
+ "bat","","","Baltic languages","baltes, langues"
+ "bej","","","Beja; Bedawiyet","bedja"
+ "bel","","be","Belarusian","biélorusse"
+ "bem","","","Bemba","bemba"
+ "ben","","bn","Bengali","bengali"
+ "ber","","","Berber languages","berbères, langues"
+ "bho","","","Bhojpuri","bhojpuri"
+ "bih","","","Bihari languages","langues biharis"
+ "bik","","","Bikol","bikol"
+ "bin","","","Bini; Edo","bini; edo"
+ "bis","","bi","Bislama","bichlamar"
+ "bla","","","Siksika","blackfoot"
+ "bnt","","","Bantu languages","bantou, langues"
+ "bos","","bs","Bosnian","bosniaque"
+ "bra","","","Braj","braj"
+ "bre","","br","Breton","breton"
+ "btk","","","Batak languages","batak, langues"
+ "bua","","","Buriat","bouriate"
+ "bug","","","Buginese","bugi"
+ "bul","","bg","Bulgarian","bulgare"
+ "bur","mya","my","Burmese","birman"
+ "byn","","","Blin; Bilin","blin; bilen"
+ "cad","","","Caddo","caddo"
+ "cai","","","Central American Indian languages","amérindiennes de L'Amérique centrale, langues"
+ "car","","","Galibi Carib","karib; galibi; carib"
+ "cat","","ca","Catalan; Valencian","catalan; valencien"
+ "cau","","","Caucasian languages","caucasiennes, langues"
+ "ceb","","","Cebuano","cebuano"
+ "cel","","","Celtic languages","celtiques, langues; celtes, langues"
+ "cha","","ch","Chamorro","chamorro"
+ "chb","","","Chibcha","chibcha"
+ "che","","ce","Chechen","tchétchène"
+ "chg","","","Chagatai","djaghataï"
+ "chi","zho","zh","Chinese","chinois"
+ "chk","","","Chuukese","chuuk"
+ "chm","","","Mari","mari"
+ "chn","","","Chinook jargon","chinook, jargon"
+ "cho","","","Choctaw","choctaw"
+ "chp","","","Chipewyan; Dene Suline","chipewyan"
+ "chr","","","Cherokee","cherokee"
+ "chu","","cu","Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic","slavon d'église; vieux slave; slavon liturgique; vieux bulgare"
+ "chv","","cv","Chuvash","tchouvache"
+ "chy","","","Cheyenne","cheyenne"
+ "cmc","","","Chamic languages","chames, langues"
+ "cnr","","","Montenegrin","monténégrin"
+ "cop","","","Coptic","copte"
+ "cor","","kw","Cornish","cornique"
+ "cos","","co","Corsican","corse"
+ "cpe","","","Creoles and pidgins, English based","créoles et pidgins basés sur l'anglais"
+ "cpf","","","Creoles and pidgins, French-based","créoles et pidgins basés sur le français"
+ "cpp","","","Creoles and pidgins, Portuguese-based","créoles et pidgins basés sur le portugais"
+ "cre","","cr","Cree","cree"
+ "crh","","","Crimean Tatar; Crimean Turkish","tatar de Crimé"
+ "crp","","","Creoles and pidgins","créoles et pidgins"
+ "csb","","","Kashubian","kachoube"
+ "cus","","","Cushitic languages","couchitiques, langues"
+ "cze","ces","cs","Czech","tchèque"
+ "dak","","","Dakota","dakota"
+ "dan","","da","Danish","danois"
+ "dar","","","Dargwa","dargwa"
+ "day","","","Land Dayak languages","dayak, langues"
+ "del","","","Delaware","delaware"
+ "den","","","Slave (Athapascan)","esclave (athapascan)"
+ "dgr","","","Tlicho; Dogrib","tlicho; dogrib"
+ "din","","","Dinka","dinka"
+ "div","","dv","Divehi; Dhivehi; Maldivian","maldivien"
+ "doi","","","Dogri","dogri"
+ "dra","","","Dravidian languages","dravidiennes, langues"
+ "dsb","","","Lower Sorbian","bas-sorabe"
+ "dua","","","Duala","douala"
+ "dum","","","Dutch, Middle (ca.1050-1350)","néerlandais moyen (ca. 1050-1350)"
+ "dut","nld","nl","Dutch; Flemish","néerlandais; flamand"
+ "dyu","","","Dyula","dioula"
+ "dzo","","dz","Dzongkha","dzongkha"
+ "efi","","","Efik","efik"
+ "egy","","","Egyptian (Ancient)","égyptien"
+ "eka","","","Ekajuk","ekajuk"
+ "elx","","","Elamite","élamite"
+ "eng","","en","English","anglais"
+ "enm","","","English, Middle (1100-1500)","anglais moyen (1100-1500)"
+ "epo","","eo","Esperanto","espéranto"
+ "est","","et","Estonian","estonien"
+ "ewe","","ee","Ewe","éwé"
+ "ewo","","","Ewondo","éwondo"
+ "fan","","","Fang","fang"
+ "fao","","fo","Faroese","féroïen"
+ "fat","","","Fanti","fanti"
+ "fij","","fj","Fijian","fidjien"
+ "fil","","","Filipino; Pilipino","filipino; pilipino"
+ "fin","","fi","Finnish","finnois"
+ "fiu","","","Finno-Ugrian languages","finno-ougriennes, langues"
+ "fon","","","Fon","fon"
+ "fre","fra","fr","French","français"
+ "frm","","","French, Middle (ca.1400-1600)","français moyen (1400-1600)"
+ "fro","","","French, Old (842-ca.1400)","français ancien (842-ca.1400)"
+ "frr","","","Northern Frisian","frison septentrional"
+ "frs","","","Eastern Frisian","frison oriental"
+ "fry","","fy","Western Frisian","frison occidental"
+ "ful","","ff","Fulah","peul"
+ "fur","","","Friulian","frioulan"
+ "gaa","","","Ga","ga"
+ "gay","","","Gayo","gayo"
+ "gba","","","Gbaya","gbaya"
+ "gem","","","Germanic languages","germaniques, langues"
+ "geo","kat","ka","Georgian","géorgien"
+ "ger","deu","de","German","allemand"
+ "gez","","","Geez","guèze"
+ "gil","","","Gilbertese","kiribati"
+ "gla","","gd","Gaelic; Scottish Gaelic","gaélique; gaélique écossais"
+ "gle","","ga","Irish","irlandais"
+ "glg","","gl","Galician","galicien"
+ "glv","","gv","Manx","manx; mannois"
+ "gmh","","","German, Middle High (ca.1050-1500)","allemand, moyen haut (ca. 1050-1500)"
+ "goh","","","German, Old High (ca.750-1050)","allemand, vieux haut (ca. 750-1050)"
+ "gon","","","Gondi","gond"
+ "gor","","","Gorontalo","gorontalo"
+ "got","","","Gothic","gothique"
+ "grb","","","Grebo","grebo"
+ "grc","","","Greek, Ancient (to 1453)","grec ancien (jusqu'à 1453)"
+ "gre","ell","el","Greek, Modern (1453-)","grec moderne (après 1453)"
+ "grn","","gn","Guarani","guarani"
+ "gsw","","","Swiss German; Alemannic; Alsatian","suisse alémanique; alémanique; alsacien"
+ "guj","","gu","Gujarati","goudjrati"
+ "gwi","","","Gwich'in","gwich'in"
+ "hai","","","Haida","haida"
+ "hat","","ht","Haitian; Haitian Creole","haïtien; créole haïtien"
+ "hau","","ha","Hausa","haoussa"
+ "haw","","","Hawaiian","hawaïen"
+ "heb","","he","Hebrew","hébreu"
+ "her","","hz","Herero","herero"
+ "hil","","","Hiligaynon","hiligaynon"
+ "him","","","Himachali languages; Western Pahari languages","langues himachalis; langues paharis occidentales"
+ "hin","","hi","Hindi","hindi"
+ "hit","","","Hittite","hittite"
+ "hmn","","","Hmong; Mong","hmong"
+ "hmo","","ho","Hiri Motu","hiri motu"
+ "hrv","","hr","Croatian","croate"
+ "hsb","","","Upper Sorbian","haut-sorabe"
+ "hun","","hu","Hungarian","hongrois"
+ "hup","","","Hupa","hupa"
+ "iba","","","Iban","iban"
+ "ibo","","ig","Igbo","igbo"
+ "ice","isl","is","Icelandic","islandais"
+ "ido","","io","Ido","ido"
+ "iii","","ii","Sichuan Yi; Nuosu","yi de Sichuan"
+ "ijo","","","Ijo languages","ijo, langues"
+ "iku","","iu","Inuktitut","inuktitut"
+ "ile","","ie","Interlingue; Occidental","interlingue"
+ "ilo","","","Iloko","ilocano"
+ "ina","","ia","Interlingua (International Auxiliary Language Association)","interlingua (langue auxiliaire internationale)"
+ "inc","","","Indic languages","indo-aryennes, langues"
+ "ind","","id","Indonesian","indonésien"
+ "ine","","","Indo-European languages","indo-européennes, langues"
+ "inh","","","Ingush","ingouche"
+ "ipk","","ik","Inupiaq","inupiaq"
+ "ira","","","Iranian languages","iraniennes, langues"
+ "iro","","","Iroquoian languages","iroquoises, langues"
+ "ita","","it","Italian","italien"
+ "jav","","jv","Javanese","javanais"
+ "jbo","","","Lojban","lojban"
+ "jpn","","ja","Japanese","japonais"
+ "jpr","","","Judeo-Persian","judéo-persan"
+ "jrb","","","Judeo-Arabic","judéo-arabe"
+ "kaa","","","Kara-Kalpak","karakalpak"
+ "kab","","","Kabyle","kabyle"
+ "kac","","","Kachin; Jingpho","kachin; jingpho"
+ "kal","","kl","Kalaallisut; Greenlandic","groenlandais"
+ "kam","","","Kamba","kamba"
+ "kan","","kn","Kannada","kannada"
+ "kar","","","Karen languages","karen, langues"
+ "kas","","ks","Kashmiri","kashmiri"
+ "kau","","kr","Kanuri","kanouri"
+ "kaw","","","Kawi","kawi"
+ "kaz","","kk","Kazakh","kazakh"
+ "kbd","","","Kabardian","kabardien"
+ "kha","","","Khasi","khasi"
+ "khi","","","Khoisan languages","khoïsan, langues"
+ "khm","","km","Central Khmer","khmer central"
+ "kho","","","Khotanese; Sakan","khotanais; sakan"
+ "kik","","ki","Kikuyu; Gikuyu","kikuyu"
+ "kin","","rw","Kinyarwanda","rwanda"
+ "kir","","ky","Kirghiz; Kyrgyz","kirghiz"
+ "kmb","","","Kimbundu","kimbundu"
+ "kok","","","Konkani","konkani"
+ "kom","","kv","Komi","kom"
+ "kon","","kg","Kongo","kongo"
+ "kor","","ko","Korean","coréen"
+ "kos","","","Kosraean","kosrae"
+ "kpe","","","Kpelle","kpellé"
+ "krc","","","Karachay-Balkar","karatchai balkar"
+ "krl","","","Karelian","carélien"
+ "kro","","","Kru languages","krou, langues"
+ "kru","","","Kurukh","kurukh"
+ "kua","","kj","Kuanyama; Kwanyama","kuanyama; kwanyama"
+ "kum","","","Kumyk","koumyk"
+ "kur","","ku","Kurdish","kurde"
+ "kut","","","Kutenai","kutenai"
+ "lad","","","Ladino","judéo-espagnol"
+ "lah","","","Lahnda","lahnda"
+ "lam","","","Lamba","lamba"
+ "lao","","lo","Lao","lao"
+ "lat","","la","Latin","latin"
+ "lav","","lv","Latvian","letton"
+ "lez","","","Lezghian","lezghien"
+ "lim","","li","Limburgan; Limburger; Limburgish","limbourgeois"
+ "lin","","ln","Lingala","lingala"
+ "lit","","lt","Lithuanian","lituanien"
+ "lol","","","Mongo","mongo"
+ "loz","","","Lozi","lozi"
+ "ltz","","lb","Luxembourgish; Letzeburgesch","luxembourgeois"
+ "lua","","","Luba-Lulua","luba-lulua"
+ "lub","","lu","Luba-Katanga","luba-katanga"
+ "lug","","lg","Ganda","ganda"
+ "lui","","","Luiseno","luiseno"
+ "lun","","","Lunda","lunda"
+ "luo","","","Luo (Kenya and Tanzania)","luo (Kenya et Tanzanie)"
+ "lus","","","Lushai","lushai"
+ "mac","mkd","mk","Macedonian","macédonien"
+ "mad","","","Madurese","madourais"
+ "mag","","","Magahi","magahi"
+ "mah","","mh","Marshallese","marshall"
+ "mai","","","Maithili","maithili"
+ "mak","","","Makasar","makassar"
+ "mal","","ml","Malayalam","malayalam"
+ "man","","","Mandingo","mandingue"
+ "mao","mri","mi","Maori","maori"
+ "map","","","Austronesian languages","austronésiennes, langues"
+ "mar","","mr","Marathi","marathe"
+ "mas","","","Masai","massaï"
+ "may","msa","ms","Malay","malais"
+ "mdf","","","Moksha","moksa"
+ "mdr","","","Mandar","mandar"
+ "men","","","Mende","mendé"
+ "mga","","","Irish, Middle (900-1200)","irlandais moyen (900-1200)"
+ "mic","","","Mi'kmaq; Micmac","mi'kmaq; micmac"
+ "min","","","Minangkabau","minangkabau"
+ "mis","","","Uncoded languages","langues non codées"
+ "mkh","","","Mon-Khmer languages","môn-khmer, langues"
+ "mlg","","mg","Malagasy","malgache"
+ "mlt","","mt","Maltese","maltais"
+ "mnc","","","Manchu","mandchou"
+ "mni","","","Manipuri","manipuri"
+ "mno","","","Manobo languages","manobo, langues"
+ "moh","","","Mohawk","mohawk"
+ "mon","","mn","Mongolian","mongol"
+ "mos","","","Mossi","moré"
+ "mul","","","Multiple languages","multilingue"
+ "mun","","","Munda languages","mounda, langues"
+ "mus","","","Creek","muskogee"
+ "mwl","","","Mirandese","mirandais"
+ "mwr","","","Marwari","marvari"
+ "myn","","","Mayan languages","maya, langues"
+ "myv","","","Erzya","erza"
+ "nah","","","Nahuatl languages","nahuatl, langues"
+ "nai","","","North American Indian languages","nord-amérindiennes, langues"
+ "nap","","","Neapolitan","napolitain"
+ "nau","","na","Nauru","nauruan"
+ "nav","","nv","Navajo; Navaho","navaho"
+ "nbl","","nr","Ndebele, South; South Ndebele","ndébélé du Sud"
+ "nde","","nd","Ndebele, North; North Ndebele","ndébélé du Nord"
+ "ndo","","ng","Ndonga","ndonga"
+ "nds","","","Low German; Low Saxon; German, Low; Saxon, Low","bas allemand; bas saxon; allemand, bas; saxon, bas"
+ "nep","","ne","Nepali","népalais"
+ "new","","","Nepal Bhasa; Newari","nepal bhasa; newari"
+ "nia","","","Nias","nias"
+ "nic","","","Niger-Kordofanian languages","nigéro-kordofaniennes, langues"
+ "niu","","","Niuean","niué"
+ "nno","","nn","Norwegian Nynorsk; Nynorsk, Norwegian","norvégien nynorsk; nynorsk, norvégien"
+ "nob","","nb","Bokmål, Norwegian; Norwegian Bokmål","norvégien bokmål"
+ "nog","","","Nogai","nogaï; nogay"
+ "non","","","Norse, Old","norrois, vieux"
+ "nor","","no","Norwegian","norvégien"
+ "nqo","","","N'Ko","n'ko"
+ "nso","","","Pedi; Sepedi; Northern Sotho","pedi; sepedi; sotho du Nord"
+ "nub","","","Nubian languages","nubiennes, langues"
+ "nwc","","","Classical Newari; Old Newari; Classical Nepal Bhasa","newari classique"
+ "nya","","ny","Chichewa; Chewa; Nyanja","chichewa; chewa; nyanja"
+ "nym","","","Nyamwezi","nyamwezi"
+ "nyn","","","Nyankole","nyankolé"
+ "nyo","","","Nyoro","nyoro"
+ "nzi","","","Nzima","nzema"
+ "oci","","oc","Occitan (post 1500)","occitan (après 1500)"
+ "oji","","oj","Ojibwa","ojibwa"
+ "ori","","or","Oriya","oriya"
+ "orm","","om","Oromo","galla"
+ "osa","","","Osage","osage"
+ "oss","","os","Ossetian; Ossetic","ossète"
+ "ota","","","Turkish, Ottoman (1500-1928)","turc ottoman (1500-1928)"
+ "oto","","","Otomian languages","otomi, langues"
+ "paa","","","Papuan languages","papoues, langues"
+ "pag","","","Pangasinan","pangasinan"
+ "pal","","","Pahlavi","pahlavi"
+ "pam","","","Pampanga; Kapampangan","pampangan"
+ "pan","","pa","Panjabi; Punjabi","pendjabi"
+ "pap","","","Papiamento","papiamento"
+ "pau","","","Palauan","palau"
+ "peo","","","Persian, Old (ca.600-400 B.C.)","perse, vieux (ca. 600-400 av. J.-C.)"
+ "per","fas","fa","Persian","persan"
+ "phi","","","Philippine languages","philippines, langues"
+ "phn","","","Phoenician","phénicien"
+ "pli","","pi","Pali","pali"
+ "pol","","pl","Polish","polonais"
+ "pon","","","Pohnpeian","pohnpei"
+ "por","","pt","Portuguese","portugais"
+ "pra","","","Prakrit languages","prâkrit, langues"
+ "pro","","","Provençal, Old (to 1500); Occitan, Old (to 1500)","provençal ancien (jusqu'à 1500); occitan ancien (jusqu'à 1500)"
+ "pus","","ps","Pushto; Pashto","pachto"
+ "qaa-qtz","","","Reserved for local use","réservée à l'usage local"
+ "que","","qu","Quechua","quechua"
+ "raj","","","Rajasthani","rajasthani"
+ "rap","","","Rapanui","rapanui"
+ "rar","","","Rarotongan; Cook Islands Maori","rarotonga; maori des îles Cook"
+ "roa","","","Romance languages","romanes, langues"
+ "roh","","rm","Romansh","romanche"
+ "rom","","","Romany","tsigane"
+ "rum","ron","ro","Romanian; Moldavian; Moldovan","roumain; moldave"
+ "run","","rn","Rundi","rundi"
+ "rup","","","Aromanian; Arumanian; Macedo-Romanian","aroumain; macédo-roumain"
+ "rus","","ru","Russian","russe"
+ "sad","","","Sandawe","sandawe"
+ "sag","","sg","Sango","sango"
+ "sah","","","Yakut","iakoute"
+ "sai","","","South American Indian languages","sud-amérindiennes, langues"
+ "sal","","","Salishan languages","salishennes, langues"
+ "sam","","","Samaritan Aramaic","samaritain"
+ "san","","sa","Sanskrit","sanskrit"
+ "sas","","","Sasak","sasak"
+ "sat","","","Santali","santal"
+ "scn","","","Sicilian","sicilien"
+ "sco","","","Scots","écossais"
+ "sel","","","Selkup","selkoupe"
+ "sem","","","Semitic languages","sémitiques, langues"
+ "sga","","","Irish, Old (to 900)","irlandais ancien (jusqu'à 900)"
+ "sgn","","","Sign Languages","langues des signes"
+ "shn","","","Shan","chan"
+ "sid","","","Sidamo","sidamo"
+ "sin","","si","Sinhala; Sinhalese","singhalais"
+ "sio","","","Siouan languages","sioux, langues"
+ "sit","","","Sino-Tibetan languages","sino-tibétaines, langues"
+ "sla","","","Slavic languages","slaves, langues"
+ "slo","slk","sk","Slovak","slovaque"
+ "slv","","sl","Slovenian","slovène"
+ "sma","","","Southern Sami","sami du Sud"
+ "sme","","se","Northern Sami","sami du Nord"
+ "smi","","","Sami languages","sames, langues"
+ "smj","","","Lule Sami","sami de Lule"
+ "smn","","","Inari Sami","sami d'Inari"
+ "smo","","sm","Samoan","samoan"
+ "sms","","","Skolt Sami","sami skolt"
+ "sna","","sn","Shona","shona"
+ "snd","","sd","Sindhi","sindhi"
+ "snk","","","Soninke","soninké"
+ "sog","","","Sogdian","sogdien"
+ "som","","so","Somali","somali"
+ "son","","","Songhai languages","songhai, langues"
+ "sot","","st","Sotho, Southern","sotho du Sud"
+ "spa","","es","Spanish; Castilian","espagnol; castillan"
+ "srd","","sc","Sardinian","sarde"
+ "srn","","","Sranan Tongo","sranan tongo"
+ "srp","","sr","Serbian","serbe"
+ "srr","","","Serer","sérère"
+ "ssa","","","Nilo-Saharan languages","nilo-sahariennes, langues"
+ "ssw","","ss","Swati","swati"
+ "suk","","","Sukuma","sukuma"
+ "sun","","su","Sundanese","soundanais"
+ "sus","","","Susu","soussou"
+ "sux","","","Sumerian","sumérien"
+ "swa","","sw","Swahili","swahili"
+ "swe","","sv","Swedish","suédois"
+ "syc","","","Classical Syriac","syriaque classique"
+ "syr","","","Syriac","syriaque"
+ "tah","","ty","Tahitian","tahitien"
+ "tai","","","Tai languages","tai, langues"
+ "tam","","ta","Tamil","tamoul"
+ "tat","","tt","Tatar","tatar"
+ "tel","","te","Telugu","télougou"
+ "tem","","","Timne","temne"
+ "ter","","","Tereno","tereno"
+ "tet","","","Tetum","tetum"
+ "tgk","","tg","Tajik","tadjik"
+ "tgl","","tl","Tagalog","tagalog"
+ "tha","","th","Thai","thaï"
+ "tib","bod","bo","Tibetan","tibétain"
+ "tig","","","Tigre","tigré"
+ "tir","","ti","Tigrinya","tigrigna"
+ "tiv","","","Tiv","tiv"
+ "tkl","","","Tokelau","tokelau"
+ "tlh","","","Klingon; tlhIngan-Hol","klingon"
+ "tli","","","Tlingit","tlingit"
+ "tmh","","","Tamashek","tamacheq"
+ "tog","","","Tonga (Nyasa)","tonga (Nyasa)"
+ "ton","","to","Tonga (Tonga Islands)","tongan (Îles Tonga)"
+ "tpi","","","Tok Pisin","tok pisin"
+ "tsi","","","Tsimshian","tsimshian"
+ "tsn","","tn","Tswana","tswana"
+ "tso","","ts","Tsonga","tsonga"
+ "tuk","","tk","Turkmen","turkmène"
+ "tum","","","Tumbuka","tumbuka"
+ "tup","","","Tupi languages","tupi, langues"
+ "tur","","tr","Turkish","turc"
+ "tut","","","Altaic languages","altaïques, langues"
+ "tvl","","","Tuvalu","tuvalu"
+ "twi","","tw","Twi","twi"
+ "tyv","","","Tuvinian","touva"
+ "udm","","","Udmurt","oudmourte"
+ "uga","","","Ugaritic","ougaritique"
+ "uig","","ug","Uighur; Uyghur","ouïgour"
+ "ukr","","uk","Ukrainian","ukrainien"
+ "umb","","","Umbundu","umbundu"
+ "und","","","Undetermined","indéterminée"
+ "urd","","ur","Urdu","ourdou"
+ "uzb","","uz","Uzbek","ouszbek"
+ "vai","","","Vai","vaï"
+ "ven","","ve","Venda","venda"
+ "vie","","vi","Vietnamese","vietnamien"
+ "vol","","vo","Volapük","volapük"
+ "vot","","","Votic","vote"
+ "wak","","","Wakashan languages","wakashanes, langues"
+ "wal","","","Wolaitta; Wolaytta","wolaitta; wolaytta"
+ "war","","","Waray","waray"
+ "was","","","Washo","washo"
+ "wel","cym","cy","Welsh","gallois"
+ "wen","","","Sorbian languages","sorabes, langues"
+ "wln","","wa","Walloon","wallon"
+ "wol","","wo","Wolof","wolof"
+ "xal","","","Kalmyk; Oirat","kalmouk; oïrat"
+ "xho","","xh","Xhosa","xhosa"
+ "yao","","","Yao","yao"
+ "yap","","","Yapese","yapois"
+ "yid","","yi","Yiddish","yiddish"
+ "yor","","yo","Yoruba","yoruba"
+ "ypk","","","Yupik languages","yupik, langues"
+ "zap","","","Zapotec","zapotèque"
+ "zbl","","","Blissymbols; Blissymbolics; Bliss","symboles Bliss; Bliss"
+ "zen","","","Zenaga","zenaga"
+ "zgh","","","Standard Moroccan Tamazight","amazighe standard marocain"
+ "zha","","za","Zhuang; Chuang","zhuang; chuang"
+ "znd","","","Zande languages","zandé, langues"
+ "zul","","zu","Zulu","zoulou"
+ "zun","","","Zuni","zuni"
+ "zxx","","","No linguistic content; Not applicable","pas de contenu linguistique; non applicable"
+ "zza","","","Zaza; Dimili; Dimli; Kirdki; Kirmanjki; Zazaki","zaza; dimili; dimli; kirdki; kirmanjki; zazaki"