SebOchs committed on
Commit d06ff02
1 Parent(s): 2790ac2

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -266,6 +266,27 @@ Canine model trained on WiLI-2018 dataset to identify the language of a text.
  - Accuracy: 94,92%
  - Macro F1-score: 94,91%
 
+ ### Inference
+ Dictionary to return English names for a label id:
+ ```python
+ import datasets
+ import pycountry
+ def int_to_lang():
+     dataset = datasets.load_dataset('wili_2018')
+     # names for languages not in iso-639-3 from wikipedia
+     non_iso_languages = {'roa-tara': 'Tarantino', 'zh-yue': 'Cantonese', 'map-bms': 'Banyumasan',
+                          'nds-nl': 'Dutch Low Saxon', 'be-tarask': 'Belarusian'}
+     # create dictionary from data set labels to language names
+     lab_to_lang = {}
+     for i, lang in enumerate(dataset['train'].features['label'].names):
+         full_lang = pycountry.languages.get(alpha_3=lang)
+         if full_lang:
+             lab_to_lang[i] = full_lang.name
+         else:
+             lab_to_lang[i] = non_iso_languages[lang]
+     return lab_to_lang
+ ```
+
  ### Credit to
  ```
  @article{clark-etal-2022-canine,