Text Classification
fastText
2155 languages
language-identification
Commit 0c503b1 by kargaranamir
1 parent: 04e93aa

Update README.md

Files changed (1)
  1. README.md +53 -0
README.md CHANGED
@@ -1,3 +1,56 @@
  ---
  license: apache-2.0
+ tags:
+ - text-classification
+ - language-identification
+ library_name: fasttext
+ datasets:
+ - cis-lmu/GlotSparse
+ - cis-lmu/GlotStoryBook
+ metrics:
+ - f1
  ---
+
+
+ # GlotLID
+
+ ## Description
+
+ GlotLID is a fastText language identification (LID) model that covers around 2000 languages.
+
+
+ ### How to use
+
+ Here is how to use this model to detect the language of a given text:
+
+ ```python
+ >>> import fasttext
+ >>> from huggingface_hub import hf_hub_download
+
+ >>> model_path = hf_hub_download(repo_id="cis-lmu/GlotLID", filename="model.bin")  # download (or reuse the cached) model file
+ >>> model = fasttext.load_model(model_path)
+ >>> model.predict("Hello, world!")  # top-1 prediction as a (labels, probabilities) pair
+
+ >>> model.predict("Hello, world!", k=2)  # the two most likely labels
+
+ ```
+
+ ## License
+
+ The model is distributed under the Apache License, Version 2.0.
+
+ ## References
+
+ If you use this model, please cite the following paper:
+
+ ```
+ @inproceedings{
+ kargaran2023glotlid,
+ title={{GlotLID}: Language Identification for Low-Resource Languages},
+ author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
+ booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
+ year={2023},
+ url={https://openreview.net/forum?id=dl4e3EBz5j}
+ }
+
+ ```
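
Note on the "How to use" snippet in this change: `model.predict` returns a pair of labels and probabilities, and fastText labels carry a `__label__` prefix by default. The sketch below shows one way to turn that pair into plain `(language, probability)` tuples. The helper name `parse_prediction` is hypothetical, and the sample labels and probabilities are made-up values used only to illustrate the shape of the data, not real model output.

```python
from typing import List, Sequence, Tuple

# fastText's default label prefix; assumed unchanged for this model.
LABEL_PREFIX = "__label__"


def parse_prediction(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each label (with the fastText prefix removed) with its probability."""
    parsed = []
    for label, prob in zip(labels, probs):
        # Strip the prefix only when it is actually present.
        if label.startswith(LABEL_PREFIX):
            label = label[len(LABEL_PREFIX):]
        parsed.append((label, float(prob)))
    return parsed


# Made-up values shaped like the (labels, probabilities) pair from model.predict.
example_labels = ("__label__eng_Latn", "__label__deu_Latn")
example_probs = [0.9, 0.05]

for language, probability in parse_prediction(example_labels, example_probs):
    print(language, probability)
```

With the model loaded as in the README, the same helper could be applied as `parse_prediction(*model.predict("Hello, world!", k=2))`.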