cis-lmu/glotlid

kargaranamir committed on Oct 20, 2023

Commit

0c503b1

1 Parent(s): 04e93aa

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,3 +1,56 @@
 ---
 license: apache-2.0
+tags:
+- text-classification
+- language-identification
+library_name: fasttext
+datasets:
+- cis-lmu/GlotSparse
+- cis-lmu/GlotStoryBook
+metrics:
+- f1
 ---
+# GlotLID
+## Description
+GlotLID is a fastText language identification (LID) model that covers around 2,000 languages.
+### How to use
+Here is how to use this model to detect the language of a given text:
+```python
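+>>> import fasttext
+>>> from huggingface_hub import hf_hub_download
+>>> # Download the model file from the Hugging Face Hub (cached after the first call)
+>>> model_path = hf_hub_download(repo_id="cis-lmu/GlotLID", filename="model.bin")
+>>> model = fasttext.load_model(model_path)
+>>> # Top-1 prediction: returns a (labels, probabilities) pair
+>>> model.predict("Hello, world!")
+>>> # Top-2 predictions for the same text
+>>> model.predict("Hello, world!", k=2)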
+```
+## License
+The model is distributed under the Apache License, Version 2.0.
+## References
+If you use this model, please cite the following paper:
+```
+@inproceedings{
+  kargaran2023glotlid,
+  title={{GlotLID}: Language Identification for Low-Resource Languages},
+  author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
+  booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
+  year={2023},
+  url={https://openreview.net/forum?id=dl4e3EBz5j}
+}
+```
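
The "How to use" snippet in this README calls `model.predict` but does not show how to work with its result. The following is a minimal sketch of two helpers that could sit around that call. It assumes the labels carry fastText's default `__label__` prefix; the helper names `prepare_text` and `parse_prediction` and the sample labels are hypothetical and do not come from this repository.

```python
from typing import List, Sequence, Tuple

# fastText's default label prefix; assumed to apply to this model's labels.
LABEL_PREFIX = "__label__"


def prepare_text(text: str) -> str:
    """Collapse newlines and whitespace runs into single spaces.

    fastText's predict() handles one line at a time and raises an error
    when the input contains a newline character.
    """
    return " ".join(text.split())


def parse_prediction(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each label (prefix stripped) with its probability as a float."""
    results = []
    for label, prob in zip(labels, probs):
        if label.startswith(LABEL_PREFIX):
            label = label[len(LABEL_PREFIX):]
        results.append((label, float(prob)))
    return results


# Hypothetical values shaped like the (labels, probabilities) pair
# that predict() returns; they are not real model output.
sample_labels = ("__label__eng_Latn", "__label__deu_Latn")
sample_probs = (0.98, 0.01)
print(parse_prediction(sample_labels, sample_probs))
```

With a loaded model, the helpers would be combined as `parse_prediction(*model.predict(prepare_text(text), k=2))`.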