---
license: apache-2.0
tags:
- text-classification
- language-identification
library_name: fasttext
datasets:
- cis-lmu/GlotSparse
- cis-lmu/GlotStoryBook
metrics:
- f1
---
# GlotLID
## Description
GlotLID is a fastText language identification (LID) model that covers around 2,000 languages.
## How to use
Here is how to use this model to detect the language of a given text:
```python
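>>> import fasttext
>>> from huggingface_hub import hf_hub_download
>>> # Download model.bin from the Hub (cached locally after the first call)
>>> model_path = hf_hub_download(repo_id="cis-lmu/GlotLID", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> # Top-1 prediction: returns a (labels, probabilities) pair
>>> model.predict("Hello, world!")
>>> # Top-2 predictions, most probable first
>>> model.predict("Hello, world!", k=2)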
```
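The raw output of `predict` is a pair of parallel sequences: labels carrying fastText's `__label__` prefix, and their probabilities. A minimal sketch of a wrapper that turns this into plain `(label, probability)` pairs is shown below. The helper name `identify_language` is hypothetical and not part of this model card or of fastText, and the example label format in the comments (such as `eng_Latn`) is an assumption about how this model names its classes. The wrapper also collapses whitespace first, because fastText's `predict` rejects input that contains newline characters.

```python
from typing import List, Tuple

# fastText's default label prefix; assumed to be unchanged for this model.
LABEL_PREFIX = "__label__"


def identify_language(model, text: str, k: int = 1) -> List[Tuple[str, float]]:
    """Hypothetical helper: return up to k (label, probability) pairs, best first.

    `model` is any object exposing fastText's predict(text, k=...) interface.
    """
    # predict() raises ValueError on newlines, so collapse all whitespace
    # (including line breaks) into single spaces before classifying.
    single_line = " ".join(text.split())
    labels, probs = model.predict(single_line, k=k)

    results = []
    for label, prob in zip(labels, probs):
        # Strip the prefix so "__label__eng_Latn" (assumed format) becomes "eng_Latn".
        if label.startswith(LABEL_PREFIX):
            label = label[len(LABEL_PREFIX):]
        results.append((label, float(prob)))
    return results
```

With the `model` loaded as in the snippet above, `identify_language(model, "Hello, world!", k=2)` would return the two most probable labels with their scores.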
## License
The model is distributed under the Apache License, Version 2.0.
## References
If you use this model, please cite the following paper:
```
@inproceedings{kargaran2023glotlid,
title={{GlotLID}: Language Identification for Low-Resource Languages},
author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
year={2023},
url={https://openreview.net/forum?id=dl4e3EBz5j}
}
```