---
license: apache-2.0
tags:
- text-classification
- language-identification
library_name: fasttext
datasets:
- cis-lmu/GlotSparse
- cis-lmu/GlotStoryBook
metrics:
- f1
---

# GlotLID

[![GlotLID](https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg)](https://huggingface.co/spaces/cis-lmu/glotlid-space)

## Description

**GlotLID** is a fastText language identification (LID) model that supports more than **1600 languages**.

- **Demo:** [huggingface](https://huggingface.co/spaces/cis-lmu/glotlid-space)
- **Repository:** [github](https://github.com/cisnlp/GlotLID)
- **Paper:** [paper](https://arxiv.org/abs/2310.16248) (EMNLP 2023)
- **Point of Contact:** amir@cis.lmu.de

### How to use

Here is how to use this model to detect the language of a given text:

```python
>>> import fasttext
>>> from huggingface_hub import hf_hub_download

>>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.predict("Hello, world!")
```

If you prefer not to use huggingface_hub, download the model directly:

```bash
wget https://huggingface.co/cis-lmu/glotlid/resolve/main/model.bin
```

```python
>>> import fasttext

>>> model = fasttext.load_model("/path/to/model.bin")
>>> model.predict("Hello, world!")
```

## License

The model is distributed under the Apache License, Version 2.0.

## Version

Earlier versions of GlotLID remain available in our repository. To access a specific version, add the version number to the `filename`.

- For v1: `model_v1.bin` (introduced in the GlotLID [paper](https://arxiv.org/abs/2310.16248) and used in all experiments).
- For v2: `model_v2.bin` (an edited version of v1 that covers more languages and drops noisy corpora identified in the analysis of v1).

`model.bin` always refers to the latest version (v2).

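
The version naming above can be wrapped in a small helper. The following is a minimal sketch, not part of the GlotLID repository: the function names `glotlid_filename`, `load_glotlid`, and `top_languages` are hypothetical and introduced here for illustration. It assumes only the `hf_hub_download`, `fasttext.load_model`, and `model.predict` calls already shown in this README, plus fastText's `k` argument and its standard `__label__` prefix.

```python
from typing import Optional


def glotlid_filename(version: Optional[int] = None) -> str:
    """Return the checkpoint filename for a GlotLID version (hypothetical helper).

    None maps to "model.bin" (the latest version); an integer n maps to
    "model_v{n}.bin", following the naming described in this section.
    """
    if version is None:
        return "model.bin"
    return f"model_v{version}.bin"


def load_glotlid(version: Optional[int] = None):
    """Download (or reuse the cached copy of) a checkpoint and load it."""
    # Imported lazily so glotlid_filename() works without these packages.
    import fasttext
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="cis-lmu/glotlid",
        filename=glotlid_filename(version),
    )
    return fasttext.load_model(path)


def top_languages(model, text: str, k: int = 3):
    """Return up to k (label, probability) pairs for one piece of text."""
    # fastText's predict() rejects strings containing newlines, so flatten first.
    labels, probs = model.predict(text.replace("\n", " "), k=k)
    # fastText prefixes every label with "__label__"; strip it for readability.
    return [
        (label.replace("__label__", "", 1), float(prob))
        for label, prob in zip(labels, probs)
    ]
```

For example, `model = load_glotlid(1)` would fetch `model_v1.bin`, and `top_languages(model, "Hello, world!")` would then list the three most probable labels with their scores.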
## References

If you use this model, please cite the following paper:

```
@inproceedings{kargaran2023glotlid,
  title={{GlotLID}: Language Identification for Low-Resource Languages},
  author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
  booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
  year={2023},
  url={https://openreview.net/forum?id=dl4e3EBz5j}
}
```