	---
	license: apache-2.0
	tags:
	- text-classification
	- language-identification
	library_name: fasttext
	datasets:
	- cis-lmu/GlotSparse
	- cis-lmu/GlotStoryBook
	metrics:
	- f1
	---


	# GlotLID

	## Description

	GlotLID is a fastText language identification (LID) model that covers around 2000 languages.


	### How to use

	Here is how to use this model to detect the language of a given text:

	```python
	>>> import fasttext
	>>> from huggingface_hub import hf_hub_download

	>>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
	>>> model = fasttext.load_model(model_path)
	>>> model.predict("Hello, world!")

	```
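
fastText's `predict` returns a pair: a tuple of label strings and an array of matching probabilities. The sketch below unpacks such a pair into language and script codes. It assumes labels have the form `__label__<language>_<script>` (for example `__label__eng_Latn`), which this card does not state. `parse_prediction` and `Guess` are hypothetical names introduced for illustration, and the sample values are made up, not real model output.

```python
from typing import List, NamedTuple, Sequence


class Guess(NamedTuple):
    language: str       # e.g. "eng" (assumed language code)
    script: str         # e.g. "Latn" (assumed script code)
    probability: float


LABEL_PREFIX = "__label__"  # fastText's default label prefix


def parse_prediction(labels: Sequence[str], probs: Sequence[float]) -> List[Guess]:
    """Turn the (labels, probabilities) pair from model.predict into Guess records."""
    guesses = []
    for label, prob in zip(labels, probs):
        name = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Assumed label layout: "<language>_<script>"; script is empty if absent.
        language, _, script = name.partition("_")
        guesses.append(Guess(language, script, float(prob)))
    return guesses


# Illustrative values only, shaped like fastText output.
example = parse_prediction(("__label__eng_Latn",), [0.99])
print(example[0].language, example[0].script)
```

With a loaded model, the pair would come from `labels, probs = model.predict("Hello, world!", k=3)`, where `k` sets how many top labels are returned. `predict` handles one line at a time, so newline characters should be removed from the input first.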

	## License

	The model is distributed under the Apache License, Version 2.0.

	## Version

	Earlier versions of GlotLID remain available in the repository.

	To access a specific version, add the version suffix to the model name in `filename`, for example `model_v1.bin`.

	- For v1: `model_v1.bin` (introduced in the GlotLID paper and used in all experiments).
	- For v2: `model_v2.bin` (a revised version of v1 that covers more languages, with noisy corpora removed based on the analysis of v1).

	`model.bin` always refers to the latest version (v2).
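
The naming scheme above can be captured in a small helper. This is a minimal sketch: `glotlid_filename` is a hypothetical function, not part of the repository, and it only builds the name without checking that the file exists (this card lists v1 and v2 only).

```python
from typing import Optional


def glotlid_filename(version: Optional[int] = None) -> str:
    """Return the model filename for a GlotLID version; None means the latest."""
    if version is None:
        return "model.bin"
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"model_v{version}.bin"


print(glotlid_filename())   # latest
print(glotlid_filename(1))  # pinned to v1
```

The result can be passed to the download call, as in `hf_hub_download(repo_id="cis-lmu/glotlid", filename=glotlid_filename(1))`. Pinning a version this way keeps results reproducible when `model.bin` moves to a newer release.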


	## References

	If you use this model, please cite the following paper:

	```
	@inproceedings{kargaran2023glotlid,
	title={{GlotLID}: Language Identification for Low-Resource Languages},
	author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
	booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
	year={2023},
	url={https://openreview.net/forum?id=dl4e3EBz5j}
	}

	```