Text Classification
fastText
language-identification
kargaranamir committed on
Commit
e773fb9
1 Parent(s): d27bf79
Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -1852,6 +1852,10 @@ metrics:
1852
 
1853
  **GlotLID** is a fastText language identification (LID) model that supports more than **1600 languages**.
1854
 
 
 
 
 
1855
  - **Demo:** [huggingface](https://huggingface.co/spaces/cis-lmu/glotlid-space)
1856
  - **Repository:** [github](https://github.com/cisnlp/GlotLID)
1857
  - **Paper:** [paper](https://arxiv.org/abs/2310.16248) (EMNLP 2023)
@@ -1867,6 +1871,7 @@ Here is how to use this model to detect the language of a given text:
1867
  >>> import fasttext
1868
  >>> from huggingface_hub import hf_hub_download
1869
 
 
1870
  >>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
1871
  >>> model = fasttext.load_model(model_path)
1872
  >>> model.predict("Hello, world!")
@@ -1898,8 +1903,10 @@ To access a specific version, simply append the version number to the `filename`
1898
 
1899
  - For v1: `model_v1.bin` (introduced in the GlotLID [paper](https://arxiv.org/abs/2310.16248) and used in all experiments).
1900
  - For v2: `model_v2.bin` (a revised version of v1 that covers more languages and was cleaned of noisy corpora based on the analysis of v1).
 
 
1901
 
1902
- `model.bin` always refers to the latest version (v2).
1903
 
1904
 
1905
  ## References
 
1852
 
1853
  **GlotLID** is a fastText language identification (LID) model that supports more than **1600 languages**.
1854
 
1855
+
1856
+ **Latest:** GlotLID has been updated to **v3**, which supports **2114 labels** (three-letter ISO codes with script). For details on the supported languages, performance, and significant changes from previous versions, see [https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md](https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md).
1857
+
1858
+
1859
  - **Demo:** [huggingface](https://huggingface.co/spaces/cis-lmu/glotlid-space)
1860
  - **Repository:** [github](https://github.com/cisnlp/GlotLID)
1861
  - **Paper:** [paper](https://arxiv.org/abs/2310.16248) (EMNLP 2023)
 
1871
  >>> import fasttext
1872
  >>> from huggingface_hub import hf_hub_download
1873
 
1874
+ # model.bin always refers to the latest version
1875
  >>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
1876
  >>> model = fasttext.load_model(model_path)
1877
  >>> model.predict("Hello, world!")
 
1903
 
1904
  - For v1: `model_v1.bin` (introduced in the GlotLID [paper](https://arxiv.org/abs/2310.16248) and used in all experiments).
1905
  - For v2: `model_v2.bin` (a revised version of v1 that covers more languages and was cleaned of noisy corpora based on the analysis of v1).
1906
+ - For v3: `model_v3.bin` (a revised version of v2 that covers more languages, excludes macro languages, was further cleaned of noisy corpora and incorrect metadata labels based on the analysis of v2, and supports the "zxx" and "und" series labels).
1907
+
1908
 
1909
+ `model.bin` always refers to the latest version (v3).
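
Because `model.bin` moves whenever a new version is released, pinning a versioned file name may be preferable when results need to be reproducible. The following is a minimal sketch of that idea, not part of the GlotLID code base: the helper names `glotlid_filename`, `split_label`, and `load_glotlid` are hypothetical, and the label shape `__label__eng_Latn` is an assumption based on fastText's default label prefix and the "three-letter ISO codes with script" description above.

```python
from typing import Optional, Tuple

REPO_ID = "cis-lmu/glotlid"  # repository id taken from the README snippet


def glotlid_filename(version: Optional[int] = None) -> str:
    """Return the file name for a pinned version, or the floating `model.bin`."""
    if version is None:
        return "model.bin"  # always the latest version
    if version not in (1, 2, 3):
        raise ValueError(f"unknown GlotLID version: {version}")
    return f"model_v{version}.bin"


def split_label(label: str) -> Tuple[str, str]:
    """Split a label such as '__label__eng_Latn' into (language, script).

    Assumes the fastText default prefix and an ISO-code_Script layout.
    """
    prefix = "__label__"
    code = label[len(prefix):] if label.startswith(prefix) else label
    lang, _, script = code.partition("_")
    return lang, script


def load_glotlid(version: Optional[int] = None):
    """Download (or reuse the cached copy of) the chosen file and load it."""
    # Imported lazily so the helpers above work without these packages.
    import fasttext
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(repo_id=REPO_ID, filename=glotlid_filename(version))
    return fasttext.load_model(model_path)
```

With these helpers, `load_glotlid(3)` would fetch `model_v3.bin` while `load_glotlid()` follows `model.bin`, and `split_label(labels[0])` can be applied to the first label returned by `model.predict(...)`.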
1910
 
1911
 
1912
  ## References