---
library_name: fasttext
tags:
  - text-classification
  - language-identification
---
This is a fastText-based language classification model from the paper [The first neural machine translation system for the Erzya language](https://arxiv.org/abs/2209.09368).

It supports 323 languages used in Wikipedia (as of July 2022) and has extended support for the Erzya (`myv`) and Moksha (`mdf`) languages.

Example usage:

```Python
import fasttext
import urllib.request
import os
model_path = 'lid.323.ftz'
url = 'https://huggingface.co/slone/fastText-LID-323/resolve/main/lid.323.ftz'
if not os.path.exists(model_path):
    urllib.request.urlretrieve(url, model_path)  # or just download it manually

model = fasttext.load_model(model_path)
languages, scores = model.predict("эрзянь кель", k=3)  # k is the number of returned hypotheses
```
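The values returned by `predict` usually need a little post-processing before use. The sketch below assumes that the labels carry fastText's default `__label__` prefix (not confirmed for this model) and that `scores` is a sequence of numbers; `format_predictions` is a hypothetical helper, and the sample values are illustrative, not real model output.

```python
def format_predictions(labels, scores, prefix="__label__"):
    """Pair each predicted label with its score, dropping the label prefix if present."""
    results = []
    for label, score in zip(labels, scores):
        # Assumption: labels look like "__label__myv"; other labels pass through unchanged.
        code = label[len(prefix):] if label.startswith(prefix) else label
        results.append((code, float(score)))
    return results


# Illustrative values shaped like the output of model.predict(..., k=3).
example_labels = ("__label__myv", "__label__mdf", "__label__ru")
example_scores = [0.9, 0.06, 0.01]
print(format_predictions(example_labels, example_scores))
```

With the model loaded as above, this could be called as `format_predictions(languages, scores)`.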

The model was trained on article texts randomly sampled from Wikipedia. It works better on sentences and longer texts than on single words, and it may be sensitive to noise.
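
When classifying longer texts, note that the fastText Python `predict` method processes one line at a time and raises a `ValueError` if the input contains a newline. A minimal sketch of a wrapper that collapses whitespace first is shown below; `prepare_text` and `predict_language` are hypothetical helper names, and this cleanup is only one possible choice, not part of the original model pipeline.

```python
import re


def prepare_text(text):
    """Collapse every run of whitespace (including newlines) into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def predict_language(model, text, k=1):
    """Call model.predict on a single-line version of the text.

    `model` is expected to be the object returned by fasttext.load_model.
    """
    return model.predict(prepare_text(text), k=k)
```

For example, `predict_language(model, multi_paragraph_text, k=3)` would return the same kind of `(labels, scores)` pair as the usage example above.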