---
library_name: fasttext
tags:
  - text-classification
  - language-identification
---

This is a fastText-based language classification model from the paper *The first neural machine translation system for the Erzya language*.

It supports the 323 languages used in Wikipedia (as of July 2022) and has extended support for the Erzya (myv) and Moksha (mdf) languages.

Example usage:

```python
import os
import urllib.request

import fasttext

model_path = 'lid.323.ftz'
url = 'https://huggingface.co/slone/fastText-LID-323/resolve/main/lid.323.ftz'
# Download the model once (or just download it manually)
if not os.path.exists(model_path):
    urllib.request.urlretrieve(url, model_path)

model = fasttext.load_model(model_path)
# k is the number of returned hypotheses
languages, scores = model.predict("эрзянь кель", k=3)
```
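
`predict` returns the labels and their scores as two parallel sequences. The sketch below pairs them up and strips the `__label__` prefix, which is fastText's default label marker. Whether this model's labels carry that prefix is an assumption here, so the helper leaves labels without it unchanged. The function name `strip_label_prefix` and the sample values are hypothetical and for illustration only.

```python
def strip_label_prefix(labels, scores, prefix="__label__"):
    """Pair each label with its score, dropping the label prefix if present."""
    pairs = []
    for label, score in zip(labels, scores):
        # Assumption: labels may start with fastText's default "__label__" prefix
        code = label[len(prefix):] if label.startswith(prefix) else label
        pairs.append((code, float(score)))
    return pairs


# Made-up values in the shape of a predict() result, not real model output
example_labels = ("__label__myv", "__label__mdf", "__label__ru")
example_scores = [0.9, 0.06, 0.01]
print(strip_label_prefix(example_labels, example_scores))
```

With the model loaded as above, the call would look like `strip_label_prefix(languages, scores)`.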

The model was trained on text from randomly sampled Wikipedia articles. It works better with sentences and longer texts than with single words, and may be sensitive to noise.
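
Given that sensitivity, it may help to normalize the input and to treat low-scoring predictions as undetermined. The following is a minimal sketch, not part of the model itself: the helper names, the `0.5` threshold, and the `"und"` fallback (the ISO 639 code for "undetermined") are all assumptions to tune for your data. Collapsing whitespace also matters because the fastText Python `predict` method rejects input that contains newline characters.

```python
import re


def prepare_text(text):
    """Collapse newlines and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def top_language(labels, scores, min_score=0.5, fallback="und"):
    """Return the best label, or the fallback if the score is too low.

    min_score=0.5 is an arbitrary illustrative threshold, not a tuned value.
    """
    if len(labels) == 0 or float(scores[0]) < min_score:
        return fallback
    return labels[0]
```

A call with the loaded model could then look like `top_language(*model.predict(prepare_text(raw_text), k=1))`, where `raw_text` is your input string.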