---
library_name: fasttext
tags:
- text-classification
- language-identification
---
This is a fastText-based language identification model from the paper "The first neural machine translation system for the Erzya language".
It supports 323 languages used in Wikipedia (as of July 2022) and has extended support for the Erzya (myv) and Moksha (mdf) languages.
Example usage:
import fasttext
import urllib.request
import os
model_path = 'lid.323.ftz'
url = 'https://huggingface.co/slone/fastText-LID-323/resolve/main/lid.323.ftz'
if not os.path.exists(model_path):
    urllib.request.urlretrieve(url, model_path)  # or just download it manually
model = fasttext.load_model(model_path)
languages, scores = model.predict("эрзянь кель", k=3) # k is the number of returned hypotheses
The model was trained on the text of articles randomly sampled from Wikipedia. It works better on sentences and longer texts than on isolated words, and it may be sensitive to noise.
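
Since the model prefers longer, clean input, a small wrapper around `predict` can be convenient. The sketch below is not part of the original model card: the helper name `identify_language` and its `min_score` parameter are hypothetical, and it assumes the model's labels carry fastText's default `__label__` prefix (labels without the prefix pass through unchanged). It collapses whitespace first, because fastText's Python `predict` raises an error on strings that contain newlines.

```python
LABEL_PREFIX = "__label__"  # assumption: fastText's default label prefix


def identify_language(model, text, k=3, min_score=0.0):
    """Return up to k (language_code, score) pairs for `text`.

    `model` is any object with a fastText-style predict(text, k=...)
    method that returns a (labels, scores) pair.
    """
    # predict() handles one line at a time, so collapse newlines, tabs,
    # and repeated spaces into single spaces before calling it.
    cleaned = " ".join(text.split())
    if not cleaned:
        return []

    labels, scores = model.predict(cleaned, k=k)

    results = []
    for label, score in zip(labels, scores):
        # Strip the label prefix if present; otherwise keep the label as-is.
        if label.startswith(LABEL_PREFIX):
            code = label[len(LABEL_PREFIX):]
        else:
            code = label
        if score >= min_score:
            results.append((code, float(score)))
    return results
```

With the model loaded as in the example above, `identify_language(model, "эрзянь кель", k=3)` would return the top three hypotheses as plain language codes with their scores. Raising `min_score` is one simple way to discard low-confidence guesses on short or noisy input.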