
# RoBERTa for Single Language Classification

## Training

RoBERTa fine-tuned on small parts of the OpenSubtitles, OSCAR, and Tatoeba datasets (~9k samples per language).

| data source    | languages      |
|----------------|----------------|
| open_subtitles | ka, he, en, de |
| oscar          | be, kk, az, hy |
| tatoeba        | ru, uk         |
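
For reference, a minimal inference sketch using the standard `transformers` pipeline API (the model id is taken from this card; the example sentence and expected label are illustrative, not from the card):

```python
# Minimal inference sketch for the language classifier described in this card.
from transformers import pipeline

# Model id from this card; weights are loaded from the Hugging Face Hub.
classifier = pipeline("text-classification", model="nikitast/lang-classifier-roberta")

# Illustrative input; the expected top label for German text is "de".
print(classifier("Guten Morgen, wie geht es dir?"))
```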

## Validation

Metrics were obtained by validating on a held-out part of the datasets (~1k samples per language).

| index | class        | f1-score | precision | recall | support |
|-------|--------------|----------|-----------|--------|---------|
| 0     | az           | 0.998    | 0.997     | 1.0    | 997     |
| 1     | be           | 0.996    | 0.998     | 0.994  | 1004    |
| 2     | de           | 0.976    | 0.966     | 0.987  | 979     |
| 3     | en           | 0.976    | 0.986     | 0.967  | 1020    |
| 4     | he           | 1.0      | 1.0       | 0.999  | 1001    |
| 5     | hy           | 0.994    | 0.991     | 0.998  | 993     |
| 6     | ka           | 0.999    | 0.999     | 0.999  | 1000    |
| 7     | kk           | 0.996    | 0.998     | 0.993  | 1005    |
| 8     | uk           | 0.982    | 0.997     | 0.968  | 1030    |
| 9     | ru           | 0.982    | 0.968     | 0.997  | 971     |
| 10    | macro avg    | 0.99     | 0.99      | 0.99   | 10000   |
| 11    | weighted avg | 0.99     | 0.99      | 0.99   | 10000   |
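
The table above follows the output format of scikit-learn's `classification_report`. A hedged sketch of how such a table can be produced (the card does not include the actual evaluation script, and the labels below are placeholders):

```python
# Illustrative sketch: per-class precision/recall/f1 with scikit-learn.
from sklearn.metrics import classification_report

y_true = ["az", "be", "de", "en"]  # ground-truth language codes (placeholder)
y_pred = ["az", "be", "de", "uk"]  # model predictions (placeholder)

# digits=3 matches the precision of the figures reported in this card.
print(classification_report(y_true, y_pred, digits=3))
```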
