Commit 33ed588 by nikitast (1 parent: a4fede0)

nice readme

Files changed (1)
  1. README.md +24 -6
@@ -18,12 +18,30 @@ datasets:
  - oscar
  ---
 
- Model for Single Language Classification in texts. Supports 10 languages: ru, uk, be, kk, az, hy, ka, he, en, de.
 
- Model trained on small parts of Open Subtitles, Oscar and Tatoeba datasets (~9k samples per language).
 
- The metrics obtained from validation on part of dataset (~1k samples per language).
 
- | eval_accuracy | eval_az_f1-score | eval_az_precision | eval_az_recall | eval_az_support | eval_be_f1-score | eval_be_precision | eval_be_recall | eval_be_support | eval_de_f1-score | eval_de_precision | eval_de_recall | eval_de_support | eval_en_f1-score | eval_en_precision | eval_en_recall | eval_en_support | eval_he_f1-score | eval_he_precision | eval_he_recall | eval_he_support | eval_hy_f1-score | eval_hy_precision | eval_hy_recall | eval_hy_support | eval_ka_f1-score | eval_ka_precision | eval_ka_recall | eval_ka_support | eval_kk_f1-score | eval_kk_precision | eval_kk_recall | eval_kk_support | eval_loss | eval_macro avg_f1-score | eval_macro avg_precision | eval_macro avg_recall | eval_macro avg_support | eval_ru_f1-score | eval_ru_precision | eval_ru_recall | eval_ru_support | eval_uk_f1-score | eval_uk_precision | eval_uk_recall | eval_uk_support | eval_weighted avg_f1-score | eval_weighted avg_precision | eval_weighted avg_recall | eval_weighted avg_support |
- | ------------- | ---------------- | ----------------- | -------------- | --------------- | ------------------ | ----------------- | ------------------ | --------------- | ------------------ | ----------------- | ------------------ | --------------- | ------------------ | ----------------- | ------------------ | --------------- | ------------------ | ----------------- | ----------------- | --------------- | ------------------ | ----------------- | ------------------ | --------------- | ---------------- | ----------------- | -------------- | --------------- | ------------------ | ----------------- | ------------------ | --------------- | ------------------- | ----------------------- | ------------------------ | --------------------- | ---------------------- | ------------------ | ----------------- | ------------------ | --------------- | ------------------ | ----------------- | ------------------ | --------------- | -------------------------- | --------------------------- | ------------------------ | ------------------------- |
- | 0.99 | 0.99849774661993 | 0.997 | 1 | 997 | 0.9960079840319361 | 0.998 | 0.9940239043824701 | 1004 | 0.9762506316321374 | 0.966 | 0.9867211440245148 | 979 | 0.9762376237623762 | 0.986 | 0.9666666666666667 | 1020 | 0.9995002498750626 | 1 | 0.999000999000999 | 1001 | 0.9944806823883593 | 0.991 | 0.9979859013091642 | 993 | 0.999 | 0.999 | 0.999 | 1000 | 0.9955112219451371 | 0.998 | 0.9930348258706467 | 1005 | 0.04831727221608162 | 0.9899994666596248 | 0.99 | 0.9901305007950791 | 10000 | 0.9822425164890917 | 0.968 | 0.9969104016477858 | 971 | 0.9822660098522168 | 0.997 | 0.9679611650485437 | 1030 | 0.9900005333403753 | 0.9901326000000001 | 0.99 | 10000 |
+ # RoBERTa for Single Language Classification
+ ## Training
+ RoBERTa fine-tuned on small parts of Open Subtitles, Oscar and Tatoeba datasets (~9k samples per language).
 
+ | data source     | language       |
+ |-----------------|----------------|
+ | open_subtitles  | ka, he, en, de |
+ | oscar           | be, kk, az, hy |
+ | tatoeba         | ru, uk         |
 
+ ## Validation
+ The metrics were obtained from validation on a separate part of the dataset (~1k samples per language).
 
+ |index|class|f1-score|precision|recall|support|
+ |---|---|---|---|---|---|
+ |0|az|0.998|0.997|1.0|997|
+ |1|be|0.996|0.998|0.994|1004|
+ |2|de|0.976|0.966|0.987|979|
+ |3|en|0.976|0.986|0.967|1020|
+ |4|he|1.0|1.0|0.999|1001|
+ |5|hy|0.994|0.991|0.998|993|
+ |6|ka|0.999|0.999|0.999|1000|
+ |7|kk|0.996|0.998|0.993|1005|
+ |8|uk|0.982|0.997|0.968|1030|
+ |9|ru|0.982|0.968|0.997|971|
+ |10|macro avg|0.99|0.99|0.99|10000|
+ |11|weighted avg|0.99|0.99|0.99|10000|
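
As a quick sanity check on the validation table, each F1 score should be the harmonic mean of the precision and recall in the same row, and the macro average should be the unweighted mean of the per-class F1 scores. The sketch below recomputes both from the rounded precision and recall values in the table, so a result can differ from the table in the third decimal place. The names `REPORT` and `f1_score` are illustrative and are not part of the model repository.

```python
# Precision and recall per class, copied from the validation table
# (rounded to three decimals, so recomputed F1 is approximate).
REPORT = {
    "az": (0.997, 1.0),
    "be": (0.998, 0.994),
    "de": (0.966, 0.987),
    "en": (0.986, 0.967),
    "he": (1.0, 0.999),
    "hy": (0.991, 0.998),
    "ka": (0.999, 0.999),
    "kk": (0.998, 0.993),
    "uk": (0.997, 0.968),
    "ru": (0.968, 0.997),
}


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class F1, then the macro average (unweighted mean over classes).
per_class_f1 = {lang: f1_score(p, r) for lang, (p, r) in REPORT.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for lang, f1 in per_class_f1.items():
    print(f"{lang}: {f1:.3f}")
print(f"macro avg: {macro_f1:.2f}")
```

Because the class supports are nearly equal (971 to 1030 samples), the weighted average is expected to stay very close to the macro average, which is consistent with both rows showing 0.99.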