Token Classification
Transformers
TensorBoard
Safetensors
xlm-roberta
Generated from Trainer
language-identification
codeswitching
Instructions to use DerivedFunction/polyglot-tagger-v2.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DerivedFunction/polyglot-tagger-v2.2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="DerivedFunction/polyglot-tagger-v2.2")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
model = AutoModelForTokenClassification.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
```

- Notebooks
- Google Colab
- Kaggle
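The model predicts BIO-style language tags per token, so downstream code usually needs to collapse those tags into contiguous language spans. A minimal sketch of that step, assuming illustrative tag names like `B-EN`/`I-EN` (check `model.config.id2label` for the checkpoint's actual label set):

```python
# Sketch: collapsing BIO-style token tags into contiguous language spans.
# Tag names (B-EN, I-EN, B-ES, ...) are illustrative assumptions, not
# necessarily this checkpoint's labels; inspect model.config.id2label.

def bio_to_spans(tokens, tags):
    """Group (token, BIO-tag) pairs into (language, phrase) spans."""
    spans = []
    current_lang, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_lang):
            # New span starts: flush any open span first
            if current_tokens:
                spans.append((current_lang, " ".join(current_tokens)))
            current_lang, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" tag closes any open span
            if current_tokens:
                spans.append((current_lang, " ".join(current_tokens)))
            current_lang, current_tokens = None, []
    if current_tokens:
        spans.append((current_lang, " ".join(current_tokens)))
    return spans

# Hypothetical tagger output for a code-switched sentence:
tokens = ["I", "love", "tacos", "al", "pastor"]
tags = ["B-EN", "I-EN", "B-ES", "I-ES", "I-ES"]
print(bio_to_spans(tokens, tags))
# → [('EN', 'I love'), ('ES', 'tacos al pastor')]
```

In practice the per-token tags would come from the pipeline above rather than being hard-coded.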
Update README.md
README.md (CHANGED)
```diff
@@ -136,7 +136,7 @@ Fine-tuned `xlm-roberta-base` for sentence-level language tagging across 100 lan
 The model predicts BIO-style language tags over tokens, which makes it useful for
 language identification, code-switch detection, and multilingual document analysis.
 
-
+> Compared to version 2.1, the training data for this version was cleaned to remove mixed-script rows that were not in the target language, such as Arabic text in rows for Cyrillic-script languages.
 
 ## Model description
 
```