Token Classification
Transformers
TensorBoard
Safetensors
xlm-roberta
Generated from Trainer
language-identification
codeswitching
Instructions to use DerivedFunction/polyglot-tagger-v2.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DerivedFunction/polyglot-tagger-v2.2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="DerivedFunction/polyglot-tagger-v2.2")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
model = AutoModelForTokenClassification.from_pretrained("DerivedFunction/polyglot-tagger-v2.2")
```

- Notebooks
- Google Colab
- Kaggle
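The model predicts BIO-style language tags per token, so downstream code usually needs to collapse those tags into contiguous language spans. A minimal sketch of that step, assuming illustrative tag names like `B-EN`/`I-EN` (check `model.config.id2label` for the checkpoint's actual label set):

```python
# Sketch: collapsing BIO-style token tags into contiguous language spans.
# Tag names (B-EN, I-EN, B-ES, ...) are illustrative assumptions, not
# necessarily this checkpoint's labels; inspect model.config.id2label.

def bio_to_spans(tokens, tags):
    """Group (token, BIO-tag) pairs into (language, phrase) spans."""
    spans = []
    current_lang, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_lang):
            # New span starts: flush any open span first
            if current_tokens:
                spans.append((current_lang, " ".join(current_tokens)))
            current_lang, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" tag closes any open span
            if current_tokens:
                spans.append((current_lang, " ".join(current_tokens)))
            current_lang, current_tokens = None, []
    if current_tokens:
        spans.append((current_lang, " ".join(current_tokens)))
    return spans

# Hypothetical tagger output for a code-switched sentence:
tokens = ["I", "love", "tacos", "al", "pastor"]
tags = ["B-EN", "I-EN", "B-ES", "I-ES", "I-ES"]
print(bio_to_spans(tokens, tags))
# → [('EN', 'I love'), ('ES', 'tacos al pastor')]
```

In practice the per-token tags would come from the pipeline above rather than being hard-coded.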
Update README.md
README.md (CHANGED)
```diff
@@ -136,7 +136,7 @@ Fine-tuned `xlm-roberta-base` for sentence-level language tagging across 100 lan
 The model predicts BIO-style language tags over tokens, which makes it useful for
 language identification, code-switch detection, and multilingual document analysis.
 
-
+> Compared to version 2.1, the training data for this version was cleaned to remove mixed-script rows that were not in the target language, such as Arabic text in rows for Cyrillic-script languages.
 
 ## Model description
 
```