Commit 17e46b3
Parent(s): 7f1a62f

Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ metrics:
 # Web register classification (multilingual model)
 
 A multilingual web register classifier, fine-tuned from XLM-RoBERTa-large.
-The model is trained with the multilingual CORE corpora across five languages (English, Finnish, French, Swedish, Turkish) to classify documents based on the CORE taxonomy, detailed
+The model is trained with the multilingual CORE corpora across five languages (English, Finnish, French, Swedish, Turkish) to classify documents based on the CORE taxonomy, detailed below.
 The model demonstrates state-of-the-art performance in classifying web registers and achieves good zero-shot performance for additional languages.
 It is designed to support the development of open language models and for linguists analyzing register variation.
 ## Model Details
@@ -34,6 +34,8 @@ It is designed to support the development of open language models and for lingui
 
 ## Register labels and their abbreviations
 
+Below is a list of the register labels predicted by the model. Note that some labels are hierarchical; when a sublabel is predicted, its parent label is also predicted. For a more detailed description, see [here]{https://turkunlp.org/register-annotation-docs/}.
+
 - **MT:** Machine translated or generated
 - **LY:** Lyrical
 - **SP:** Spoken
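The added README paragraph says that predicting a sublabel also predicts its parent label. A minimal sketch of that rule follows, under stated assumptions: the model is assumed to emit one independent logit per label (multi-label, sigmoid-scored), the 0.5 threshold is an assumption, and the `decode_labels` helper and the `it` → `SP` child-to-parent entry are hypothetical illustrations, not the model's published configuration or the full CORE hierarchy.

```python
import math

# Hypothetical child -> parent mapping for illustration only. "SP" (Spoken)
# appears in the README; the sublabel "it" is an assumed example, not taken
# from the model's actual label set.
PARENTS = {"it": "SP"}


def _sigmoid(x):
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_labels(logits, label_names, threshold=0.5, parents=PARENTS):
    """Turn per-label logits into a set of predicted register labels.

    Each label is scored independently with a sigmoid (assumed multi-label
    setup); labels whose probability reaches `threshold` are kept. Every kept
    sublabel then adds its parent (and the parent's parent, and so on).
    """
    if len(logits) != len(label_names):
        raise ValueError("logits and label_names must have the same length")
    predicted = {
        name
        for name, logit in zip(label_names, logits)
        if _sigmoid(logit) >= threshold
    }
    # Iterate over a snapshot so we can safely add parents to the set.
    for label in list(predicted):
        parent = parents.get(label)
        while parent is not None:
            predicted.add(parent)
            parent = parents.get(parent)
    return predicted


if __name__ == "__main__":
    labels = ["MT", "LY", "SP", "it"]
    # "SP" alone scores below the threshold, but the sublabel "it" pulls it in.
    print(sorted(decode_labels([-3.0, -2.5, -1.0, 2.0], labels)))  # ['SP', 'it']
```

Propagating parents after thresholding guarantees the hierarchy holds even when the parent's own score falls below the threshold; whether the released model applies this step internally or leaves it to the caller is not stated here.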