---
license: mit
language:
- multilingual
tags:
- text-classification
- pytorch
metrics:
- f1-score
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  E-mail: text
  Use case: text
extra_gated_prompt: Our models are intended for academic use only. If you are not affiliated with an academic institution, please provide a rationale for using our models.
---

# xlm-roberta-large-pooled-MORES

## Model description

An `xlm-roberta-large` model fine-tuned on multilingual training data hand-annotated with the following labels:

- **0**: "Anger"
- **1**: "Fear"
- **2**: "Disgust"
- **3**: "Sadness"
- **4**: "Joy"
- **5**: "None of Them"

The model can also be used for sentiment classification with the following conversion:

- **Joy (4)** → Positive
- **None of Them (5)** → Neutral (or None of Them)
- **All other labels** → Negative

The training data was augmented with artificially generated examples and translated texts. It covers five languages (English, German, French, Polish, and Hungarian) in nearly identical shares.

## How to use the model

```python
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
pipe = pipeline(
    model="poltextlab/xlm-roberta-large-pooled-MORES",
    tokenizer=tokenizer,
    use_fast=False,
)

text = "We will place an immediate 6-month halt on the finance driven closure of beds and wards, and set up an independent audit of needs and facilities."
pipe(text)
```

## Model performance

The model was evaluated on language-specific test sets and demonstrated nearly identical performance across all languages:

![Model benchmark (language-specific test)](v3_fixed_f1_scores.png)

### Fine-tuning procedure

The model was fine-tuned with the following key hyperparameters:

- **Number of Training Epochs**: 10
- **Batch Size**: 8
- **Learning Rate**: 5e-06
- **Early Stopping**: enabled with a patience of 2 epochs

## Inference platform

This model is used by the [Babel Machine](https://babel.poltextlab.com), an open-source and free natural language processing tool designed to simplify and speed up projects for comparative research.

## Cooperation

Model performance can be significantly improved by extending our training sets. We appreciate every submission of coded corpora (of any domain and language) at poltextlab{at}poltextlab{dot}com or through the [Babel Machine](https://babel.poltextlab.com).

## Debugging and issues

This architecture uses the `sentencepiece` tokenizer. To use the model with a `transformers` version earlier than 4.27, you need to install `sentencepiece` manually.

If you encounter a `RuntimeError` when loading the model with the `from_pretrained()` method, adding `ignore_mismatched_sizes=True` should solve the issue.
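
As a minimal sketch of this workaround (assuming the standard `transformers` auto classes; the exact error message depends on your installed version), loading the checkpoint with `ignore_mismatched_sizes=True` looks like this:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the tokenizer from the base model, as in the usage example above.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

# If from_pretrained() raises a RuntimeError about mismatched tensor shapes,
# ignore_mismatched_sizes=True lets the classification head be loaded with the
# checkpoint's own label count (6 labels for this model) instead of failing.
model = AutoModelForSequenceClassification.from_pretrained(
    "poltextlab/xlm-roberta-large-pooled-MORES",
    ignore_mismatched_sizes=True,
)
```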