File size: 6,367 Bytes
eb49a75
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
language: es
license: mit

widget:
- text: "y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!"
---

### Description
This model is a fine-tuned version of [BETO (spanish bert)](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased) that has been trained on the *Datathon Against Racism* dataset (2022)

We performed several experiments that will be described in the upcoming paper "Estimating Ground Truth in a Low-labelled Data Regime:A Study of Racism Detection in Spanish" (NEATClasS 2022)
We applied 6 different methods ground-truth estimations, and for each one we performed 4 epochs of fine-tuning. The result is made of 24 models:

| method	| epoch 1	| epoch 3	| epoch 3	| epoch 4	|
|---	|---	|---	|---	|---	|
| raw-label	| [raw-label-epoch-1](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-1)	| [raw-label-epoch-2](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-2)	| [raw-label-epoch-3](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-3)	| [raw-label-epoch-4](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-4)	|
| m-vote-strict	| [m-vote-strict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-1)	| [m-vote-strict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-2)	| [m-vote-strict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-3)	| [m-vote-strict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-4)	|
| m-vote-nonstrict	| [m-vote-nonstrict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-1)	| [m-vote-nonstrict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-2)	| [m-vote-nonstrict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-3)	| [m-vote-nonstrict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-4)	|
| regression-w-m-vote	| [regression-w-m-vote-epoch-1](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-1)	| [regression-w-m-vote-epoch-2](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-2)	| [regression-w-m-vote-epoch-3](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-3)	| [regression-w-m-vote-epoch-4](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-4)	|
| w-m-vote-strict	| [w-m-vote-strict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-1)	| [w-m-vote-strict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-2)	| [w-m-vote-strict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-3)	| [w-m-vote-strict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-4)	|
| w-m-vote-nonstrict	| [w-m-vote-nonstrict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-1)	| [w-m-vote-nonstrict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-2)	| [w-m-vote-nonstrict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-3)	| [w-m-vote-nonstrict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-4)	|


This model is `regression-w-m-vote-epoch-2`

### Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
from transformers.pipelines import TextClassificationPipeline

class TextRegressionPipeline(TextClassificationPipeline):
    """
    Class based on the TextClassificationPipeline from transformers.
    The difference is that instead of being based on a classifier, it is based on a regressor.
    You can specify the regression threshold when you call the pipeline or when you instantiate the pipeline.
    """
    def __init__(self, **kwargs):
        """
        Builds a new Pipeline based on regression.
        regression_threshold: Optional(float). If None, the pipeline will simply output the score. If set to a specific value, the output will be both the score and the label.
        """
        self.regression_threshold = kwargs.pop("regression_threshold", None)
        super().__init__(**kwargs)
    def __call__(self, *args, **kwargs):
        """
        You can also specify the regression threshold when you call the pipeline.
        regression_threshold: Optional(float). If None, the pipeline will simply output the score. If set to a specific value, the output will be both the score and the label.
        """
        self.regression_threshold_call = kwargs.pop("regression_threshold", None)
        result = super().__call__(*args, **kwargs)
        return result
    def postprocess(self, model_outputs, function_to_apply=None, return_all_scores=False):
        outputs = model_outputs["logits"][0]
        outputs = outputs.numpy()
        scores = outputs
        score = scores[0]
        regression_threshold = self.regression_threshold
        # override the specific threshold if it is specified in the call
        if self.regression_threshold_call:
            regression_threshold = self.regression_threshold_call
        if regression_threshold:
            return {"label": 'racist' if score > regression_threshold else 'non-racist', "score": score}
        else:
            return {"score": score}



model_name = 'regression-w-m-vote-epoch-2'
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")
full_model_path = f'MartinoMensio/racism-models-{model_name}'
model = AutoModelForSequenceClassification.from_pretrained(full_model_path)

pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)

texts = [
    'y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!',
    'Es que los judíos controlan el mundo'
]
# just get the score of regression
print(pipe(texts))
# [{'score': 0.8367272}, {'score': 0.4402479}]

# or also specify a threshold to cut racist/non-racist
print(pipe(texts, regression_threshold=0.9))
# [{'label': 'non-racist', 'score': 0.8367272}, {'label': 'non-racist', 'score': 0.4402479}]
```

For more details, see https://github.com/preyero/neatclass22