MartinoMensio
/

racism-models-regression-w-m-vote-epoch-2

+---
+language: es
+license: mit
+widget:
+- text: "y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!"
+---
+### Description
+This model is a fine-tuned version of [BETO (spanish bert)](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased) that has been trained on the *Datathon Against Racism* dataset (2022)
+We performed several experiments that will be described in the upcoming paper "Estimating Ground Truth in a Low-labelled Data Regime:A Study of Racism Detection in Spanish" (NEATClasS 2022)
+We applied 6 different methods ground-truth estimations, and for each one we performed 4 epochs of fine-tuning. The result is made of 24 models:
+| method	| epoch 1	| epoch 3	| epoch 3	| epoch 4	|
+|---	|---	|---	|---	|---	|
+| raw-label	| [raw-label-epoch-1](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-1)	| [raw-label-epoch-2](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-2)	| [raw-label-epoch-3](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-3)	| [raw-label-epoch-4](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-4)	|
+| m-vote-strict	| [m-vote-strict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-1)	| [m-vote-strict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-2)	| [m-vote-strict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-3)	| [m-vote-strict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-4)	|
+| m-vote-nonstrict	| [m-vote-nonstrict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-1)	| [m-vote-nonstrict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-2)	| [m-vote-nonstrict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-3)	| [m-vote-nonstrict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-4)	|
+| regression-w-m-vote	| [regression-w-m-vote-epoch-1](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-1)	| [regression-w-m-vote-epoch-2](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-2)	| [regression-w-m-vote-epoch-3](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-3)	| [regression-w-m-vote-epoch-4](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-4)	|
+| w-m-vote-strict	| [w-m-vote-strict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-1)	| [w-m-vote-strict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-2)	| [w-m-vote-strict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-3)	| [w-m-vote-strict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-4)	|
+| w-m-vote-nonstrict	| [w-m-vote-nonstrict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-1)	| [w-m-vote-nonstrict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-2)	| [w-m-vote-nonstrict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-3)	| [w-m-vote-nonstrict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-4)	|
+This model is `regression-w-m-vote-epoch-2`
+### Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+from transformers.pipelines import TextClassificationPipeline
+class TextRegressionPipeline(TextClassificationPipeline):
+    """
+    Class based on the TextClassificationPipeline from transformers.
+    The difference is that instead of being based on a classifier, it is based on a regressor.
+    You can specify the regression threshold when you call the pipeline or when you instantiate the pipeline.
+    """
+    def __init__(self, **kwargs):
+        """
+        Builds a new Pipeline based on regression.
+        regression_threshold: Optional(float). If None, the pipeline will simply output the score. If set to a specific value, the output will be both the score and the label.
+        """
+        self.regression_threshold = kwargs.pop("regression_threshold", None)
+        super().__init__(**kwargs)
+    def __call__(self, *args, **kwargs):
+        """
+        You can also specify the regression threshold when you call the pipeline.
+        regression_threshold: Optional(float). If None, the pipeline will simply output the score. If set to a specific value, the output will be both the score and the label.
+        """
+        self.regression_threshold_call = kwargs.pop("regression_threshold", None)
+        result = super().__call__(*args, **kwargs)
+        return result
+    def postprocess(self, model_outputs, function_to_apply=None, return_all_scores=False):
+        outputs = model_outputs["logits"][0]
+        outputs = outputs.numpy()
+        scores = outputs
+        score = scores[0]
+        regression_threshold = self.regression_threshold
+        # override the specific threshold if it is specified in the call
+        if self.regression_threshold_call:
+            regression_threshold = self.regression_threshold_call
+        if regression_threshold:
+            return {"label": 'racist' if score > regression_threshold else 'non-racist', "score": score}
+        else:
+            return {"score": score}
+model_name = 'regression-w-m-vote-epoch-2'
+tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")
+full_model_path = f'MartinoMensio/racism-models-{model_name}'
+model = AutoModelForSequenceClassification.from_pretrained(full_model_path)
+pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)
+texts = [
+    'y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!',
+    'Es que los judíos controlan el mundo'
+]
+# just get the score of regression
+print(pipe(texts))
+# [{'score': 0.8367272}, {'score': 0.4402479}]
+# or also specify a threshold to cut racist/non-racist
+print(pipe(texts, regression_threshold=0.9))
+# [{'label': 'non-racist', 'score': 0.8367272}, {'label': 'non-racist', 'score': 0.4402479}]
+```
+For more details, see https://github.com/preyero/neatclass22