---
language: es
license: mit

widget:
- text: "y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!"
---

### Description
This model is a fine-tuned version of [BETO (Spanish BERT)](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased), trained on the Datathon Against Racism dataset (2022).

We ran several experiments, which will be described in the upcoming paper "Estimating Ground Truth in a Low-labelled Data Regime: A Study of Racism Detection in Spanish" (NEATClasS 2022).
We applied 6 different ground-truth estimation methods, and for each method we fine-tuned for 4 epochs, keeping the model obtained after every epoch. This gives 24 models (6 methods × 4 epochs):

| method | epoch 1 | epoch 2 | epoch 3 | epoch 4 |
|---|---|---|---|---|
| raw-label | [raw-label-epoch-1](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-1) | [raw-label-epoch-2](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-2) | [raw-label-epoch-3](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-3) | [raw-label-epoch-4](https://huggingface.co/MartinoMensio/racism-models-raw-label-epoch-4) |
| m-vote-strict | [m-vote-strict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-1) | [m-vote-strict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-2) | [m-vote-strict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-3) | [m-vote-strict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-m-vote-strict-epoch-4) |
| m-vote-nonstrict | [m-vote-nonstrict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-1) | [m-vote-nonstrict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-2) | [m-vote-nonstrict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-3) | [m-vote-nonstrict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-m-vote-nonstrict-epoch-4) |
| regression-w-m-vote | [regression-w-m-vote-epoch-1](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-1) | [regression-w-m-vote-epoch-2](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-2) | [regression-w-m-vote-epoch-3](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-3) | [regression-w-m-vote-epoch-4](https://huggingface.co/MartinoMensio/racism-models-regression-w-m-vote-epoch-4) |
| w-m-vote-strict | [w-m-vote-strict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-1) | [w-m-vote-strict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-2) | [w-m-vote-strict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-3) | [w-m-vote-strict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-strict-epoch-4) |
| w-m-vote-nonstrict | [w-m-vote-nonstrict-epoch-1](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-1) | [w-m-vote-nonstrict-epoch-2](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-2) | [w-m-vote-nonstrict-epoch-3](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-3) | [w-m-vote-nonstrict-epoch-4](https://huggingface.co/MartinoMensio/racism-models-w-m-vote-nonstrict-epoch-4) |


This model is `regression-w-m-vote-epoch-2`: the `regression-w-m-vote` method, after epoch 2.
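All 24 repositories in the table follow one naming pattern, so their ids can be generated programmatically, for example to evaluate every checkpoint in a loop. This is a minimal sketch: the pattern is inferred from the links in the table, and `repo_id`, `METHODS` and `EPOCHS` are hypothetical names introduced here for illustration, not part of any library.

```python
from itertools import product

# Ground-truth estimation methods and epochs, copied from the table above.
METHODS = [
    "raw-label",
    "m-vote-strict",
    "m-vote-nonstrict",
    "regression-w-m-vote",
    "w-m-vote-strict",
    "w-m-vote-nonstrict",
]
EPOCHS = [1, 2, 3, 4]


def repo_id(method: str, epoch: int) -> str:
    """Hypothetical helper: Hub repo id for one (method, epoch) pair."""
    return f"MartinoMensio/racism-models-{method}-epoch-{epoch}"


# One repo per (method, epoch) pair: 6 methods x 4 epochs = 24 models.
all_repo_ids = [repo_id(method, epoch) for method, epoch in product(METHODS, EPOCHS)]

print(len(all_repo_ids))  # 24
print(repo_id("regression-w-m-vote", 2))  # MartinoMensio/racism-models-regression-w-m-vote-epoch-2
```

Each id can be passed to `AutoModelForSequenceClassification.from_pretrained`.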

### Usage

The model has a regression head, so it is used through a small subclass of `TextClassificationPipeline` that returns a score and, optionally, a label:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers.pipelines import TextClassificationPipeline


class TextRegressionPipeline(TextClassificationPipeline):
    """
    Variant of the TextClassificationPipeline from transformers that is based on a
    regressor instead of a classifier.
    The regression threshold can be set when instantiating the pipeline or when calling it.
    """

    def __init__(self, **kwargs):
        """
        Builds a new regression-based pipeline.
        regression_threshold: Optional[float]. If None, the pipeline only outputs the score.
        If set, the output contains both the score and the label.
        """
        self.regression_threshold = kwargs.pop("regression_threshold", None)
        self.regression_threshold_call = None
        super().__init__(**kwargs)

    def __call__(self, *args, **kwargs):
        """
        The regression threshold can also be specified when calling the pipeline.
        regression_threshold: Optional[float]. If None, the pipeline only outputs the score.
        If set, the output contains both the score and the label.
        """
        # Pop the threshold before delegating, so it is not forwarded to the tokenizer.
        self.regression_threshold_call = kwargs.pop("regression_threshold", None)
        result = super().__call__(*args, **kwargs)
        return result

    def postprocess(self, model_outputs, **kwargs):
        # logits[0] holds the outputs for this single input; the first value is the score.
        # Extra postprocess parameters (e.g. function_to_apply) are ignored.
        outputs = model_outputs["logits"][0].numpy()
        score = outputs[0]
        regression_threshold = self.regression_threshold
        # A threshold given in the call overrides the one given at instantiation.
        if self.regression_threshold_call is not None:
            regression_threshold = self.regression_threshold_call
        if regression_threshold is not None:
            return {"label": 'racist' if score > regression_threshold else 'non-racist', "score": score}
        return {"score": score}


model_name = 'regression-w-m-vote-epoch-2'
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")
full_model_path = f'MartinoMensio/racism-models-{model_name}'
model = AutoModelForSequenceClassification.from_pretrained(full_model_path)

pipe = TextRegressionPipeline(model=model, tokenizer=tokenizer)

texts = [
    'y porqué es lo que hay que hacer con los menas y con los adultos también!!!! NO a los inmigrantes ilegales!!!!',
    'Es que los judíos controlan el mundo',
]
# Get only the regression score
print(pipe(texts))
# [{'score': 0.8367272}, {'score': 0.4402479}]

# Or also specify a threshold to separate racist from non-racist
print(pipe(texts, regression_threshold=0.9))
# [{'label': 'non-racist', 'score': 0.8367272}, {'label': 'non-racist', 'score': 0.4402479}]
```
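The thresholding in `postprocess` does not depend on the model, so it can be factored out and applied to scores that were already computed, for instance to compare several cut-offs without re-running inference. This is a minimal sketch: `regression_postprocess` is a hypothetical helper, and the 0.5 cut-off is only an illustrative value, not a threshold recommended by the authors. It lets a call-time threshold override the constructor threshold, and it compares with `is not None`, so a threshold of `0.0` is still applied.

```python
from typing import Optional


def regression_postprocess(
    score: float,
    init_threshold: Optional[float] = None,
    call_threshold: Optional[float] = None,
) -> dict:
    """Hypothetical helper: turn a regression score into the pipeline's output dict."""
    # A threshold given at call time overrides the one given at construction.
    threshold = call_threshold if call_threshold is not None else init_threshold
    if threshold is None:
        # No threshold: return the raw regression score only.
        return {"score": score}
    label = "racist" if score > threshold else "non-racist"
    return {"label": label, "score": score}


# Scores reported in the usage example above.
scores = [0.8367272, 0.4402479]
for t in (None, 0.5, 0.9):
    print(t, [regression_postprocess(s, call_threshold=t) for s in scores])
```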

For more details, see https://github.com/preyero/neatclass22