|
---
language:
- es
license: apache-2.0
tags:
- Text2Text Generation
- Inclusive Language
- Text Neutralization
- pytorch
metrics:
- sacrebleu
model-index:
- name: es_nlp_text_neutralizer
  results:
  - task:
      type: Text2Text Generation
      name: Neutralization of texts in Spanish
    metrics:
    - type: sacrebleu
      value: 93.8347
      name: sacrebleu
    - type: bertscore
      value: 0.99
      name: BertScoreF1
    - type: DiffBleu
      value: 0.38
      name: DiffBleu
---
|
## Model objective |
|
|
|
TBF |
|
|
|
## Model specs |
|
|
|
This model is a fine-tuned version of [spanish-t5-small](https://huggingface.co/flax-community/spanish-t5-small) on the data described below. |
|
It achieves the following results on the evaluation set: |
|
- eval_bleu: 93.8347
- eval_f1: 0.9904
|
|
|
## Training procedure |
|
### Training hyperparameters |
|
The following hyperparameters were used during training: |
|
- learning_rate: 1e-04 |
|
- train_batch_size: 32 |
|
- seed: 42 |
|
- num_epochs: 10 |
|
- weight_decay: 0.01
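For reference, the hyperparameters above map onto a Hugging Face `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script: `output_dir` and `predict_with_generate` are assumptions not stated in the card.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir is a placeholder; the numeric values
# come from the hyperparameter list above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./es_nlp_text_neutralizer",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    seed=42,
    num_train_epochs=10,
    weight_decay=0.01,
    predict_with_generate=True,  # assumption: generate full outputs for BLEU during eval
)
```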
|
|
|
## Training and evaluation data |
|
|
|
TBF |
|
|
|
## Metrics |
|
|
|
For evaluation during training, we used both BLEU (the sacrebleu implementation in Hugging Face) and BertScore. The former, a standard in machine translation, was added to ensure the robustness of the newly generated text, while the latter checks that the expected semantic similarity is preserved.
|
|
|
However, given the actual use case, we expect generated segments to be very close both to the input segments and to the label segments used in training. As an example, take the following:
|
|
|
- inputSegment = 'De acuerdo con las informaciones anteriores , las alumnas se han quejado de la actitud de los profesores en los exámenes finales. Los representantes estudiantiles son los alumnos Juanju y Javi.'
- expectedOutput (label) = 'De acuerdo con las informaciones anteriores, el alumnado se ha quejado de la actitud del profesorado en los exámenes finales. Los representantes estudiantiles son los alumnos Juanju y Javi.'
- actualOutput = 'De acuerdo con las informaciones anteriores, el alumnado se ha quejado de la actitud del profesorado en los exámenes finales. Los representantes estudiantiles son el alumnado Juanju y Javi.'
|
|
|
As you can see, the segments are very similar. So, instead of measuring BLEU or BertScore directly here, we propose an alternative metric, DiffBleu:
|
|
|
$$DiffBleu = BLEU(actualOutput - inputSegment, labels - inputSegment)$$ |
|
|
|
where the minus signs denote set difference, so that BLEU is computed only over the tokens that were actually changed. We also evaluate DiffBleu after the model has been trained.
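A minimal sketch of how DiffBleu could be computed, assuming whitespace tokenization (an assumption of this sketch, not stated in the card) and using the example above with punctuation spacing normalized. The final score would come from sacrebleu applied to the two diffed strings; that call is indicated in a comment rather than executed.

```python
# Hypothetical sketch of the DiffBleu metric described above.
# The set difference keeps only the tokens that changed between the
# input and each candidate, so BLEU is computed on the edits alone.

def diff_tokens(segment, reference):
    """Return the tokens of `segment` (in order) that do not appear in `reference`."""
    ref_tokens = set(reference.split())
    return [tok for tok in segment.split() if tok not in ref_tokens]

input_segment = (
    "De acuerdo con las informaciones anteriores, las alumnas se han quejado "
    "de la actitud de los profesores en los exámenes finales. "
    "Los representantes estudiantiles son los alumnos Juanju y Javi."
)
label = (
    "De acuerdo con las informaciones anteriores, el alumnado se ha quejado "
    "de la actitud del profesorado en los exámenes finales. "
    "Los representantes estudiantiles son los alumnos Juanju y Javi."
)
output = (
    "De acuerdo con las informaciones anteriores, el alumnado se ha quejado "
    "de la actitud del profesorado en los exámenes finales. "
    "Los representantes estudiantiles son el alumnado Juanju y Javi."
)

hyp = " ".join(diff_tokens(output, input_segment))
ref = " ".join(diff_tokens(label, input_segment))
# DiffBleu would then be: sacrebleu.corpus_bleu([hyp], [[ref]]).score
```

Note how the spurious "el alumnado" in the last sentence of the output survives the diff, so DiffBleu penalizes exactly the unwanted edit that plain BLEU against the label would barely notice.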
|
|
|
|
|
## Usage example |
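A minimal inference sketch with the `transformers` library. The model identifier below is a placeholder assumption; substitute the actual Hub id under which this card is published.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "es_nlp_text_neutralizer"  # placeholder: use the real Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

text = (
    "De acuerdo con las informaciones anteriores, las alumnas se han quejado "
    "de la actitud de los profesores en los exámenes finales."
)

# Tokenize, generate the neutralized version, and decode it back to text.
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```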
|
|
|
|
|
|
|
|
|
Enjoy! |