fermaat committed on
Commit
fccbfaa
1 Parent(s): 7fde946

Update README.md

Files changed (1)
  1. README.md +10 -9
README.md CHANGED
@@ -7,21 +7,22 @@ tags:
  - Inclusive Language
  - Text Neutralization
  - pytorch
- # datasets:
- #- {Pending} # Example: common_voice. Use dataset id from https://hf.co/datasets
+ datasets:
+ - hackathon-pln-es/neutral-es
  metrics:
  - sacrebleu
 
+
+
  model-index:
- - name: es_nlp_text_neutralizer
+ - name: es_text_neutralizer
  results:
  - task:
  type: Text2Text Generation
  name: Neutralization of texts in Spanish
- # dataset:
- # type: {Pending} # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
- # name: {handcrafted dataset} # Optional. Example: Common Voice zh-CN
- # args: {es} # Optional. Example: zh-CN
+ dataset:
+ type: hackathon-pln-es/neutral-es
+ name: neutral-es
  metrics:
  - type: sacrebleu # Required. Example: wer
  value: 93.8347 # Required. Example: 20.90
@@ -50,8 +51,8 @@ By using gender inclusive models we can help reducing gender bias in a language
 
 
  ## Training and evaluation data
-
- The data used for the model training has been manually created form a compilation of sources, obtained from a series of guidelines and manuals issued by Spanish Ministry of Health, Social Services and Equality in the matter of the usage of non-sexist language, stipulated in this linked [document](https://www.inmujeres.gob.es/servRecursos/formacion/GuiasLengNoSexista/docs/Guiaslenguajenosexista_.pdf):
+ One of the major challenges was to obtain a valuable dataset that would suit our purpose, therefore, the team opted to dedicate a considerable amount of time to build it from a scratch.
+ The data used for the model training has been created form a compilation of sources, obtained from a series of guidelines and manuals issued by Spanish Ministry of Health, Social Services and Equality in the matter of the usage of non-sexist language, stipulated in this linked [document:](https://www.inmujeres.gob.es/servRecursos/formacion/GuiasLengNoSexista/docs/Guiaslenguajenosexista_.pdf):
 
  ### Compiled sources
 
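A quick consistency check on the numbers in this diff: the two hunk headers and the "+10 -9" diffstat should describe the same net change in line count. The sketch below is only an illustration, not part of the commit. The names `HUNK_HEADERS`, `DIFFSTAT` and `net_change` are made up for this example, and the regular expression assumes both lengths are written out explicitly, as they are here.

```python
import re

# Hunk headers and diffstat copied from the diff above.
HUNK_HEADERS = ["@@ -7,21 +7,22 @@", "@@ -50,8 +51,8 @@"]
DIFFSTAT = (10, 9)  # "+10 -9": lines added, lines removed

# Simplification: assumes the ",length" part is always present.
HUNK_RE = re.compile(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def net_change(headers):
    """Sum (new length - old length) over all hunk headers."""
    total = 0
    for header in headers:
        match = HUNK_RE.match(header)
        if match is None:
            raise ValueError(f"not a hunk header: {header!r}")
        _old_start, old_len, _new_start, new_len = map(int, match.groups())
        total += new_len - old_len
    return total


added, removed = DIFFSTAT
# Both values should agree if the headers and the diffstat are consistent.
print(net_change(HUNK_HEADERS), added - removed)  # prints "1 1"
```

The first hunk grows from 21 to 22 lines and the second stays at 8, so the README gains one line overall, which matches 10 additions minus 9 removals.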