article = """
<img src="https://www.iic.uam.es/wp-content/uploads/2017/12/IIC_logoP.png">
<img src="https://drive.google.com/uc?export=view&id=1S8v94q39QRCfmVTMvjLCACmhMe9lJQdc">

<p style="text-align: justify;"> This app was developed by <a href="https://www.iic.uam.es/">IIC - Instituto de Ingeniería del Conocimiento</a> as part of the <a href="https://www.eventbrite.com/e/registro-hackathon-de-pln-en-espanol-273014111557">Somos PLN Hackathon 2022</a>.

The objective of this app is to expand the existing tools for long-form question answering in Spanish. Several methods that are novel for Spanish were introduced to build it.
Audio is accepted as an input and is always produced as an output because we wanted to make the app accessible to people who cannot read or write.
Below you can find all the pieces that form the system.

1. <a href="https://hf.co/IIC/wav2vec2-spanish-multilibrispeech">Speech2Text</a>: We fine-tuned a multilingual Wav2Vec2 model, as explained in the attached link. We use this model to process audio questions.
2. <a href="https://hf.co/IIC/dpr-spanish-passage_encoder-allqa-base">Dense Passage Retrieval for Context</a>: Dense Passage Retrieval (DPR) is a methodology <a href="https://arxiv.org/abs/2004.04906">developed by Facebook</a> and a state-of-the-art (SoTA) approach to passage retrieval,
that is, the task of finding the passages most relevant to answering a given question. The attached link describes how the model was trained.
3. <a href="https://hf.co/IIC/dpr-spanish-question_encoder-allqa-base">Dense Passage Retrieval for Question</a>: The question encoder that pairs with the passage encoder above; together they form the DPR retriever. For more details, go to the attached link.
4. <a href="https://hf.co/sentence-transformers/distiluse-base-multilingual-cased-v1">Sentence Encoder Ranker</a>: Reranks the candidate contexts retrieved by DPR and selects the top 5 passages for the generative model to read. It is the final filter before the generative model.
5. <a href="https://hf.co/IIC/mt5-base-lfqa-es">Generative Long-Form Question Answering Model</a>: We used either mT5 (the linked model) or <a href="https://hf.co/IIC/mbart-large-lfqa-es">mBART</a>. This generative model receives the most relevant
passages and uses them to generate an answer to the question. The attached link gives more details about how we trained it.

To build this system we also uploaded, and in some cases created, datasets in Spanish.

1. <a href="https://hf.co/datasets/IIC/spanish_biomedical_crawled_corpus">Spanish Biomedical Crawled Corpus</a>. Used for finding answers to questions about biomedicine. (More info in the link.)
2. <a href="https://hf.co/datasets/IIC/lfqa_spanish">LFQA_Spanish</a>. Used for training the generative model. (More info in the link.)
3. <a href="https://hf.co/datasets/squad_es">SQUADES</a>. Used to train the DPR models. (More info in the link.)
4. <a href="https://hf.co/datasets/IIC/bioasq22_es">BioAsq22-Spanish</a>. Used to train the DPR models. (More info in the link.)
5. <a href="https://hf.co/datasets/PlanTL-GOB-ES/SQAC">SQAC (Spanish Question Answering Corpus)</a>. Used to train the DPR models. (More info in the link.)
</p>
"""
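
# --- Illustrative sketch (not the app's actual code) ---------------------
# The article text above says a sentence encoder reranks the passages
# retrieved by DPR and keeps the top 5 for the generative model. A minimal
# way to picture that step, ASSUMING the question and the passages have
# already been embedded as plain lists of floats, is a cosine-similarity
# top-k. The names `cosine_similarity` and `rerank_top_k` are hypothetical
# and do not come from the original project.
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_top_k(question_vec, passages, passage_vecs, k=5):
    """Return the k passages most similar to the question, best first."""
    scored = sorted(
        zip(passages, passage_vecs),
        key=lambda pair: cosine_similarity(question_vec, pair[1]),
        reverse=True,
    )
    return [passage for passage, _ in scored[:k]]
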
# 1HOzvvgDLFNTK7tYAY1dRzNiLjH41fZks
# 1kvHDFUPPnf1kM5EKlv5Ife2KcZZvva_1
description = """
<a href="https://www.iic.uam.es/">
    <img src="https://drive.google.com/uc?export=view&id=1HOzvvgDLFNTK7tYAY1dRzNiLjH41fZks"  style="max-width: 100%; max-height: 10%; height: 250px; object-fit: fill">
</a>
<h1> BioMedIA: Abstractive Question Answering for the Biomedical Domain in Spanish </h1>
This application combines state-of-the-art search systems for Spanish with a generative model trained to compose an answer to a question from a set of contexts.
"""


examples = [
    [
        "¿Cuáles son los efectos secundarios más ampliamente reportados en el tratamiento de la enfermedad de Crohn?",
        "vacio.flac",
        "vacio.flac",
        60,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Para qué sirve la tecnología CRISPR?",
        "vacio.flac",
        "vacio.flac",
        60,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Qué es el lupus?",
        "vacio.flac",
        "vacio.flac",
        60,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Por qué sentimos ansiedad?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Qué es la gripe aviar?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Qué es la tecnología CRISPR?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Cómo se genera la apendicitis?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Qué es la mesoterapia?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Qué alternativas al Paracetamol existen para el dolor de cabeza?",
        "vacio.flac",
        "vacio.flac",
        80,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Cuáles son los principales tipos de disartria del trastorno del habla motor?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Es la esclerosis tuberosa una enfermedad genética?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Cuál es la función de la proteína Mis18?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Cuáles son las principales causas de muerte?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Qué deficiencia es la causa del síndrome de piernas inquietas?",
        "vacio.flac",
        "vacio.flac",
        50,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Cuál es la función del 6SRNA en las bacterias?",
        "vacio.flac",
        "vacio.flac",
        60,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Por qué los humanos desarrollamos diabetes?",
        "vacio.flac",
        "vacio.flac",
        50,
        10,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Qué factores de riesgo aumentan la probabilidad de sufrir un ataque al corazón?",
        "vacio.flac",
        "vacio.flac",
        80,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Cómo funcionan las vacunas?",
        "vacio.flac",
        "vacio.flac",
        90,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
    [
        "¿Tienen conciencia los animales?",
        "vacio.flac",
        "vacio.flac",
        70,
        8,
        3,
        1.0,
        250,
        "wav2vec2-iic",
        False,
    ],
]
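

# --- Illustrative sketch (not part of the original app) ------------------
# Every row in `examples` above holds ten positional values, so a missing or
# extra value is easy to overlook. A small check such as the hypothetical
# `find_malformed_rows` below can flag rows whose length differs from the
# expected one. The default of 10 is taken from the rows as written here.
def find_malformed_rows(rows, expected_len=10):
    """Return the indices of rows whose length is not expected_len."""
    return [i for i, row in enumerate(rows) if len(row) != expected_len]

# Usage: find_malformed_rows(examples) is expected to be empty for the
# list above; any index it returns points at a row to inspect.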