Quinten Datalab committed
Commit 5e80cf1 • Parent(s): 191d0ac
Update README.md: NER results for QUAERO dataset updated

README.md CHANGED
@@ -30,9 +30,17 @@ widget:
 AliBERT is a pre-trained language model for French biomedical text. It is trained with a masked language modeling objective, like RoBERTa.

 Here are the main contributions of our work:
-
-
-

 The paper can be found here: https://aclanthology.org/2023.bionlp-1.19/

@@ -147,6 +155,43 @@ The model is evaluated on two (CAS and QUAERO) publicly available French biomedical
 </tr>
 </tbody>
 </table>
-

 ## AliBERT: A Pre-trained Language Model for French Biomedical Text
 AliBERT is a pre-trained language model for French biomedical text. It is trained with a masked language modeling objective, like RoBERTa.

 Here are the main contributions of our work:
+<ul>
+<li>
+A French biomedical language model, a language-specific and domain-specific PLM, which can be used to represent French biomedical text for different downstream tasks.
+</li>
+<li>
+A normalization of a Unigram sub-word tokenization of French biomedical textual input, which improves our vocabulary and the overall performance of the trained models.
+</li>
+<li>
+A foundation model that achieves state-of-the-art results on French biomedical text.
+</li>
+</ul>
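Unigram sub-word tokenization (the scheme named in the contributions above, popularized by SentencePiece) picks, among all ways of splitting a string into in-vocabulary pieces, the segmentation with the highest total unigram log-probability. A minimal Viterbi sketch of that idea, using a toy vocabulary rather than AliBERT's actual vocabulary:

```python
import math

def unigram_tokenize(text, logprobs):
    """Viterbi segmentation: split `text` into in-vocabulary pieces
    whose summed unigram log-probabilities are maximal."""
    n = len(text)
    # best[i] = (best score for text[:i], start index of its last piece)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in logprobs and best[start][0] > -math.inf:
                score = best[start][0] + logprobs[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk back through the best-scoring path to recover the pieces.
    pieces, end = [], n
    while end > 0:
        start = best[end][1]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]
```

With a toy vocabulary where whole morphemes are more probable than their characters, "hémoglobine" segments into ["hémo", "globine"] rather than many short pieces.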

 The paper can be found here: https://aclanthology.org/2023.bionlp-1.19/

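The masked-language-modeling objective mentioned above can be sketched in plain Python. This illustrates the standard BERT/RoBERTa masking recipe (15% of tokens are selected as prediction targets; of those, 80% are replaced by a mask token, 10% by a random token, and 10% left unchanged); it is an illustrative sketch, not code from this repository:

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, rng=None):
    """Apply BERT/RoBERTa-style masking to a token list.

    Returns (inputs, labels): labels is -100 (ignored by the loss)
    everywhere except at masked positions, where it holds the
    original token to be predicted.
    """
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)           # predict the original token here
            r = rng.random()
            if r < 0.8:                  # 80%: replace with the mask token
                inputs.append(mask_token)
            elif r < 0.9:                # 10%: replace with a random token
                inputs.append(rng.choice(vocab))
            else:                        # 10%: keep the token unchanged
                inputs.append(tok)
        else:
            inputs.append(tok)
            labels.append(-100)          # not a prediction target
    return inputs, labels
```

RoBERTa applies this masking dynamically, i.e. it is re-sampled on every epoch rather than fixed once at preprocessing time.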
 </tr>
 </tbody>
 </table>
+Table 2: NER performances on the CAS dataset
+
+#### QUAERO dataset
+
+<table class="tg">
+<thead>
+<tr>
+<th>Models</th>
+<th class="tg-0lax" colspan="3">CamemBERT</th>
+<th class="tg-0lax" colspan="3">AliBERT</th>
+<th class="tg-0lax" colspan="3">DrBERT</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Entity</td> <td>P</td> <td>R</td> <td>F1</td> <td>P</td> <td>R</td> <td>F1</td> <td>P</td> <td>R</td> <td>F1</td>
+</tr>
+<tr>
+<td>Anatomy</td> <td>0.649</td> <td>0.641</td> <td>0.645</td> <td>0.795</td> <td>0.811</td> <td>0.803</td> <td>0.799</td> <td>0.801</td> <td>0.800</td>
+</tr>
+<tr>
+<td>Chemical</td> <td>0.844</td> <td>0.847</td> <td>0.846</td> <td>0.878</td> <td>0.893</td> <td>0.885</td> <td>0.898</td> <td>0.818</td> <td>0.856</td>
+</tr>
+<tr>
+<td>Device</td> <td>0.000</td> <td>0.000</td> <td>0.000</td> <td>0.506</td> <td>0.356</td> <td>0.418</td> <td>0.549</td> <td>0.338</td> <td>0.419</td>
+</tr>
+<tr>
+<td>Disorder</td> <td>0.772</td> <td>0.818</td> <td>0.794</td> <td>0.857</td> <td>0.843</td> <td>0.850</td> <td>0.883</td> <td>0.809</td> <td>0.845</td>
+</tr>
+<tr>
+<td>Procedure</td> <td>0.880</td> <td>0.894</td> <td>0.887</td> <td>0.969</td> <td>0.967</td> <td>0.968</td> <td>0.944</td> <td>0.976</td> <td>0.960</td>
+</tr>
+<tr>
+<td>Macro Avg</td> <td>0.655</td> <td>0.656</td> <td>0.655</td> <td>0.807</td> <td>0.783</td> <td>0.793</td> <td>0.818</td> <td>0.755</td> <td>0.782</td>
+</tr>
+</tbody>
+</table>
+Table 3: NER performances on the QUAERO dataset
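The per-entity precision, recall, and F1 scores and their macro averages reported in the tables above can be computed from per-entity true-positive/false-positive/false-negative counts. A minimal sketch (the counts in the usage example are made up for illustration, not taken from the QUAERO evaluation):

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_average(per_entity):
    """Unweighted mean of per-entity P, R and F1 (macro averaging).

    `per_entity` maps entity type -> (tp, fp, fn). Macro averaging
    weights every entity type equally, so a rare type like Device
    pulls the average down as much as a frequent one.
    """
    scores = [prf1(*counts) for counts in per_entity.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

For example, an entity with 8 true positives, 2 false positives, and 2 false negatives scores P = R = F1 = 0.8.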

 ## AliBERT: A Pre-trained Language Model for French Biomedical Text