---
license: mit
language:
- fr
library_name: flair
tags:
- legal
---

This model is a version of flair/ner-french fine-tuned on a corpus of 127 case reports in French from the European Court of Human Rights (ECHR). The corpus was built and annotated for anonymization as part of the Master's thesis "Automatic anonymization of legal texts from the European Court of Human Rights: building four corpora of case reports in French and Spanish language for anonymization".

The annotation was carried out by projecting the annotations of the English corpus built by Pilán et al. (2022).

It predicts 8 tags: DATETIME, CODE, PER, DEM, MISC, ORG, LOC, QUANTITY.
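As an illustration of how these tags might be used, the sketch below loads a Flair tagger and replaces each predicted span with its tag. It is a minimal example under stated assumptions, not the thesis's anonymization procedure: `model_path` is a placeholder you must supply (this card does not state the repository id), the label type is assumed to be `"ner"` as in flair/ner-french, and `mask_entities` is a hypothetical helper introduced here.

```python
from typing import List, Tuple

# The 8 tags listed in this model card.
TAGS = {"DATETIME", "CODE", "PER", "DEM", "MISC", "ORG", "LOC", "QUANTITY"}

# (start offset, end offset, tag) in character positions of the input text.
CharSpan = Tuple[int, int, str]


def predict_spans(text: str, model_path: str) -> List[CharSpan]:
    """Run a Flair tagger and return character-level spans.

    Assumptions: `model_path` is a local path or hub id chosen by the
    caller, and the model stores its predictions under the "ner" label type.
    """
    # Imported lazily so the masking helper works without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_path)
    sentence = Sentence(text)
    tagger.predict(sentence)
    return [
        (span.start_position, span.end_position, span.tag)
        for span in sentence.get_spans("ner")
    ]


def mask_entities(text: str, spans: List[CharSpan]) -> str:
    """Replace each span with "[TAG]" (hypothetical helper).

    Spans are assumed not to overlap. They are processed from right to
    left so that earlier offsets remain valid after each replacement.
    """
    masked = text
    for start, end, tag in sorted(spans, reverse=True):
        if tag not in TAGS:
            raise ValueError(f"unexpected tag: {tag}")
        masked = masked[:start] + f"[{tag}]" + masked[end:]
    return masked
```

With a model available, `mask_entities(text, predict_spans(text, model_path))` would return the text with every detected entity replaced by its tag in brackets. Whether all 8 tags should be masked depends on the anonymization policy being applied.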

The corpus and the code used for fine-tuning this model are available on GitHub: https://github.com/msierrofer/automatic-anonymization-ECHR-French-Spanish/tree/full-corpora-(127-texts).

References

Pilán, I., Lison, P., Øvrelid, L., Papadopoulou, A., Sánchez, D. & Batet, M. (2022). The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization. Computational Linguistics, 48(4), pp. 1053–1101. Cambridge, MA: MIT Press. doi: 10.1162/coli_a_00458.