metadata
language: fr
datasets:
- nlpso/m2m3_fine_tuning_ocr_ptrn_cmbert_io
tag: token-classification
widget:
- text: "Duflot, loueur de carrosses, r. de Paradis-\P 505\P Poissonnière, 22."
example_title: 'Noisy entry #1'
- text: "Duſour el Besnard, march, de bois à bruler,\P quai de la Tournelle, 17. etr. des Fossés-\P SBernard. 11.\P Dí"
example_title: 'Noisy entry #2'
- text: "Dufour (Charles), épicier, r. St-Denis\P ☞\P 332"
example_title: 'Ground-truth entry #1'
m3_hierarchical_ner_ocr_ptrn_cmbert_io
Introduction
This model is a fine-tuned version of HueyNemud/das22-10-camembert_pretrained for nested named entity recognition (NER), trained on a dataset of Paris trade directory entries annotated with nested entities.
Dataset
Abbreviation | Entity group (level) | Description |
---|---|---|
O | 1 & 2 | Outside of a named entity |
PER | 1 | Person or company name |
ACT | 1 & 2 | Person or company professional activity |
TITREH | 2 | Military or civil distinction |
DESC | 1 | Entry full description |
TITREP | 2 | Professional reward |
SPAT | 1 | Address |
LOC | 2 | Street name |
CARDINAL | 2 | Street number |
FT | 2 | Geographical feature |
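The two annotation levels in the table can be written down as a small lookup structure. The sketch below is only an illustration built from the table; the names `ENTITY_LEVELS` and `labels_at_level` are hypothetical and are not part of the model or its dataset.

```python
# Entity abbreviations mapped to the annotation levels listed in the table above.
# Illustrative only: this dict is not shipped with the model or the dataset.
ENTITY_LEVELS = {
    "O": {1, 2},
    "PER": {1},
    "ACT": {1, 2},
    "TITREH": {2},
    "DESC": {1},
    "TITREP": {2},
    "SPAT": {1},
    "LOC": {2},
    "CARDINAL": {2},
    "FT": {2},
}


def labels_at_level(level):
    """Return the abbreviations that can occur at the given level (1 or 2)."""
    return [label for label, levels in ENTITY_LEVELS.items() if level in levels]


print(labels_at_level(1))
print(labels_at_level(2))
```

Only `O` and `ACT` appear at both levels; every other entity type belongs to exactly one level.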
Experiment parameters
- Pretrained model: HueyNemud/das22-10-camembert_pretrained
- Dataset: noisy (Pero OCR)
- Tagging format: IO
- Recognised entities: 'All'
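In the IO tagging format, each token is either inside an entity or outside any entity; there is no begin marker, so two adjacent entities of the same type cannot be told apart. Below is a minimal sketch of decoding such tags into spans, assuming single-level labels written as `O` or `I-<TYPE>` (the exact label strings used by this model are not documented here). The helper `decode_io` is hypothetical, and the tags in the example are hand-written for illustration, not model output.

```python
def decode_io(tokens, tags):
    """Group consecutive tokens sharing the same IO tag into (type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # Assumed label convention: "O" or "I-<TYPE>".
        if tag == "O":
            entity_type = None
        else:
            entity_type = tag[2:] if tag.startswith("I-") else tag
        if entity_type != current_type:
            # The tag changed, so close the span that was open (if any).
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = entity_type, []
        if entity_type is not None:
            current_tokens.append(token)
    # Close a span that runs to the end of the sequence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-written tags for illustration only.
tokens = ["Dufour", "(Charles),", "épicier,", "r.", "St-Denis", "332"]
tags = ["I-PER", "I-PER", "I-ACT", "I-SPAT", "I-SPAT", "I-SPAT"]
print(decode_io(tokens, tags))
```

Because a span ends only when the tag changes, a run such as `I-SPAT I-SPAT I-SPAT` is always read as a single address, even if it actually holds two.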
Load the model from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("nlpso/m3_hierarchical_ner_ocr_ptrn_cmbert_io")
model = AutoModelForTokenClassification.from_pretrained("nlpso/m3_hierarchical_ner_ocr_ptrn_cmbert_io")
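Once loaded, the model can be run on a directory entry. The sketch below uses the generic `pipeline` API from `transformers` with `aggregation_strategy="simple"`; this is one possible way to run inference, not necessarily the authors' own procedure. The helpers `tag_entry` and `format_predictions` are hypothetical, and since the label strings returned by the model are not documented here, the output is printed as-is.

```python
MODEL_ID = "nlpso/m3_hierarchical_ner_ocr_ptrn_cmbert_io"


def tag_entry(text, model_id=MODEL_ID):
    """Run token classification on one directory entry (downloads the model)."""
    # Imported lazily so the formatting helper below works without transformers.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_id,
        tokenizer=model_id,
        aggregation_strategy="simple",  # merge adjacent tokens with the same label
    )
    return ner(text)


def format_predictions(predictions):
    """Turn pipeline output dicts into 'LABEL<TAB>text<TAB>score' lines."""
    return [
        f"{p['entity_group']}\t{p['word']}\t{float(p['score']):.2f}"
        for p in predictions
    ]


# Example call (needs network access to fetch the model):
# entry = r"Dufour (Charles), épicier, r. St-Denis\P ☞\P 332"
# for line in format_predictions(tag_entry(entry)):
#     print(line)
```

Each prediction returned by the pipeline is a dict with `entity_group`, `word`, `score`, `start`, and `end` keys, so the character offsets can be used to map entities back onto the original entry text.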