metadata
language: fr
datasets:
  - nlpso/m1_fine_tuning_ocr_ptrn_cmbert_iob2
tag: token-classification
widget:
  - text: "Duflot, loueur de carrosses, r. de Paradis-\P    505\P    Poissonnière, 22."
    example_title: 'Noisy entry #1'
  - text: "Duſour el Besnard, march, de bois à bruler,\P    quai de la Tournelle, 17. etr. des Fossés-\P    SBernard. 11.\P    Dí"
    example_title: 'Noisy entry #2'
  - text: "Dufour (Charles), épicier, r. St-Denis\P    ☞\P    332"
    example_title: 'Ground-truth entry #1'

m1_ind_layers_ocr_ptrn_cmbert_iob2_level_1

Introduction

This model was fine-tuned from HueyNemud/das22-10-camembert_pretrained for nested named entity recognition (NER) on a dataset of Paris trade directories annotated with nested entities.

Dataset

| Abbreviation | Entity group (level) | Description |
|---|---|---|
| O | 1 & 2 | Outside of a named entity |
| PER | 1 | Person or company name |
| ACT | 1 & 2 | Person or company professional activity |
| TITREH | 2 | Military or civil distinction |
| DESC | 1 | Entry full description |
| TITREP | 2 | Professional reward |
| SPAT | 1 | Address |
| LOC | 2 | Street name |
| CARDINAL | 2 | Street number |
| FT | 2 | Geographical feature |
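
The level assignments in the table above can be written down as a small lookup, which is handy for checking which entity groups a given level's model is expected to produce. This is only a sketch: `ENTITY_LEVELS`, `entities_for_level` and `iob2_labels` are hypothetical helper names, and the `B-`/`I-` label strings are an assumption based on the usual IOB2 convention. The labels actually used by the model should be read from `model.config.id2label`.

```python
# Entity groups and the annotation level(s) they belong to, copied from the table above.
ENTITY_LEVELS = {
    "PER": (1,),
    "ACT": (1, 2),
    "TITREH": (2,),
    "DESC": (1,),
    "TITREP": (2,),
    "SPAT": (1,),
    "LOC": (2,),
    "CARDINAL": (2,),
    "FT": (2,),
}


def entities_for_level(level):
    """Return the entity groups annotated at the given level, in table order."""
    return [name for name, levels in ENTITY_LEVELS.items() if level in levels]


def iob2_labels(level):
    """Build an IOB2 label list for one level (assumed naming: O, B-X, I-X)."""
    labels = ["O"]
    for name in entities_for_level(level):
        labels += [f"B-{name}", f"I-{name}"]
    return labels
```

For this level-1 model, `entities_for_level(1)` gives the four groups PER, ACT, DESC and SPAT.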

Experiment parameter

Load the model from Hugging Face

**Warning 1**: this model recognises only the level-1 entities of the dataset. It must be used together with m1_ind_layers_ocr_ptrn_cmbert_iob2_level_2 to recognise the nested level-2 entities.

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("nlpso/m1_ind_layers_ocr_ptrn_cmbert_iob2_level_1")
model = AutoModelForTokenClassification.from_pretrained("nlpso/m1_ind_layers_ocr_ptrn_cmbert_iob2_level_1")
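
Since this model covers level 1 only, a complete nested annotation needs the predictions of both models. The sketch below shows one possible way to combine them; it is not the authors' pipeline. It assumes that the level-2 model lives under the same `nlpso/` namespace, that the `transformers` token-classification pipeline with `aggregation_strategy="simple"` groups this model's labels as expected, and that nesting can be decided from character offsets alone. `tag_entry` and `nest_entities` are hypothetical helper names.

```python
LEVEL_1 = "nlpso/m1_ind_layers_ocr_ptrn_cmbert_iob2_level_1"
# Assumption: the level-2 model is published under the same namespace.
LEVEL_2 = "nlpso/m1_ind_layers_ocr_ptrn_cmbert_iob2_level_2"


def tag_entry(text, model_name):
    """Run one model on one directory entry (downloads the model on first use)."""
    # Imported lazily so that nest_entities below has no external dependency.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )
    return ner(text)


def nest_entities(level_1, level_2):
    """Attach each level-2 span to the first level-1 span that fully contains it.

    Both arguments are lists of dicts with at least `start` and `end`
    character offsets, as returned by the pipeline above.
    """
    nested = [dict(parent, children=[]) for parent in level_1]
    for child in level_2:
        for parent in nested:
            if parent["start"] <= child["start"] and child["end"] <= parent["end"]:
                parent["children"].append(child)
                break  # a level-2 span that fits no parent is dropped
    return nested
```

A typical call would be `nest_entities(tag_entry(text, LEVEL_1), tag_entry(text, LEVEL_2))`. Level-2 spans that do not fall inside any level-1 span are silently discarded here; depending on the use case it may be preferable to keep them in a separate list.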