---
language: fr
datasets:
- nlpso/m2m3_fine_tuning_ocr_cmbert_io
tag: token-classification
widget:
- text: 'Duflot, loueur de carrosses, r. de Paradis-
    505
    Poissonnière, 22.'
  example_title: 'Noisy entry #1'
- text: 'Duſour el Besnard, march, de bois à bruler,
    quai de la Tournelle, 17. etr. des Fossés-
    SBernard. 11.
    Dí'
  example_title: 'Noisy entry #2'
- text: 'Dufour (Charles), épicier, r. St-Denis
    ☞
    332'
  example_title: 'Ground-truth entry #1'
---

# m2_joint_label_ocr_cmbert_io

## Introduction

This model is a fine-tuned version of [HueyNemud/das22-10-camembert_pretrained](https://huggingface.co/HueyNemud/das22-10-camembert_pretrained) for the **nested NER task**, trained on a dataset of Paris trade directories annotated with nested entities.

## Dataset

Abbreviation|Entity group (level)|Description
-|-|-
O |1 & 2|Outside of a named entity
PER |1|Person or company name
ACT |1 & 2|Person or company professional activity
TITREH |2|Military or civil distinction
DESC |1|Entry full description
TITREP |2|Professional reward
SPAT |1|Address
LOC |2|Street name
CARDINAL |2|Street number
FT |2|Geographical feature
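
For scripting against this label set, the table above can be transcribed into a small lookup. This is only a sketch: `ENTITY_LEVELS` and `entities_at_level` are hypothetical names introduced for illustration, and the exact label strings emitted by the model may differ from these bare abbreviations.

```python
# Entity inventory transcribed from the table above.
# Keys are the abbreviations; values are the nesting levels at which each may appear.
ENTITY_LEVELS = {
    "O": (1, 2),
    "PER": (1,),
    "ACT": (1, 2),
    "TITREH": (2,),
    "DESC": (1,),
    "TITREP": (2,),
    "SPAT": (1,),
    "LOC": (2,),
    "CARDINAL": (2,),
    "FT": (2,),
}


def entities_at_level(level):
    """Return the abbreviations allowed at a nesting level (hypothetical helper)."""
    return [name for name, levels in ENTITY_LEVELS.items() if level in levels]


print(entities_at_level(1))
print(entities_at_level(2))
```

Level 1 holds the outer spans (for example `SPAT` for a whole address), while level 2 holds the spans nested inside them (for example `LOC` and `CARDINAL`).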

## Experiment parameters

* Pretrained model: [HueyNemud/das22-10-camembert_pretrained](https://huggingface.co/HueyNemud/das22-10-camembert_pretrained)
* Dataset: noisy (Pero OCR)
* Tagging format: IO
* Recognised entities: 'All'
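
To illustrate the IO tagging format listed above: IO marks every token as either inside an entity (`I-TYPE`) or outside (`O`), with no separate begin marker as in BIO. The sketch below converts a BIO sequence to IO. The tokens and tags are made up for illustration and are not taken from the dataset, `bio_to_io` is a hypothetical helper, and the `I-` prefix convention is an assumption about how the labels are written.

```python
def bio_to_io(tags):
    """Collapse BIO tags to IO: 'B-X' becomes 'I-X'; 'I-X' and 'O' are unchanged."""
    return ["I-" + tag[2:] if tag.startswith("B-") else tag for tag in tags]


# Illustrative level-1 annotation of a directory entry (hypothetical tags).
tokens = ["Dufour", "(Charles)", ",", "épicier", ",", "r.", "St-Denis", "332"]
bio_tags = ["B-PER", "I-PER", "O", "B-ACT", "O", "B-SPAT", "I-SPAT", "I-SPAT"]

for token, tag in zip(tokens, bio_to_io(bio_tags)):
    print(f"{token}\t{tag}")
```

The trade-off is that IO has fewer labels to learn but cannot separate two adjacent entities of the same type, since nothing marks where the second one begins.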

## Load model from the Hugging Face Hub

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("nlpso/m2_joint_label_ocr_cmbert_io")
model = AutoModelForTokenClassification.from_pretrained("nlpso/m2_joint_label_ocr_cmbert_io")