metadata
language: fr
license: mit
datasets:
  - Jean-Baptiste/wikiner_fr
widget:
  - text: Boulanger, habitant à Boulanger, a acheté une télé à Boulanger.

DistilCamemBERT-NER

We present DistilCamemBERT-NER, which is DistilCamemBERT fine-tuned for the NER (Named Entity Recognition) task in French. The work is inspired by Jean-Baptiste/camembert-ner, which is based on the CamemBERT model. The problem with CamemBERT-based models appears at scaling time, for example in the production phase, where inference cost can become a technological issue. To counteract this effect, we propose this model, which divides the inference time by 2 for the same power consumption thanks to DistilCamemBERT.
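To check the inference-time claim on your own hardware, a small timing harness can help. The sketch below is only an illustration: `time_inference` and `speedup` are hypothetical helpers that are not part of this model card or of `transformers`, and the sketch does not reproduce the measurement protocol behind the factor of 2 quoted above.

```python
import statistics
import time
from typing import Callable, List


def time_inference(predict: Callable[[str], object], texts: List[str], repeats: int = 5) -> float:
    """Return the median wall-clock time (seconds) needed to run `predict` on all `texts`."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            predict(text)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Ratio of baseline time to candidate time; a value above 1 means the candidate is faster."""
    return baseline_seconds / candidate_seconds
```

In practice, `predict` would be a `transformers` NER pipeline for each model being compared, run on the same list of sentences and the same device. Measured ratios will vary with hardware, batch size, and sequence length.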

Dataset

The dataset used is wikiner_fr, which contains ~170k sentences labeled with 5 categories:

* I-PER: person;
* I-LOC: location;
* I-ORG: organization;
* I-MISC: miscellaneous entities;
* O: background (Other).

Evaluation results

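| class  | precision (%) | recall (%) | f1 (%) | support |
|:-------|--------------:|-----------:|-------:|--------:|
| global |         98.35 |      98.36 |  98.35 | 492'243 |
| I-PER  |         96.22 |      97.41 |  96.81 |  27'842 |
| I-LOC  |         93.93 |      93.50 |  93.72 |  31'431 |
| I-ORG  |         85.13 |      87.08 |  86.10 |   7'662 |
| I-MISC |         88.55 |      81.84 |  85.06 |  13'553 |
| O      |         99.40 |      99.55 |  99.47 | 411'755 |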
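The card does not say how the f1 column was computed. Assuming it is the usual harmonic mean of precision and recall, the per-class values can be recomputed from the other two columns as a sanity check. Because the inputs below are the rounded percentages from the table, the recomputed scores can differ from the reported ones in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) per class, copied from the evaluation table
per_class = {
    "I-PER": (96.22, 97.41),
    "I-LOC": (93.93, 93.50),
    "I-ORG": (85.13, 87.08),
    "I-MISC": (88.55, 81.84),
    "O": (99.40, 99.55),
}

for label, (precision, recall) in per_class.items():
    print(f"{label}: {f1_score(precision, recall):.2f}")
```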

How to use DistilCamemBERT-NER

from transformers import pipeline

# The model and tokenizer identifiers must be passed as strings
ner = pipeline("ner", model="cmarkea/distilcamembert-base-ner", tokenizer="cmarkea/distilcamembert-base-ner", aggregation_strategy="simple")
result = ner("Le Crédit Mutuel Arkéa est une banque Francaise et le CMB est une banque de Bretagne. C'est sous la présidence de Louis Lichou, dans les années 1980 que différentes filiales sont créées au sein du CMB et forme les principales filiales du groupe qui existent encore aujourd'hui (Federal Finance, Suravenir, Financo, etc.).")
# result
# [{'entity_group': 'ORG',
#  'score': 0.9882848,
#  'word': 'Crédit Mutuel Arkéa',
#  'start': 3,
#  'end': 22},
# {'entity_group': 'LOC',
#  'score': 0.94114804,
#  'word': 'Francaise',
#  'start': 38,
#  'end': 47},
# {'entity_group': 'ORG',
#  'score': 0.8854897,
#  'word': 'CMB',
#  'start': 54,
#  'end': 57},
# {'entity_group': 'LOC',
#  'score': 0.9873087,
#  'word': 'Bretagne',
#  'start': 76,
#  'end': 84},
# {'entity_group': 'PER',
#  'score': 0.9989073,
#  'word': 'Louis Lichou',
#  'start': 114,
#  'end': 126},
# {'entity_group': 'ORG',
#  'score': 0.89991987,
#  'word': 'CMB',
#  'start': 197,
#  'end': 200},
# {'entity_group': 'ORG',
#  'score': 0.9965075,
#  'word': 'Federal Finance',
#  'start': 278,
#  'end': 293},
# {'entity_group': 'ORG',
#  'score': 0.99657035,
#  'word': 'Suravenir',
#  'start': 295,
#  'end': 304},
# {'entity_group': 'ORG',
#  'score': 0.9965148,
#  'word': 'Financo',
#  'start': 306,
#  'end': 313}]
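The aggregated output is a list of dictionaries, so it is easy to post-process. As a minimal sketch, the snippet below groups entity strings by type and drops low-confidence predictions. `group_entities` is a hypothetical helper and the 0.9 threshold is an arbitrary assumption, not a recommendation from the authors. A few entries from the output above are copied in as plain data so that the sketch runs without downloading the model.

```python
from collections import defaultdict
from typing import Dict, List

# A few entries copied from the pipeline output shown above
result = [
    {"entity_group": "ORG", "score": 0.9882848, "word": "Crédit Mutuel Arkéa", "start": 3, "end": 22},
    {"entity_group": "LOC", "score": 0.94114804, "word": "Francaise", "start": 38, "end": 47},
    {"entity_group": "ORG", "score": 0.8854897, "word": "CMB", "start": 54, "end": 57},
    {"entity_group": "PER", "score": 0.9989073, "word": "Louis Lichou", "start": 114, "end": 126},
]


def group_entities(entities: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group unique entity strings by entity type, keeping only scores >= min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue  # skip low-confidence predictions
        words = grouped[entity["entity_group"]]
        if entity["word"] not in words:
            words.append(entity["word"])
    return dict(grouped)


# Hypothetical threshold of 0.9: the first "CMB" mention (score ~0.885) is filtered out
print(group_entities(result, min_score=0.9))
```

The `start` and `end` fields are character offsets into the input string, so `text[start:end]` recovers the surface form if the original text is kept around.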