README.md · gilf/french-camembert-postag-model at main

metadata

language: fr
widget:
  - text: >-
      Face à un choc inédit, les mesures mises en place par le gouvernement ont
      permis une protection forte et efficace des ménages

About

The french-camembert-postag-model is a part of speech tagging model for French that was trained on the free-french-treebank dataset available on github. The base tokenizer and model used for training is 'camembert-base'.

Supported Tags

It uses the following tags:

Tag	Category	Extra Info
ADJ	adjectif
ADJWH	adjectif
ADV	adverbe
ADVWH	adverbe
CC	conjonction de coordination
CLO	pronom	obj
CLR	pronom	refl
CLS	pronom	suj
CS	conjonction de subordination
DET	déterminant
DETWH	déterminant
ET	mot étranger
I	interjection
NC	nom commun
NPP	nom propre
P	préposition
P+D	préposition + déterminant
PONCT	signe de ponctuation
PREF	préfixe
PRO	autres pronoms
PROREL	autres pronoms	rel
PROWH	autres pronoms	int
U	?
V	verbe
VIMP	verbe imperatif
VINF	verbe infinitif
VPP	participe passé
VPR	participe présent
VS	subjonctif

More information on the tags can be found here:

http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf

Usage

The usage of this model follows the common transformers patterns. Here is a short example of its usage:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("gilf/french-camembert-postag-model")
model = AutoModelForTokenClassification.from_pretrained("gilf/french-camembert-postag-model")

from transformers import pipeline

nlp_token_class = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)

nlp_token_class('Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages')

The lines above would display something like this on a Jupyter notebook:

[{'entity_group': 'NC', 'score': 0.5760144591331482, 'word': '<s>'},
 {'entity_group': 'U', 'score': 0.9946700930595398, 'word': 'Face'},
 {'entity_group': 'P', 'score': 0.999615490436554, 'word': 'à'},
 {'entity_group': 'DET', 'score': 0.9995906352996826, 'word': 'un'},
 {'entity_group': 'NC', 'score': 0.9995531439781189, 'word': 'choc'},
 {'entity_group': 'ADJ', 'score': 0.999183714389801, 'word': 'inédit'},
 {'entity_group': 'P', 'score': 0.3710663616657257, 'word': ','},
 {'entity_group': 'DET', 'score': 0.9995903968811035, 'word': 'les'},
 {'entity_group': 'NC', 'score': 0.9995649456977844, 'word': 'mesures'},
 {'entity_group': 'VPP', 'score': 0.9988670349121094, 'word': 'mises'},
 {'entity_group': 'P', 'score': 0.9996246099472046, 'word': 'en'},
 {'entity_group': 'NC', 'score': 0.9995329976081848, 'word': 'place'},
 {'entity_group': 'P', 'score': 0.9996233582496643, 'word': 'par'},
 {'entity_group': 'DET', 'score': 0.9995935559272766, 'word': 'le'},
 {'entity_group': 'NC', 'score': 0.9995369911193848, 'word': 'gouvernement'},
 {'entity_group': 'V', 'score': 0.9993771314620972, 'word': 'ont'},
 {'entity_group': 'VPP', 'score': 0.9991101026535034, 'word': 'permis'},
 {'entity_group': 'DET', 'score': 0.9995885491371155, 'word': 'une'},
 {'entity_group': 'NC', 'score': 0.9995636343955994, 'word': 'protection'},
 {'entity_group': 'ADJ', 'score': 0.9991781711578369, 'word': 'forte'},
 {'entity_group': 'CC', 'score': 0.9991298317909241, 'word': 'et'},
 {'entity_group': 'ADJ', 'score': 0.9992275238037109, 'word': 'efficace'},
 {'entity_group': 'P+D', 'score': 0.9993300437927246, 'word': 'des'},
 {'entity_group': 'NC', 'score': 0.8353511393070221, 'word': 'ménages</s>'}]