README.md · binbin83/fr_sensations_and


tags:
  - spacy
  - token-classification
language:
  - fr
model-index:
  - name: fr_sensations_and_body
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.8537117904
          - name: NER Recall
            type: recall
            value: 0.8555798687
          - name: NER F Score
            type: f_score
            value: 0.8546448087
widget:
  - text: >-
      Il y avait du sang partout, les bras et les jambes n'étaient plus aux
      bons endroits.
    example_title: corps
  - text: J'étais un peu fatiguée.
    example_title: Sensations physiques
  - text: >-
      Il y avait comme un silence assourdissant. Et là j'ai vu la beauté du
      lever de soleil.
    example_title: Perceptions

This model was built to detect the lexical fields of the body, physical sensations, and perception. Its main purpose was to automate annotation on a specific dataset. There is no warranty that it will work on any other dataset. We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.

Feature	Description
Name	`fr_sensations_and_body`
Version	`0.0.1`
spaCy	`>=3.4.4,<3.5.0`
Default Pipeline	`transformer`, `ner`
Components	`transformer`, `ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	n/a
License	n/a
Author	n/a

Label Scheme

View label scheme (4 labels for 1 component)

Component	Labels
`ner`	`CORPS`, `MOTS_PERCEPTIONS_SENSORIELLES`, `SENSATIONS_PHYSIQUES`, `VERB_PERCEPTIONS_SENSORIELLES`
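
As a rough illustration of how the four labels above might be consumed, here is a minimal sketch that groups the entities of a processed document by label. It assumes the `fr_sensations_and_body` package has been installed locally together with a compatible spaCy version and `spacy-transformers` (needed for the `transformer` component); `load_pipeline` and `group_entities` are hypothetical helper names, not part of the model.

```python
from collections import defaultdict

# Package name taken from the feature table; assumed to be installed locally.
MODEL_NAME = "fr_sensations_and_body"


def load_pipeline(name=MODEL_NAME):
    """Load the packaged pipeline (assumes spaCy and the model are installed)."""
    import spacy  # imported lazily so group_entities has no hard dependency

    return spacy.load(name)


def group_entities(doc):
    """Map each entity label to the list of entity texts found in `doc`."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Usage, once the model is installed:
# nlp = load_pipeline()
# doc = nlp("J'étais un peu fatiguée.")
# print(group_entities(doc))
```

The grouping helper only relies on `doc.ents`, `ent.text`, and `ent.label_`, so it should work with any spaCy pipeline that has an `ner` component.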

Accuracy

Type	Score
`ENTS_F`	85.46
`ENTS_P`	85.37
`ENTS_R`	85.56
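
The F-score in this table is consistent with the harmonic mean of the reported precision and recall. The short check below recomputes it from the unrounded values in the model metadata; `f_score` is an illustrative helper, not a function from the training code.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded values from the model-index metadata.
precision = 0.8537117904
recall = 0.8555798687

f1 = f_score(precision, recall)
print(f"ENTS_F = {f1 * 100:.2f}")  # prints "ENTS_F = 85.46"
```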

Training

We constructed our dataset by manually labeling the documents with Doccano, an open-source tool for collaborative human annotation. The models were trained on 200-word sequences. We used 70% of the data for training, 20% for testing and hyperparameter tuning, and 10% for evaluating model performance. To keep the evaluation reliable, the evaluation sequences were taken from documents that were not used during training.
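
The splitting procedure described above could look roughly like the sketch below. It is not the code from the training repository: the helper names, the whitespace tokenization, and the choice to split all three sets at the document level are assumptions made for illustration (the text only states that evaluation documents are kept out of training).

```python
import random


def chunk_words(text, size=200):
    """Cut a document into consecutive sequences of at most `size` words."""
    words = text.split()  # assumption: plain whitespace tokenization
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_documents(doc_ids, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign whole documents to train/test/valid so no document is shared."""
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(len(ids) * ratios[0])
    n_test = round(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "valid": ids[n_train + n_test:],  # remainder goes to evaluation
    }


# Hypothetical usage: split first, then chunk each document within its split.
splits = split_documents([f"doc_{i}" for i in range(10)])
```

Splitting before chunking is what keeps sequences from the same document out of both the training and the evaluation sets.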

label	train	test	valid
`CORPS`	523	152	106
`MOTS_PERCEPTIONS_SENSORIELLES`	250	108	82
`SENSATIONS_PHYSIQUES`	91	38	31
`VERB_PERCEPTIONS_SENSORIELLES`	617	162	137
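
To see how the labels are distributed across the splits, the counts in the table can be totalled per column. The sketch below assumes each cell is a number of annotated spans for that label (the unit is not stated explicitly), so the resulting shares describe annotations rather than sequences.

```python
# Counts copied from the table above (label -> split -> count).
counts = {
    "CORPS": {"train": 523, "test": 152, "valid": 106},
    "MOTS_PERCEPTIONS_SENSORIELLES": {"train": 250, "test": 108, "valid": 82},
    "SENSATIONS_PHYSIQUES": {"train": 91, "test": 38, "valid": 31},
    "VERB_PERCEPTIONS_SENSORIELLES": {"train": 617, "test": 162, "valid": 137},
}

# Sum each column, then express it as a share of all annotations.
totals = {
    split: sum(row[split] for row in counts.values())
    for split in ("train", "test", "valid")
}
grand_total = sum(totals.values())

for split, n in totals.items():
    print(f"{split}: {n} ({n / grand_total:.1%})")
```

`SENSATIONS_PHYSIQUES` is by far the rarest label in every split, which is worth keeping in mind when reading the aggregate scores.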