This model was built to detect the lexical field of the body, physical sensations, and perception.
Its main purpose was to automate annotation on a specific dataset.
There is no warranty that it will work on any other dataset.
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
| Feature | Description |
| --- | --- |
| Name | fr_sensations_and_body |
| Version | 0.0.1 |
| spaCy | >=3.4.4,<3.5.0 |
| Default Pipeline | transformer, ner |
| Components | transformer, ner |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | n/a |
| License | n/a |
| Author | n/a |
Label Scheme
View label scheme (4 labels for 1 components)
| Component | Labels |
| --- | --- |
| ner | CORPS, MOTS_PERCEPTIONS_SENSORIELLES, SENSATIONS_PHYSIQUES, VERB_PERCEPTIONS_SENSORIELLES |
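As a minimal sketch of how the four labels could be used downstream, the helper below groups the entities predicted on a spaCy `Doc` by label. The function names `group_entities_by_label` and `annotate` are illustrative and not part of the released model. The sketch also assumes the pipeline has been installed as a package under the name `fr_sensations_and_body`, with a spaCy version in the range listed above.

```python
from collections import defaultdict


def group_entities_by_label(doc):
    """Map each entity label to the list of entity texts found in `doc`."""
    grouped = defaultdict(list)
    for ent in doc.ents:  # spaCy exposes the predicted spans on Doc.ents
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def annotate(text, model_name="fr_sensations_and_body"):
    """Load the pipeline and annotate `text`.

    Assumption: the model package is installed under `model_name`.
    """
    import spacy  # imported lazily so the helper above has no hard dependency

    nlp = spacy.load(model_name)
    return group_entities_by_label(nlp(text))
```

Calling `annotate("J'ai senti une douleur dans le bras.")` would return a dictionary keyed by label, for example with entries under `CORPS` or `SENSATIONS_PHYSIQUES`. The actual predictions depend on the model.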
Accuracy
| Type | Score |
| --- | --- |
| ENTS_F | 85.46 |
| ENTS_P | 85.37 |
| ENTS_R | 85.56 |
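The three scores are related: the F-score is the harmonic mean of precision (ENTS_P) and recall (ENTS_R). The short check below recomputes it from the rounded values reported above. The function name `f_score` is illustrative, and the result is only expected to match up to rounding.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ents_p, ents_r = 85.37, 85.56  # values reported in the table above
print(round(f_score(ents_p, ents_r), 2))  # prints 85.46
```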
Training
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
The models were trained on sequences of 200 words. 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating the performance of the model.
In order to ensure correct performance evaluation, the evaluation sequences were taken from documents that were not used during the training.
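The following is a rough sketch of a split along these lines, not the authors' actual preprocessing. It assumes that documents are assigned to a split as a whole, which guarantees that no evaluation sequence comes from a training document, and that sequences are cut on whitespace. The names `chunk_words`, `split_documents`, and the toy `corpus` are hypothetical.

```python
import random


def chunk_words(text, size=200):
    """Split `text` into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_documents(doc_ids, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle document ids and partition them into train/test/valid groups."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratios[0])
    n_test = round(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "valid": ids[n_train + n_test:],  # remaining documents
    }


# Hypothetical toy corpus: document id -> raw text (450 words each).
corpus = {f"doc{i}": "mot " * 450 for i in range(10)}
splits = split_documents(corpus)

# Chunk each document only after it has been assigned to a split.
sequences = {
    name: [seq for doc_id in ids for seq in chunk_words(corpus[doc_id])]
    for name, ids in splits.items()
}
```

Splitting before chunking is the design choice that matters here: if sequences were shuffled first, 200-word chunks from the same document could end up in both the training and the evaluation sets.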
| label | train | test | valid |
| --- | --- | --- | --- |
| CORPS | 523 | 152 | 106 |
| MOTS_PERCEPTIONS_SENSORIELLES | 250 | 108 | 82 |
| SENSATIONS_PHYSIQUES | 91 | 38 | 31 |
| VERB_PERCEPTIONS_SENSORIELLES | 617 | 162 | 137 |