|
--- |
|
tags: |
|
- spacy |
|
- token-classification |
|
language: |
|
- fr |
|
model-index: |
|
- name: fr_sensations_and_body |
|
results: |
|
- task: |
|
name: NER |
|
type: token-classification |
|
metrics: |
|
- name: NER Precision |
|
type: precision |
|
value: 0.8537117904 |
|
- name: NER Recall |
|
type: recall |
|
value: 0.8555798687 |
|
- name: NER F Score |
|
type: f_score |
|
value: 0.8546448087 |
|
|
|
widget: |
|
- text: "Il y avait du sang partout, les bras et les jambes n'étaient plus aux bons endroits." |
|
example_title: "corps" |
|
- text: "J'étais un peu fatiguée." |
|
example_title: "Sensations physiques" |
|
- text: "Il y avait comme un silence assourdissant. Et là j'ai vu la beauté du lever de soleil." |
|
example_title: "Perceptions" |
|
--- |
|
|
|
This model was built to detect the lexical fields of the body, physical sensations, and perception. |
|
Its main purpose was to automate annotation of a specific dataset. |
|
There is no warranty that it will work on any other dataset. |
|
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER. |
|
|
|
| Feature | Description | |
|
| --- | --- | |
|
| **Name** | `fr_sensations_and_body` | |
|
| **Version** | `0.0.1` | |
|
| **spaCy** | `>=3.4.4,<3.5.0` | |
|
| **Default Pipeline** | `transformer`, `ner` | |
|
| **Components** | `transformer`, `ner` | |
|
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) | |
|
| **Sources** | n/a | |
|
| **License** | n/a | |
|
| **Author** | [n/a]() | |
|
|
|
### Label Scheme |
|
|
|
<details> |
|
|
|
<summary>View label scheme (4 labels for 1 component)</summary> |
|
|
|
| Component | Labels | |
|
| --- | --- | |
|
| **`ner`** | `CORPS`, `MOTS_PERCEPTIONS_SENSORIELLES`, `SENSATIONS_PHYSIQUES`, `VERB_PERCEPTIONS_SENSORIELLES` | |
|
|
|
</details> |
|
|
|
### Accuracy |
|
|
|
| Type | Score | |
|
| --- | --- | |
|
| `ENTS_F` | 85.46 | |
|
| `ENTS_P` | 85.37 | |
|
| `ENTS_R` | 85.56 | |
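
The three scores are linked: an F-score combines precision and recall into a single number. The sketch below assumes the reported `ENTS_F` is the standard balanced F-score (F1), the harmonic mean of precision and recall, and recomputes it from the unrounded values in the metadata at the top of this card. The function name `f_score` is ours, not part of the model or of spaCy.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded precision and recall taken from the model card metadata.
precision = 0.8537117904
recall = 0.8555798687

print(f"ENTS_F = {100 * f_score(precision, recall):.2f}")  # ENTS_F = 85.46
```

Because precision and recall are nearly equal here, the F-score sits almost exactly between them.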
|
|
|
### Training |
|
|
|
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation. |
|
The models were trained on 200-word sequences: 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating the model's performance. |
|
To ensure a sound performance evaluation, the evaluation sequences were taken from documents that were not used during training. |
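
The splitting procedure described above can be illustrated with a short sketch. This is not the code from the training repository: it assumes that documents are plain strings keyed by an identifier, that "words" are whitespace-separated tokens, and that the 70/20/10 ratios are applied to documents rather than to sequences. The names `chunk_words` and `split_documents` are hypothetical.

```python
import random


def chunk_words(text: str, size: int = 200) -> list[str]:
    """Cut a text into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_documents(docs: dict[str, str],
                    ratios: tuple[float, float, float] = (0.7, 0.2, 0.1),
                    seed: int = 0) -> dict[str, list[str]]:
    """Assign whole documents to train/test/valid, then chunk each one.

    Splitting before chunking guarantees that no document contributes
    sequences to more than one split.
    """
    ids = sorted(docs)
    random.Random(seed).shuffle(ids)
    n_train = round(ratios[0] * len(ids))
    n_test = round(ratios[1] * len(ids))
    groups = {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "valid": ids[n_train + n_test:],  # held-out evaluation documents
    }
    return {
        name: [seq for doc_id in group for seq in chunk_words(docs[doc_id])]
        for name, group in groups.items()
    }


# Toy corpus: 10 documents of 450 words each, so 3 sequences per document.
toy_docs = {f"doc{i}": "mot " * 450 for i in range(10)}
splits = split_documents(toy_docs)
print({name: len(seqs) for name, seqs in splits.items()})
```

Splitting at the document level is the design choice that matters here: if sequences were shuffled first, neighbouring passages from the same document could end up in both the training and the evaluation sets.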
|
|
|
|
|
| Label | Train | Test | Valid | |
|
| --- | --- | --- | --- | |
|
| `CORPS` | 523 | 152 | 106 | |
|
| `MOTS_PERCEPTIONS_SENSORIELLES` | 250 | 108 | 82 | |
|
| `SENSATIONS_PHYSIQUES` | 91 | 38 | 31 | |
|
| `VERB_PERCEPTIONS_SENSORIELLES` | 617 | 162 | 137 | |
|
|
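
To see how the annotated entities are distributed across the splits, the per-label counts in the table can be summed per column. The sketch below only re-uses the numbers from the table; the variable and function names are ours.

```python
# Entity counts per label and split, copied from the table above.
counts = {
    "CORPS": {"train": 523, "test": 152, "valid": 106},
    "MOTS_PERCEPTIONS_SENSORIELLES": {"train": 250, "test": 108, "valid": 82},
    "SENSATIONS_PHYSIQUES": {"train": 91, "test": 38, "valid": 31},
    "VERB_PERCEPTIONS_SENSORIELLES": {"train": 617, "test": 162, "valid": 137},
}


def split_totals(per_label: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the entity counts of every label for each split."""
    totals: dict[str, int] = {}
    for label_counts in per_label.values():
        for split, n in label_counts.items():
            totals[split] = totals.get(split, 0) + n
    return totals


totals = split_totals(counts)
grand_total = sum(totals.values())
for split, n in totals.items():
    print(f"{split}: {n} entities ({100 * n / grand_total:.1f}%)")
```

The entity shares need not match the split ratios of the sequences exactly, presumably because entities are not evenly spread across documents.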