|
--- |
|
tags: |
|
- spacy |
|
- token-classification |
|
language: |
|
- fr |
|
model-index: |
|
- name: fr_lexical_death |
|
results: |
|
- task: |
|
name: NER |
|
type: token-classification |
|
metrics: |
|
- name: NER Precision |
|
type: precision |
|
value: 0.8253968254 |
|
- name: NER Recall |
|
type: recall |
|
value: 0.7323943662 |
|
- name: NER F Score |
|
type: f_score |
|
value: 0.776119403 |
|
|
|
widget: |
|
- text: "Vous ne devais pas y aller... vous ne sortirai pas vivantes" |
|
example_title: "Mort implicite" |
|
- text: "Les morts ne parlent pas" |
|
example_title: "Mort explicite 1" |
|
- text: "Les ambulances garées, le cortège des défunts, les cadavres qui sortaient de dessous les décombres" |
|
example_title: "Mort explicite" |
|
|
|
license: agpl-3.0 |
|
|
|
--- |
|
|
|
## Description |
|
|
|
This model was built to detect the lexical field of death. Its main purpose was to automate annotation on a specific dataset.

There is no warranty that it will work on any other dataset.

We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
|
|
|
|
|
| Feature | Description | |
|
| --- | --- | |
|
| **Name** | `fr_lexical_death` | |
|
| **Version** | `0.0.1` | |
|
| **spaCy** | `>=3.4.4,<3.5.0` | |
|
| **Default Pipeline** | `transformer`, `ner` | |
|
| **Components** | `transformer`, `ner` | |
|
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) | |
|
| **Sources** | n/a | |
|
| **License** | agpl-3.0 | |
|
| **Author** | n/a |
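
A minimal usage sketch, assuming the packaged pipeline is installed in the current environment (pass a local path to `spacy.load` instead if you built the package yourself):

```python
import spacy

# Load the packaged pipeline (assumes fr_lexical_death is installed;
# use a local path if you built the wheel yourself).
nlp = spacy.load("fr_lexical_death")

doc = nlp("Les ambulances garées, le cortège des défunts, les cadavres qui sortaient de dessous les décombres")
for ent in doc.ents:
    print(ent.text, ent.label_)  # labels are MORT_EXPLICITE or MORT_IMPLICITE
```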
|
|
|
### Label Scheme |
|
|
|
<details> |
|
|
|
<summary>View label scheme (2 labels for 1 component)</summary>
|
|
|
| Component | Labels | |
|
| --- | --- | |
|
| **`ner`** | `MORT_EXPLICITE`, `MORT_IMPLICITE` | |
|
|
|
</details> |
|
|
|
### Accuracy |
|
|
|
| Type | Score | |
|
| --- | --- | |
|
| `ENTS_F` | 77.61 | |
|
| `ENTS_P` | 82.54 | |
|
| `ENTS_R` | 73.24 | |
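
The reported `ENTS_F` is the harmonic mean of `ENTS_P` and `ENTS_R`; a quick check against the values from the model-index metadata above:

```python
# Precision and recall as reported in the model-index metadata.
precision = 0.8253968254
recall = 0.7323943662

# F-score is the harmonic mean of precision and recall.
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score * 100, 2))  # 77.61, matching ENTS_F
```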
|
|
|
|
|
### Training |
|
|
|
We constructed our dataset by manually labeling the documents with Doccano, an open-source tool for collaborative human annotation.

The model was trained on 200-word sequences: 70% of the data was used for training, 20% to test and fine-tune hyperparameters, and 10% to evaluate the performance of the model. To ensure a fair performance evaluation, the evaluation sequences were taken from documents that were not used during training.
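
The sketch below illustrates this document-level split (hypothetical helper names, not the actual code from the linked repository): documents are assigned to a split first, then chunked into 200-word sequences, so evaluation sequences never come from documents seen during training.

```python
import random

def chunk_words(text: str, size: int = 200) -> list[str]:
    """Cut a document into consecutive 200-word sequences."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def split_by_document(documents: list[str], train: float = 0.7, test: float = 0.2, seed: int = 0):
    """Split at the document level first, then chunk each split,
    so evaluation sequences never leak from training documents."""
    docs = documents[:]
    random.Random(seed).shuffle(docs)
    n_train = int(len(docs) * train)
    n_test = int(len(docs) * test)
    parts = (docs[:n_train], docs[n_train:n_train + n_test], docs[n_train + n_test:])
    return tuple([seq for d in part for seq in chunk_words(d)] for part in parts)
```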
|
|
|
Train dataset: 147 labels for MORT_EXPLICITE

Test dataset: 35 labels for MORT_EXPLICITE

Valid dataset: 18 labels for MORT_EXPLICITE
|
|
|
|