---
tags:
- spacy
- token-classification
language:
- fr
model-index:
- name: fr_lexical_death
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.8253968254
    - name: NER Recall
      type: recall
      value: 0.7323943662
    - name: NER F Score
      type: f_score
      value: 0.776119403
widget:
- text: "Vous ne devais pas y aller... vous ne sortirai pas vivantes"
  example_title: "Mort implicite"
- text: "Les morts ne parlent pas"
  example_title: "Mort explicite 1"
- text: "Les ambulances garées, le cortège des défunts, les cadavres qui sortaient de dessous les décombres"
  example_title: "Mort explicite"
license: agpl-3.0
---
## Description
This model was built to detect the lexical field of death. Its main purpose was to automate annotation on a specific dataset.
There is no warranty that it will work on any other dataset.
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
| Feature | Description |
| --- | --- |
| **Name** | `fr_lexical_death` |
| **Version** | `0.0.1` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | agpl-3.0 |
| **Author** | [n/a]() |
### Label Scheme
<details>
<summary>View label scheme (2 labels for 1 component)</summary>
| Component | Labels |
| --- | --- |
| **`ner`** | `MORT_EXPLICITE`, `MORT_IMPLICITE` |
</details>
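
As a rough illustration of how the two labels might be consumed downstream, the sketch below groups entity spans by label. It relies only on the `ents`, `text` and `label_` attributes that spaCy exposes on parsed documents and spans. The helper name `group_mentions` and the hand-written stand-in document are assumptions made for illustration; they are not output of this model.

```python
from types import SimpleNamespace

# The two labels listed in the label scheme above.
LABELS = ("MORT_EXPLICITE", "MORT_IMPLICITE")


def group_mentions(doc):
    """Group the entity texts of a parsed document by label.

    `doc` only needs an `ents` iterable whose items expose `text` and
    `label_`, which is what a spaCy `Doc` provides.
    """
    grouped = {label: [] for label in LABELS}
    for ent in doc.ents:
        if ent.label_ in grouped:
            grouped[ent.label_].append(ent.text)
    return grouped


# Hypothetical stand-in for a parsed document: the span below is written
# by hand for illustration and is NOT a prediction of the model.
stand_in_doc = SimpleNamespace(
    ents=[SimpleNamespace(text="morts", label_="MORT_EXPLICITE")]
)

mentions = group_mentions(stand_in_doc)
print(mentions)
```

With the package installed, a real document obtained from `spacy.load("fr_lexical_death")(text)` could be passed in place of the stand-in.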
### Accuracy
| Type | Score |
| --- | --- |
| `ENTS_F` | 77.61 |
| `ENTS_P` | 82.54 |
| `ENTS_R` | 73.24 |
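
The three scores are linked: `ENTS_F` is the harmonic mean of `ENTS_P` and `ENTS_R`. The short check below recomputes it from the precision and recall values given in the metadata of this card. The function name `f_score` is only an illustrative choice.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model-index metadata of this card.
precision = 0.8253968254
recall = 0.7323943662

f1 = f_score(precision, recall)
print(f"ENTS_F = {100 * f1:.2f}")  # prints "ENTS_F = 77.61"
```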
### Training
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
The models were trained on sequences of 200 words. 70% of the data was used for training, 20% for testing and tuning hyperparameters,
and 10% for evaluating the performance of the model. To ensure a sound performance evaluation,
the evaluation sequences were taken from documents that were not used during training.
Number of `MORT_EXPLICITE` labels in each dataset:

| Dataset | `MORT_EXPLICITE` labels |
| --- | --- |
| Train | 147 |
| Test | 35 |
| Valid | 18 |
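
The splitting procedure described in this section can be sketched as follows. This is a minimal illustration, not the code used for this model (see the linked repository for that). It assumes that the 70/20/10 proportions are applied to whole documents, so that no document contributes sequences to more than one set, and that sequences are cut on whitespace. The function names, the seed and the toy corpus are hypothetical.

```python
import random


def chunk_words(text, size=200):
    """Cut a document into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_documents(doc_ids, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle document ids and cut them into train / test / valid groups.

    Splitting by document (assumption) keeps every sequence of a given
    document inside a single group.
    """
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_test], ids[n_train + n_test:]


# Toy corpus (hypothetical): 10 documents of 450 words each.
documents = {f"doc{i}": "mot " * 450 for i in range(10)}

train_ids, test_ids, valid_ids = split_documents(documents)
train = [seq for d in train_ids for seq in chunk_words(documents[d])]
test = [seq for d in test_ids for seq in chunk_words(documents[d])]
valid = [seq for d in valid_ids for seq in chunk_words(documents[d])]

print(len(train), len(test), len(valid))
```

Splitting before chunking is the design choice that matters here: chunking first and then shuffling sequences would let neighbouring sequences of one document land in both the training and the evaluation sets.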