---
tags:
- spacy
- token-classification
language:
- fr
model-index:
- name: fr_lexical_death
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.8253968254
    - name: NER Recall
      type: recall
      value: 0.7323943662
    - name: NER F Score
      type: f_score
      value: 0.776119403

widget:
- text: "Vous ne devais pas y aller... vous ne sortirai pas vivantes"
  example_title: "Mort implicite"
- text: "Les morts ne parlent pas"
  example_title: "Mort explicite 1"
- text: "Les ambulances garées, le cortège des défunts, les cadavres qui sortaient de dessous les décombres"
  example_title: "Mort explicite"
  
license: agpl-3.0

---

## Description

This model was built to detect the lexical field of death in French text. Its main purpose was to automate annotation on a specific dataset;
there is no warranty that it will work on any other dataset.
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
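
A minimal usage sketch, assuming the pipeline is installed under the package name `fr_lexical_death` (the exact wheel URL is omitted here; see the files published with this model):

```python
# Install spaCy in the supported range, then the model package:
#   pip install "spacy>=3.4.4,<3.5.0"
#   pip install <path-or-url-to-the-fr_lexical_death-wheel>

import spacy

# Load the packaged pipeline (transformer + ner).
nlp = spacy.load("fr_lexical_death")

doc = nlp("Les morts ne parlent pas")
for ent in doc.ents:
    # Labels are MORT_EXPLICITE or MORT_IMPLICITE (see the label scheme below).
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```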


| Feature | Description |
| --- | --- |
| **Name** | `fr_lexical_death` |
| **Version** | `0.0.1` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | agpl-3.0 |
| **Author** | n/a |

### Label Scheme

<details>

<summary>View label scheme (2 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`ner`** | `MORT_EXPLICITE`, `MORT_IMPLICITE` |

</details>
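
As a quick illustration of the two labels in practice (the predicted spans below are illustrative, not guaranteed outputs):

```python
from collections import defaultdict

import spacy

nlp = spacy.load("fr_lexical_death")

text = ("Les ambulances garées, le cortège des défunts, "
        "les cadavres qui sortaient de dessous les décombres")

# Group predicted spans by label; only the two labels above can occur.
spans_by_label = defaultdict(list)
for ent in nlp(text).ents:
    spans_by_label[ent.label_].append(ent.text)

print(dict(spans_by_label))
# e.g. {"MORT_EXPLICITE": ["défunts", "cadavres"], ...} -- illustrative only
```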

### Accuracy

| Type | Score |
| --- | --- |
| `ENTS_F` | 77.61 |
| `ENTS_P` | 82.54 |
| `ENTS_R` | 73.24 |
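
These scores correspond to spaCy's `ents_p` / `ents_r` / `ents_f` metrics. A minimal sketch of computing them on your own gold data (the annotations below are hypothetical placeholders):

```python
import spacy
from spacy.training import Example

nlp = spacy.load("fr_lexical_death")

# Hypothetical gold annotations: (text, [(start_char, end_char, label), ...]).
gold = [
    ("Les morts ne parlent pas", [(0, 9, "MORT_EXPLICITE")]),
]

examples = [
    Example.from_dict(nlp.make_doc(text), {"entities": spans})
    for text, spans in gold
]

# Language.evaluate runs the pipeline and scores it against the references;
# the returned dict includes ents_p, ents_r, and ents_f.
scores = nlp.evaluate(examples)
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```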


### Training

We constructed the dataset by manually labeling the documents with Doccano, an open-source tool for collaborative human annotation.
The model was trained on 200-word sequences; 70% of the data were used for training, 20% for testing and hyperparameter tuning,
and 10% for evaluating the model's performance. To ensure a correct performance evaluation,
the evaluation sequences were taken from documents that were not used during training.

Label counts per split:

| Split | `MORT_EXPLICITE` labels |
| --- | --- |
| Train | 147 |
| Test | 35 |
| Valid | 18 |
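
A minimal sketch of the document-level split described above (function and variable names are ours, not from the training repo): evaluation sequences come only from documents never seen in training.

```python
import random

def split_by_document(documents, seed=0):
    """Split (doc_id, sequences) pairs 70/20/10 at the document level,
    so evaluation sequences never share a document with training data."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n_train = int(0.7 * len(docs))
    n_test = int(0.2 * len(docs))
    train = docs[:n_train]                     # model training
    test = docs[n_train:n_train + n_test]      # hyperparameter tuning
    valid = docs[n_train + n_test:]            # held-out evaluation
    return train, test, valid
```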