---
tags:
  - spacy
  - token-classification
language:
  - fr
model-index:
  - name: fr_lexical_death
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.8253968254
          - name: NER Recall
            type: recall
            value: 0.7323943662
          - name: NER F Score
            type: f_score
            value: 0.776119403
widget:
  - text: Vous ne devais pas y aller... vous ne sortirai pas vivantes
    example_title: Mort implicite
  - text: Les morts ne parlent pas
    example_title: Mort explicite 1
  - text: >-
      Les ambulances garées, le cortège des défunts, les cadavres qui sortaient
      de dessous les décombres
    example_title: Mort explicite
license: agpl-3.0
---

## Description

This model was built to detect the lexical field of death. Its main purpose was to automate annotation on a specific dataset; there is no warranty that it will work on any other dataset. We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.

## Feature Description

| Feature | Description |
| --- | --- |
| **Name** | `fr_lexical_death` |
| **Version** | `0.0.1` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | `agpl-3.0` |
| **Author** | n/a |

## Label Scheme

View label scheme (2 labels for 1 component)

| Component | Labels |
| --- | --- |
| **ner** | `MORT_EXPLICITE`, `MORT_IMPLICITE` |
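The two labels above are the only entity types the pipeline predicts. A minimal sketch of consuming them with spaCy is shown below; loading the trained model would be `nlp = spacy.load("fr_lexical_death")` once the package is installed, but to keep the snippet self-contained we stand in a rule-based `entity_ruler` pipeline that emits the same label scheme (the patterns are illustrative, not the model's actual behavior).

```python
import spacy

# Stand-in pipeline: a blank French model plus an EntityRuler that mimics
# the label scheme (MORT_EXPLICITE / MORT_IMPLICITE) of fr_lexical_death.
# With the real package installed you would instead do:
#   nlp = spacy.load("fr_lexical_death")
nlp = spacy.blank("fr")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Illustrative patterns only; the trained model is a transformer NER.
    {"label": "MORT_EXPLICITE", "pattern": [{"LOWER": "morts"}]},
    {"label": "MORT_EXPLICITE", "pattern": [{"LOWER": "cadavres"}]},
])

doc = nlp("Les morts ne parlent pas")
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Downstream code only needs to branch on `ent.label_` being one of the two values above, whichever pipeline produced the entities.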

## Accuracy

| Type | Score |
| --- | --- |
| `ENTS_F` | 77.61 |
| `ENTS_P` | 82.54 |
| `ENTS_R` | 73.24 |
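As a sanity check, the reported F score is the harmonic mean of the precision and recall values from the metadata:

```python
# Verify that ENTS_F is the harmonic mean of ENTS_P and ENTS_R
# using the full-precision values from the model-index metadata.
precision = 0.8253968254
recall = 0.7323943662
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 9))  # 0.776119403, matching the reported NER F Score
```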

## Training

We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation. The models were trained on 200-word sequences: 70% of the data was used for training, 20% to test and fine-tune hyperparameters, and 10% to evaluate the performance of the model. To ensure a correct performance evaluation, the evaluation sequences were taken from documents that were not used during training.
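The key point in the split above is that it is done at the document level, so no evaluation sequence comes from a document seen in training. A minimal sketch of such a split, with hypothetical document identifiers:

```python
import random

# Document-level 70/20/10 split (identifiers are stand-ins).
# Splitting by document rather than by sequence guarantees that
# evaluation sequences never come from training documents.
random.seed(0)
doc_ids = list(range(100))
random.shuffle(doc_ids)

n = len(doc_ids)
train_docs = doc_ids[: int(0.7 * n)]              # 70% for training
test_docs = doc_ids[int(0.7 * n): int(0.9 * n)]   # 20% for hyperparameter tuning
valid_docs = doc_ids[int(0.9 * n):]               # 10% for final evaluation
```

Only after this document-level assignment are the 200-word sequences extracted within each partition.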

- Train dataset: 147 labels for MORT_EXPLICITE
- Test dataset: 35 labels for MORT_EXPLICITE
- Valid dataset: 18 labels for MORT_EXPLICITE