|
--- |
|
tags: |
|
- spacy |
|
- token-classification |
|
language: |
|
- fr |
|
model-index: |
|
- name: fr_present_tense_value |
|
results: |
|
- task: |
|
name: NER |
|
type: token-classification |
|
metrics: |
|
- name: NER Precision |
|
type: precision |
|
value: 0.7757731959 |
|
- name: NER Recall |
|
type: recall |
|
value: 0.7969991174 |
|
- name: NER F Score |
|
type: f_score |
|
value: 0.7862429256 |
|
|
|
widget: |
|
- text: "Le 2 décembre, c'est un vendredi, on avait un concert. On se retrouve avec des amis chez moi." |
|
example_title: "present historique" |
|
- text: "On danse toute la nuit et la vous vous dites qu c'est la meilleure manière de vivre." |
|
example_title: "present génrique" |
|
- text: "Je me souviens d'avoir vu un enfant danser sur le toît du monde !" |
|
example_title: "présent ennonciation" |
|
|
|
license: agpl-3.0 |
|
--- |
|
|
|
## Description |
|
|
|
This model was built to detect the different values of the *present tense* in French. Its main purpose was to automate annotation on a specific dataset.

There is no warranty that it will work on any other dataset.

We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.

The present tense can take different meanings depending on the context. It can have a historical value, referring to the past while making the speech more vivid.

Another value is generic, expressing general truths such as definitions or properties. Finally, it can have an enunciation value, referring to the present moment to describe an ongoing action.

These different values of the present tense can only be differentiated by the context.

This is why models based on contextual embeddings (BERT-like) should be relevant for differentiating them.
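A minimal usage sketch, assuming the packaged pipeline has been installed (e.g. as a wheel from the Hugging Face Hub):

```python
import spacy

# "fr_present_tense_value" is the package name listed in the table below;
# how it was installed is an assumption, not a prescription.
nlp = spacy.load("fr_present_tense_value")

doc = nlp("Le 2 décembre, c'est un vendredi, on avait un concert. "
          "On se retrouve avec des amis chez moi.")

# Each detected span carries one of the three present-tense value labels.
for ent in doc.ents:
    print(ent.text, ent.label_)
```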
|
|
|
--- |
|
| Feature | Description | |
|
| --- | --- | |
|
| **Name** | `fr_present_tense_value` | |
|
| **Version** | `0.0.1` | |
|
| **spaCy** | `>=3.4.4,<3.5.0` | |
|
| **Default Pipeline** | `transformer`, `ner` | |
|
| **Components** | `transformer`, `ner` | |
|
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) | |
|
| **Sources** | n/a | |
|
| **License** | agpl-3.0 | |
|
| **Author** | n/a |
|
|
|
### Label Scheme |
|
|
|
<details> |
|
|
|
<summary>View label scheme (3 labels for 1 components)</summary> |
|
|
|
| Component | Labels | |
|
| --- | --- | |
|
| **`ner`** | `PRESENT_ENNONCIATION`, `PRESENT_GENERIQUE`, `PRESENT_HISTORIQUE` | |
|
|
|
</details> |
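Once the pipeline is loaded, the same label set can be read from the NER component:

```python
import spacy

nlp = spacy.load("fr_present_tense_value")
# Tuple of the three present-tense value labels listed above.
print(nlp.get_pipe("ner").labels)
```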
|
|
|
### Accuracy |
|
|
|
| Type | Score | |
|
| --- | --- | |
|
| `ENTS_F` | 78.62 | |
|
| `ENTS_P` | 77.58 | |
|
| `ENTS_R` | 79.70 | |
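As a quick sanity check, `ENTS_F` is the harmonic mean of the precision and recall reported above:

```python
# ENTS_F = harmonic mean of ENTS_P and ENTS_R.
p, r = 0.7757731959, 0.7969991174
f = 2 * p * r / (p + r)
print(round(f * 100, 2))  # 78.62
```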
|
|
|
|
|
### Training
|
|
|
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation. |
|
The models were trained on 200-word sequences: 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating the model's performance.

To ensure a correct performance evaluation, the evaluation sequences were taken from documents that were not used during training.
|
|
|
| Label | Train | Test | Valid |

| --- | --- | --- | --- |

| `PRESENT_ENNONCIATION` | 2069 | 673 | 438 |

| `PRESENT_GENERIQUE` | 704 | 177 | 147 |

| `PRESENT_HISTORIQUE` | 1005 | 289 | 285 |
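For reference, a minimal sketch of how such annotations can be turned into spaCy training data, assuming a Doccano JSONL export in which each line holds a `text` field and character-offset `label` triples (the field names and file paths here are assumptions, not the project's actual layout):

```python
import json

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("fr")
db = DocBin()

with open("annotations.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        doc = nlp.make_doc(record["text"])
        spans = []
        for start, end, tag in record["label"]:
            # Snap character offsets to token boundaries; skip spans that cannot be aligned.
            span = doc.char_span(start, end, label=tag, alignment_mode="contract")
            if span is not None:
                spans.append(span)
        doc.ents = spans
        db.add(doc)

# Serialized corpus in the format `spacy train` consumes.
db.to_disk("train.spacy")
```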
|
|
|
|