---
tags:
- spacy
- token-classification
language:
- fr
model-index:
- name: fr_present_tense_value
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.7757731959
    - name: NER Recall
      type: recall
      value: 0.7969991174
    - name: NER F Score
      type: f_score
      value: 0.7862429256
      
widget:
- text: "Le 2 décembre, c'est un vendredi, on avait un concert. On se retrouve avec des amis chez moi."
  example_title: "présent historique"
- text: "On danse toute la nuit et là vous vous dites que c'est la meilleure manière de vivre."
  example_title: "présent générique"
- text: "Je me souviens d'avoir vu un enfant danser sur le toit du monde !"
  example_title: "présent d'énonciation"
  
license: agpl-3.0
---

## Description

This model was built to detect the different values of the *present tense* in French. Its main purpose was to automate annotation on a specific dataset.
There is no warranty that it will work on any other dataset.
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
The present tense can have different meanings depending on the context. It can have a historical value, referring to past events while making the speech more vivid.
Another value is generic, expressing general truths such as definitions or properties. Finally, it can have an enunciation value, referring to the present moment to describe an ongoing action.
These values of the present tense can only be told apart from the context.
This is why models based on contextual embeddings (BERT-like) should be well suited to differentiating them.

---
| Feature | Description |
| --- | --- |
| **Name** | `fr_present_tense_value` |
| **Version** | `0.0.1` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** |  agpl-3.0 |
| **Author** | [n/a]() |

### Label Scheme

<details>

<summary>View label scheme (3 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`ner`** | `PRESENT_ENNONCIATION`, `PRESENT_GENERIQUE`, `PRESENT_HISTORIQUE` |

</details>
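
Since the three values are exposed as entity labels of the `ner` component, predictions can be read from `doc.ents`. The following is a minimal sketch, assuming the pipeline has been installed as a Python package that `spacy.load` can find under the name `fr_present_tense_value`; the helper names `group_by_label` and `annotate` are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def group_by_label(doc):
    """Map each entity label to the list of span texts carrying it."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def annotate(text, model_name="fr_present_tense_value"):
    """Run the pipeline on `text` and group the predicted spans by label.

    Assumption: the pipeline is installed as a package that `spacy.load`
    can resolve under `model_name`.
    """
    import spacy  # imported lazily so group_by_label works without spaCy

    nlp = spacy.load(model_name)
    return group_by_label(nlp(text))
```

Calling `annotate("On se retrouve avec des amis chez moi.")` would return a dictionary keyed by `PRESENT_ENNONCIATION`, `PRESENT_GENERIQUE`, or `PRESENT_HISTORIQUE`; which spans appear depends on the model's predictions.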

### Accuracy

| Type | Score |
| --- | --- |
| `ENTS_F` | 78.62 |
| `ENTS_P` | 77.58 |
| `ENTS_R` | 79.70 |
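
As a quick consistency check, the F score can be recomputed from the precision and recall reported in the metadata of this card. The sketch below assumes `ENTS_F` is the usual harmonic mean of precision and recall (F1); the function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model-index metadata of this card.
precision = 0.7757731959
recall = 0.7969991174

f1 = f_score(precision, recall)
print(f"{100 * f1:.2f}")  # prints 78.62
```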


### Training

We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation. 
The models were trained on 200-word sequences: 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating the model's performance.
To ensure a fair performance evaluation, the evaluation sequences were taken from documents that were not used during training.

| label | train | test | valid |
| --- | --- | --- | --- |
| `PRESENT_ENNONCIATION` | 2069 | 673 | 438 |
| `PRESENT_GENERIQUE` | 704 | 177 | 147 |
| `PRESENT_HISTORIQUE` | 1005 | 289 | 285 |
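
The table can be summarized per split with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the table, and the helper names are hypothetical. Note that the table counts labeled spans, while the 70/20/10 split described above is presumably made over sequences, so the shares computed here need not match those percentages exactly.

```python
# Labeled span counts copied from the table above.
counts = {
    "PRESENT_ENNONCIATION": {"train": 2069, "test": 673, "valid": 438},
    "PRESENT_GENERIQUE": {"train": 704, "test": 177, "valid": 147},
    "PRESENT_HISTORIQUE": {"train": 1005, "test": 289, "valid": 285},
}


def split_totals(counts):
    """Sum the labeled spans of all labels for each split."""
    totals = {}
    for per_split in counts.values():
        for split, n in per_split.items():
            totals[split] = totals.get(split, 0) + n
    return totals


def split_shares(counts):
    """Fraction of all labeled spans that falls in each split."""
    totals = split_totals(counts)
    grand_total = sum(totals.values())
    return {split: n / grand_total for split, n in totals.items()}


for split, share in split_shares(counts).items():
    print(f"{split}: {share:.1%}")
```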