binbin83 committed on
Commit
7db14b6
1 Parent(s): 4b7ceac

Update README.md

Files changed (1)
  1. README.md +27 -1
README.md CHANGED
@@ -20,7 +20,19 @@ model-index:
  - name: NER F Score
  type: f_score
  value: 0.776119403
  ---
  | Feature | Description |
  | --- | --- |
  | **Name** | `fr_lexical_death` |
@@ -53,4 +65,18 @@ model-index:
  | `ENTS_P` | 82.54 |
  | `ENTS_R` | 73.24 |
  | `TRANSFORMER_LOSS` | 51778.17 |
- | `NER_LOSS` | 41163.78 |
  - name: NER F Score
  type: f_score
  value: 0.776119403
+ license: agpl-3.0
+ widget:
+ - example 1: "Il faut pas sortir, vous reviendrez pas vivantes."
+ - example 2: "Les morts ne parlents pas."
+ - example 3: "Les Ambulances garés, les cortèges de defunts, les cadavres qu'on sortait des décombres"
  ---
+
+ ## Description
+
+ This model was built to detect the lexical field of death. Its main purpose was to automate annotation on a specific dataset.
+ There is no warranty that it will work on any other dataset. We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
+
+
  | Feature | Description |
  | --- | --- |
  | **Name** | `fr_lexical_death` |
 
  | `ENTS_P` | 82.54 |
  | `ENTS_R` | 73.24 |
  | `TRANSFORMER_LOSS` | 51778.17 |
+ | `NER_LOSS` | 41163.78 |
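
As a quick sanity check on the metrics above, the reported NER F score (0.776119403) appears consistent with the harmonic mean of `ENTS_P` and `ENTS_R`. The sketch below assumes the score is a standard F1; the helper name `f_score` is illustrative and not part of the model's code.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the accuracy table, converted from percentages.
ents_p = 82.54 / 100
ents_r = 73.24 / 100

ents_f = f_score(ents_p, ents_r)
print(round(ents_f, 3))
```

Because the table rounds precision and recall to two decimals, the recomputed value only matches the reported score to about four decimal places.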
+
+ ### Training
+
+ We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
+ The models were trained on 200-word sequences: 70% of the data was used for training, 20% to test and tune hyperparameters,
+ and 10% to evaluate the performance of the model. To ensure a fair performance evaluation,
+ the evaluation sequences were taken from documents that were not used during training.
+
+ Train dataset: 147 labels for MORT_EXPLICITE
+
+ Test dataset: 35 labels for MORT_EXPLICITE
+
+ Valid dataset: 18 labels for MORT_EXPLICITE
+
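
The split described in the Training section could be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: it assumes the 70/20/10 proportions are applied at the document level (so that no evaluation sequence comes from a training document), and the names `chunk_words` and `split_documents` are hypothetical.

```python
import random


def chunk_words(text: str, size: int = 200) -> list[str]:
    """Cut a document into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_documents(doc_ids, train: float = 0.7, test: float = 0.2, seed: int = 0):
    """Assign whole documents to train / test / valid (the remainder).

    Splitting by document, rather than by sequence, keeps every sequence
    of a given document in exactly one subset.
    """
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a given seed
    n_train = round(len(ids) * train)
    n_test = round(len(ids) * test)
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "valid": ids[n_train + n_test:],
    }


# Hypothetical usage with ten placeholder document ids.
splits = split_documents([f"doc{i}" for i in range(10)])
print({name: len(ids) for name, ids in splits.items()})
```

Each document would then be passed through `chunk_words` after the split, so the 200-word sequences inherit the subset of their source document.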