Update README.md
README.md
CHANGED
@@ -1,9 +1,99 @@
---
license: eupl-1.1
datasets:
- EuropeanParliament/cellar_eurovoc
language:
- en
metrics:
- type: f1
  value: 0.XX
  name: micro F1
  args:
    threshold: 0.XX
- type: NDCG@3
  value: 0.X
  name: NDCG@3
- type: NDCG@5
  value: 0.XX
  name: NDCG@5
- type: NDCG@10
  value: 0.XX
  name: NDCG@10
tags:
- eurovoc
pipeline_tag: text-classification

widget:
- text: "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities."

---

# Eurovoc Multilabel Classifier

[EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large multidisciplinary, multilingual, hierarchical thesaurus of more than 7000 classes covering the activities of EU institutions.
Given the number of legal documents produced every day and the huge mass of pre-existing documents to be classified, high-quality automated or semi-automated classification methods are most welcome in this domain.

This model, based on the BERT deep neural network, was trained on more than 3,200,000 documents to perform that task and is used in a production environment via a Hugging Face inference endpoint.
This model supports the 24 languages of the European Union.

## Architecture

![architecture](img/architecture.png)

The model classifies documents over 7331 Eurovoc labels.
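
Multilabel classifiers of this kind commonly score each label independently with a sigmoid, so the scores of one document do not have to sum to 1. The README does not say how the output layer is implemented, so the sketch below is only an illustration under that assumption. The helper names (`sigmoid`, `label_scores`) and the logits are hypothetical and are not part of the model's code.

```python
import math

NUM_LABELS = 7331  # label count stated in this README


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def label_scores(logits, label_names):
    """Map one raw logit per label to an independent score in [0, 1]."""
    if len(logits) != len(label_names):
        raise ValueError("expected exactly one logit per label")
    return {name: sigmoid(z) for name, z in zip(label_names, logits)}


# Toy example with three labels instead of NUM_LABELS; the logits are made up.
toy_scores = label_scores(
    [4.0, 0.0, -3.0],
    ["international sanctions", "legal person", "Burma/Myanmar"],
)
print(toy_scores)
```

Because each label is scored on its own, several labels can receive a high score for the same document, which is the behaviour a multilabel tagger needs.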

## Usage

```python
from eurovoc import EurovocTagger

# Load the pretrained tagger from the Hugging Face Hub
model = EurovocTagger.from_pretrained("EuropeanParliament/eurovoc_eu")
```

## Metrics

### Eurlex57k Dataset

| Metric   | Value | Threshold |
|----------|-------|-----------|
| Micro F1 | 0.XX  | 0.XX      |
| NDCG@3   | 0.XX  | -         |
| NDCG@5   | 0.XX  | -         |
| NDCG@10  | 0.XX  | -         |

These values are in line with the state of the art in the field; see the publication [Large Scale Legal Text Classification Using Transformer Models](https://arxiv.org/pdf/2010.12871.pdf).
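
For readers unfamiliar with these metrics, here is a minimal sketch of how micro F1 at a score threshold and NDCG@k with binary relevance are commonly computed. The function names, toy scores and gold labels are illustrative assumptions; this is not the evaluation code used for this model.

```python
import math


def micro_f1(scores, gold, threshold):
    """Micro F1: pool true/false positives and false negatives over all documents.

    scores: list of {label: score} dicts, one per document
    gold:   list of sets of true labels, one per document
    """
    tp = fp = fn = 0
    for doc_scores, doc_gold in zip(scores, gold):
        predicted = {label for label, s in doc_scores.items() if s >= threshold}
        tp += len(predicted & doc_gold)
        fp += len(predicted - doc_gold)
        fn += len(doc_gold - predicted)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ndcg_at_k(doc_scores, doc_gold, k):
    """NDCG@k for one document, with relevance 1 for gold labels and 0 otherwise."""
    ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, label in enumerate(ranked)
        if label in doc_gold
    )
    ideal_hits = min(k, len(doc_gold))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Toy data: two documents, three labels (made-up scores).
toy_scores = [
    {"a": 0.9, "b": 0.6, "c": 0.1},
    {"a": 0.2, "b": 0.8, "c": 0.7},
]
toy_gold = [{"a", "c"}, {"b"}]

print(micro_f1(toy_scores, toy_gold, threshold=0.5))
print([ndcg_at_k(s, g, k=3) for s, g in zip(toy_scores, toy_gold)])
```

Micro F1 depends on the chosen threshold, which is why the table reports one; NDCG@k only looks at the ranking of the top k labels, so no threshold applies.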

## Inference Endpoint

### Payload example

```json
{
  "inputs": "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities.",
  "topk": 10,
  "threshold": 0.16
}
```

Result:

```json
{
  "results": [
    {"label": "international sanctions", "score": 0.9994925260543823},
    {"label": "economic sanctions", "score": 0.9991770386695862},
    {"label": "natural person", "score": 0.9591936469078064},
    {"label": "EU restrictive measure", "score": 0.8388392329216003},
    {"label": "legal person", "score": 0.45630475878715515},
    {"label": "Burma/Myanmar", "score": 0.43375277519226074}
  ]
}
```

Only six results are returned, even though `topk` is 10, because the score of the next label is below the 0.16 threshold.

Default values: `topk` = 5 and `threshold` = 0.16.
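
The interplay between `topk` and `threshold` can be sketched as follows. This is an illustration under assumptions, not the endpoint's actual handler: `build_payload` and `select_labels` are hypothetical helper names, the comparison is assumed to be inclusive (`>=`), the scores are rounded from the example result, and the last candidate is a made-up placeholder standing in for a label that falls below the threshold.

```python
import json

DEFAULT_TOPK = 5          # default stated in this README
DEFAULT_THRESHOLD = 0.16  # default stated in this README


def build_payload(text, topk=DEFAULT_TOPK, threshold=DEFAULT_THRESHOLD):
    """Serialize a request body with the three fields shown in the payload example."""
    return json.dumps({"inputs": text, "topk": topk, "threshold": threshold})


def select_labels(scored_labels, topk=DEFAULT_TOPK, threshold=DEFAULT_THRESHOLD):
    """Keep at most `topk` best labels, then drop those scoring below `threshold`."""
    ranked = sorted(scored_labels, key=lambda item: item["score"], reverse=True)
    return [item for item in ranked[:topk] if item["score"] >= threshold]


candidates = [
    {"label": "international sanctions", "score": 0.9995},
    {"label": "economic sanctions", "score": 0.9992},
    {"label": "natural person", "score": 0.9592},
    {"label": "EU restrictive measure", "score": 0.8388},
    {"label": "legal person", "score": 0.4563},
    {"label": "Burma/Myanmar", "score": 0.4338},
    {"label": "hypothetical label", "score": 0.10},  # made-up, below threshold
]

print(len(select_labels(candidates, topk=10, threshold=0.16)))  # threshold is the limit
print(len(select_labels(candidates)))                           # topk is the limit
```

With `topk` set to 10 the threshold is what limits the result list; with the defaults, `topk` = 5 becomes the limiting factor instead.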

## Author(s)

Sébastien Campion <sebastien.campion@europarl.europa.eu>