Text Classification
PyTorch
Safetensors
English
eurovoc
Inference Endpoints
scampion committed on
Commit
7699d98
1 Parent(s): 3f189e8

Update README.md

  1. README.md +95 -5
README.md CHANGED
@@ -1,9 +1,99 @@
- 👷🏻 Work in progress

- NDCG@3: 0.7127

- NDCG@5: 0.6549

- NDCG@10: 0.6382

- Micro F1 Score: 0.31 (0.5295)
+ ---
+ license: eupl-1.1
+ datasets:
+ - EuropeanParliament/cellar_eurovoc
+ language:
+ - en
+ metrics:
+ - type: f1
+   value: 0.XX
+   name: micro F1
+   args:
+     threshold: 0.XX
+ - type: NDCG@3
+   value: 0.X
+   name: NDCG@3
+ - type: NDCG@5
+   value: 0.XX
+   name: NDCG@5
+ - type: NDCG@10
+   value: 0.XX
+   name: NDCG@10
+ tags:
+ - eurovoc
+ pipeline_tag: text-classification

+ widget:
+ - text: "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities."
+
+ ---

+ # Eurovoc Multilabel Classifier

+ [EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large, multidisciplinary, multilingual, hierarchical thesaurus of more than 7000 classes covering the activities of EU institutions.
+ Given the number of legal documents produced every day and the huge mass of pre-existing documents to be classified, high-quality automated or semi-automated classification methods are most welcome in this domain.

+ This model, based on a BERT deep neural network, was trained on more than 3,200,000 documents for this task and is used in a production environment via a Hugging Face Inference Endpoint.
+ This model supports the 24 languages of the European Union.
+
+
+ ## Architecture
+
+ ![architecture](img/architecture.png)
+
+ The classifier predicts over 7331 EuroVoc labels.
+
+ ## Usage
+
+ ```python
+ from eurovoc import EurovocTagger
+ model = EurovocTagger.from_pretrained("EuropeanParliament/eurovoc_eu")
+ ```
+
+ ## Metrics
+
+
+ ### Eurlex57k Dataset
+
+ | Metric   | Value | Threshold |
+ |----------|-------|-----------|
+ | Micro F1 | 0.XX  | 0.XX      |
+ | NDCG@3   | 0.XX  | -         |
+ | NDCG@5   | 0.XX  | -         |
+ | NDCG@10  | 0.XX  | -         |
+
+ These values are in line with the state of the art in the field; see the publication [Large Scale Legal Text Classification Using Transformer Models](https://arxiv.org/pdf/2010.12871.pdf).
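
The README does not say how the metrics in the table were computed, so the following is only a sketch of the standard definitions, assuming binary relevance (a predicted label is either a true label or not). The function names and the toy labels are illustrative and are not part of the `eurovoc` package.

```python
import math


def apply_threshold(scores, threshold):
    """Turn per-label scores into a predicted label set (used for micro F1)."""
    return {label for label, score in scores.items() if score >= threshold}


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1: pool TP/FP/FN over all documents, then compute F1 once."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)
        fp += len(pred - true)
        fn += len(true - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(true_labels, ranked_labels, k):
    """NDCG@k with binary relevance over a score-ordered list of labels."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, label in enumerate(ranked_labels[:k], start=1)
        if label in true_labels
    )
    # Ideal ranking places every true label first, up to k positions.
    ideal_hits = min(k, len(true_labels))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy data with made-up labels, only to exercise the functions.
    truth = [{"a", "b"}, {"d"}]
    preds = [apply_threshold({"a": 0.9, "c": 0.4, "b": 0.1}, 0.16), {"d"}]
    print("micro F1:", micro_f1(truth, preds))
    print("NDCG@3:", ndcg_at_k({"a", "c"}, ["a", "b", "c"], 3))
```

Micro F1 depends on the decision threshold, which is why the table lists a threshold only for that row; NDCG@k uses the ranking alone.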
+
+
+ ## Inference Endpoint
+
+ ### Payload example
+
+ ```json
+ {
+   "inputs": "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities.",
+   "topk": 10,
+   "threshold": 0.16
+ }
+ ```
+
+ Result:
+
+ ```json
+ {
+   "results": [
+     {"label": "international sanctions", "score": 0.9994925260543823},
+     {"label": "economic sanctions", "score": 0.9991770386695862},
+     {"label": "natural person", "score": 0.9591936469078064},
+     {"label": "EU restrictive measure", "score": 0.8388392329216003},
+     {"label": "legal person", "score": 0.45630475878715515},
+     {"label": "Burma/Myanmar", "score": 0.43375277519226074}
+   ]
+ }
+ ```
+
+ Only six labels are returned because the score of the next label is below the 0.16 threshold.
+
+ Default values: `topk` = 5 and `threshold` = 0.16.
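
The interplay of `topk` and `threshold` can be sketched in Python. This is a minimal sketch, not the endpoint's actual implementation: `ENDPOINT_URL` and `API_TOKEN` are hypothetical placeholders, and `select_labels` assumes the server ranks labels by score, keeps at most `topk` of them, and drops those scoring below `threshold` (whether the comparison is strict is not stated in the README).

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the real endpoint URL and access token.
ENDPOINT_URL = "https://example.invalid/eurovoc"
API_TOKEN = "hf_xxx"


def build_request(text, topk=5, threshold=0.16, url=ENDPOINT_URL, token=API_TOKEN):
    """Build a POST request carrying the payload shape shown in the README."""
    payload = {"inputs": text, "topk": topk, "threshold": threshold}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def select_labels(scored_labels, topk=5, threshold=0.16):
    """Assumed selection rule: best `topk` labels, minus those below `threshold`."""
    ranked = sorted(scored_labels, key=lambda item: item["score"], reverse=True)
    return [item for item in ranked[:topk] if item["score"] >= threshold]


if __name__ == "__main__":
    request = build_request("The Union condemns ...", topk=10)
    print(request.get_method(), request.full_url)
    # Sending requires a real endpoint and token:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["results"])
```

Under this assumed rule, the example above returns six labels with `topk` = 10, and would return only the first five with the default `topk` = 5.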
+
+
+ ## Author(s)
+
+ Sébastien Campion <sebastien.campion@europarl.europa.eu>