---
license: eupl-1.1
datasets:
- EuropeanParliament/cellar_eurovoc
language:
- en
metrics:
  - type: f1         
    value: 0.XX 
    name: micro F1
    args:
      threshold: 0.XX
  - type: NDCG@3
    value: 0.X
    name: NDCG@3
  - type: NDCG@5         
    value: 0.XX 
    name: NDCG@5
  - type: NDCG@10         
    value: 0.XX  
    name: NDCG@10
tags:
- eurovoc
pipeline_tag: text-classification

widget:
- text: "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities."
 
---

# Eurovoc Multilabel Classifier

[EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large, multidisciplinary, multilingual, hierarchical thesaurus of more than 7,000 classes covering the activities of the EU institutions.
Given the number of legal documents produced every day and the large backlog of existing documents still to be classified, high-quality automated or semi-automated classification methods are most welcome in this domain.

This model, a BERT-based deep neural network, was trained on more than 3,200,000 documents for this task and is used in a production environment through a Hugging Face Inference Endpoint.
This model supports the 24 languages of the European Union.


## Architecture

![architecture](architecture.png)

This classification model is built on top of [EUBERT](https://huggingface.co/EuropeanParliament/EUBERT) and predicts 7331 Eurovoc labels.

## Usage 

```python
from eurovoc import EurovocTagger
model = EurovocTagger.from_pretrained("EuropeanParliament/eurovoc_eu")
```

## Metrics


### Eurlex57k Dataset

| Metric     | Value    | Threshold Value |
|------------|----------|-----------------|
| Micro F1   | 0.XX     | 0.XX            |
| NDCG@3     | 0.XX     | -               |
| NDCG@5     | 0.XX     | -               |
| NDCG@10    | 0.XX     | -               |

These values are in line with the state of the art in the field; see the publication [Large Scale Legal Text Classification Using Transformer Models](https://arxiv.org/pdf/2010.12871.pdf).
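As an illustration of what the metrics in the table measure, here is a minimal pure-Python sketch of micro F1 at a score threshold and of NDCG@k with binary relevance. The function names, the input layout (one set of gold labels and one label-to-score dict per document), and the use of `>=` at the threshold are assumptions made for this example; the evaluation code actually used for this model is not shown in this card.

```python
import math


def micro_f1(y_true, y_score, threshold):
    """Micro-averaged F1 over all (document, label) decisions.

    y_true:  list of sets of gold labels, one set per document.
    y_score: list of dicts {label: score}, one dict per document.
    """
    tp = fp = fn = 0
    for gold, scores in zip(y_true, y_score):
        # A label counts as predicted when its score reaches the threshold.
        predicted = {label for label, s in scores.items() if s >= threshold}
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ndcg_at_k(gold, scores, k):
    """NDCG@k for one document, with relevance 1 for gold labels and 0 otherwise."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    dcg = sum(1.0 / math.log2(i + 2) for i, label in enumerate(ranked) if label in gold)
    # Ideal ranking: all gold labels (up to k of them) placed first.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(gold))))
    return dcg / ideal if ideal else 0.0
```

A dataset-level NDCG@k would typically be the mean of `ndcg_at_k` over all test documents.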


## Inference Endpoint

### Payload example 

```json
{
  "inputs": "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities.",
  "topk": 10,
  "threshold": 0.16
}
```
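A payload like this one could be sent from Python using only the standard library, as in the minimal sketch below. `ENDPOINT_URL` and `API_TOKEN` are hypothetical placeholders, the function names are illustrative, and the bearer-token header assumes the endpoint is deployed as a protected Hugging Face Inference Endpoint.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your own endpoint URL and access token.
ENDPOINT_URL = "https://example.invalid/eurovoc"
API_TOKEN = "hf_xxx"


def build_request(text, topk=5, threshold=0.16):
    """Build the POST request; nothing is sent until urlopen() is called."""
    payload = {"inputs": text, "topk": topk, "threshold": threshold}
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify(text, topk=5, threshold=0.16):
    """Send the text to the endpoint and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, topk, threshold)) as response:
        return json.load(response)
```

Separating `build_request` from `classify` makes it possible to inspect the request body without a network call.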

Result:

```json
{
  "results": [
    {"label": "international sanctions", "score": 0.9994925260543823},
    {"label": "economic sanctions", "score": 0.9991770386695862},
    {"label": "natural person", "score": 0.9591936469078064},
    {"label": "EU restrictive measure", "score": 0.8388392329216003},
    {"label": "legal person", "score": 0.45630475878715515},
    {"label": "Burma/Myanmar", "score": 0.43375277519226074}
  ]
}
```

Only six results are returned, even though `topk` is 10, because the score of the next label is less than the 0.16 threshold.

Default values: `topk` = 5 and `threshold` = 0.16.
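The interaction between `topk` and `threshold` can be sketched as follows. This is an illustration of the behaviour described above, not the endpoint's actual handler code: the function name `select_labels` is hypothetical, the scores are rounded from the example result, and the seventh label with its score is invented for the demonstration. Whether a score exactly equal to the threshold is kept is also an assumption.

```python
def select_labels(scores, topk=5, threshold=0.16):
    """Return up to `topk` labels scoring at least `threshold`, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {"label": label, "score": score}
        for label, score in ranked[:topk]
        if score >= threshold
    ]


# Scores rounded from the example result; the last entry is a made-up
# label below the threshold, added only to show the cut-off.
example_scores = {
    "international sanctions": 0.9995,
    "economic sanctions": 0.9992,
    "natural person": 0.9592,
    "EU restrictive measure": 0.8388,
    "legal person": 0.4563,
    "Burma/Myanmar": 0.4338,
    "hypothetical seventh label": 0.10,
}

with_topk_10 = select_labels(example_scores, topk=10)  # threshold is the limit
with_defaults = select_labels(example_scores)          # topk=5 is the limit
```

With `topk=10` the threshold is the binding limit and six labels remain; with the default `topk=5` the list is cut to five before the threshold matters.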


## Author(s)

Sébastien Campion <sebastien.campion@europarl.europa.eu>