---
language: German Italian
tags:
- translation German Italian model
datasets:
- dcep europarl jrc-acquis
widget:
- text: "Die Mitgliedstaaten müssen bei Verstößen gegen die Pflicht, beim Überschreiten der Außengrenzen der Europäischen Union Bewegungen von Barmitteln anzumelden, wirksame, angemessene und abschreckende Strafen verhängen."

---

# legal_t5_small_trans_de_it model

A model for translating legal text from German to Italian. It was first released in
[this repository](https://github.com/agemagician/LegalTrans). The model is trained on three parallel corpora: JRC-Acquis, Europarl, and DCEP.


## Model description

legal_t5_small_trans_de_it is based on the `t5-small` model and was trained on a large corpus of parallel text. It is a smaller model that scales down the T5 baseline by using `dmodel = 512`, `dff = 2,048`, 8-headed attention, and only 6 layers each in the encoder and decoder. This variant has about 60 million parameters.
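
As a rough sanity check on the "about 60 million parameters" figure, the count can be reproduced from the hyperparameters above. This is a minimal sketch, not the authors' code: the vocabulary size (32,128), the per-head dimension (`d_kv = 64`), the 32 relative-position buckets, and the tied input/output embeddings are assumptions taken from the stock `t5-small` configuration, and this model's own vocabulary may differ.

```python
def t5_param_count(vocab_size=32128, d_model=512, d_ff=2048, num_heads=8,
                   d_kv=64, num_layers=6, rel_buckets=32):
    """Estimate the parameter count of a T5-style encoder-decoder.

    vocab_size, d_kv and rel_buckets are assumed t5-small defaults; they are
    not stated in this model card.
    """
    inner = num_heads * d_kv
    attention = 4 * d_model * inner       # q, k, v, o projections (no biases)
    feed_forward = 2 * d_model * d_ff     # two dense layers (no biases)
    layer_norm = d_model                  # T5 layer norm has a scale only

    # Encoder layer: self-attention + feed-forward, one layer norm each.
    encoder_layer = attention + feed_forward + 2 * layer_norm
    # Decoder layer: self-attention + cross-attention + feed-forward.
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm

    # Relative position bias lives in the first layer of each stack;
    # each stack also ends with a final layer norm.
    rel_bias = rel_buckets * num_heads
    encoder = num_layers * encoder_layer + rel_bias + layer_norm
    decoder = num_layers * decoder_layer + rel_bias + layer_norm

    # Assumption: one embedding matrix shared with the output head.
    embedding = vocab_size * d_model
    return embedding + encoder + decoder


print(f"{t5_param_count():,}")
```

Under these assumptions the estimate comes out at roughly 60.5 million, with the embedding matrix accounting for about a quarter of it, so a different vocabulary size would shift the total noticeably.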

## Intended uses & limitations

The model can be used to translate legal texts from German to Italian.

### How to use

Here is how to use this model to translate legal text from German to Italian in PyTorch:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

model_name = "SEBIS/legal_t5_small_trans_de_it"

# T5 is an encoder-decoder model, so it is loaded with the seq2seq auto class
# (AutoModelWithLMHead is deprecated).
pipeline = TranslationPipeline(
    model=AutoModelForSeq2SeqLM.from_pretrained(model_name),
    tokenizer=AutoTokenizer.from_pretrained(
        model_name, do_lower_case=False, skip_special_tokens=True
    ),
    device=0,  # first GPU; use device=-1 to run on CPU
)

de_text = "Die Mitgliedstaaten müssen bei Verstößen gegen die Pflicht, beim Überschreiten der Außengrenzen der Europäischen Union Bewegungen von Barmitteln anzumelden, wirksame, angemessene und abschreckende Strafen verhängen."

# Returns a list with one dict per input, each holding a "translation_text" key.
print(pipeline([de_text], max_length=512))
```

## Training data

The legal_t5_small_trans_de_it model was trained on the [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), [EUROPARL](https://www.statmt.org/europarl/), and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, which together consist of 5 million parallel texts.

## Training procedure

### Preprocessing

### Pretraining
A unigram model with 88M parameters was trained over the complete parallel corpus to obtain the vocabulary (with byte pair encoding) that is used with this model.


## Evaluation results

When evaluated on the translation test dataset, the model achieves the following results:

Test results:

| Model | BLEU score |
|:-----:|:-----:|
|   legal_t5_small_trans_de_it | 43.3|
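
For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU score of this kind is typically computed: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is an illustration only. The card does not say which tool or tokenization produced the score above, so the whitespace tokenization, the single reference per sentence, and the function names here are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumes plain whitespace tokenization and no smoothing; published
    scores usually come from a standard tool with its own tokenizer.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as it
            # appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

A call such as `corpus_bleu(model_outputs, reference_translations)` takes two equal-length lists of strings; both names are placeholders for data that is not part of this card.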


### BibTeX entry and citation info