---
language: 
- ro
---

# jurBERT-base


## Pretrained juridical BERT model for Romanian 

Romanian juridical BERT model trained with masked language modeling (MLM) and next sentence prediction (NSP) objectives.
It was introduced in this [paper](https://aclanthology.org/2021.nllp-1.8/). Two BERT models were released: **jurBERT-base** and **jurBERT-large**, both uncased.

| Model          | Weights   |   L    |   H    |    A   | MLM accuracy   | NSP accuracy   |
|----------------|:---------:|:------:|:------:|:------:|:--------------:|:--------------:|
| *jurBERT-base*    | *111M*      | *12*     | *768*    | *12*     | *0.8936*         | *0.9923*         |
| jurBERT-large   | 337M      | 24     | 1024   | 24     | 0.9005         | 0.9929         |




Both models are available:

* [jurBERT-base](https://huggingface.co/readerbench/jurBERT-base)
* [jurBERT-large](https://huggingface.co/readerbench/jurBERT-large)



#### How to use

```python
# TensorFlow
from transformers import AutoTokenizer, TFAutoModel
tokenizer = AutoTokenizer.from_pretrained("readerbench/jurBERT-base")
model = TFAutoModel.from_pretrained("readerbench/jurBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="tf")
outputs = model(inputs)


# PyTorch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("readerbench/jurBERT-base")
model = AutoModel.from_pretrained("readerbench/jurBERT-base")
inputs = tokenizer("exemplu de propoziție", return_tensors="pt")
outputs = model(**inputs)
```
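For downstream tasks you typically need a fixed-size sentence representation from the token-level outputs. A minimal sketch of mean pooling over the last hidden state, ignoring padding positions (the pooling function and the random demo tensors below are illustrative, not part of the original card; the hidden size of 768 matches jurBERT-base):

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, zeroing out padding positions via the mask.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1)
    return summed / counts

# Demo with a random tensor shaped like jurBERT-base output (hidden size 768).
hidden = torch.randn(2, 8, 768)
mask = torch.ones(2, 8, dtype=torch.long)
mask[1, 5:] = 0  # second sequence has 3 padding tokens
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 768])
```

With the PyTorch snippet above, the same function would be applied as `mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`.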


## Datasets

The model is pre-trained on RoJur, a private corpus (which can nevertheless be rented for a fee) comprising all final rulings, covering both civil and criminal cases, published by Romanian civil courts between 2010 and 2018. Validation is performed on two other datasets, RoBanking and BRDCases. To build RoBanking, we extracted from RoJur common types of cases pertinent to the banking domain (e.g. administration fee litigations, enforcement appeals), keeping only the summaries of the arguments provided by both the plaintiffs and the defendants, together with the final verdict (as a boolean value). BRDCases is a collection of cases in which BRD Groupe Société Générale Romania was directly involved.

| Corpus    | Scope        |Entries    |  Size (GB)|
|-----------|:------------:|:---------:|:---------:|
| RoJur     | pre-training | 11M       | 160       |
| RoBanking | downstream   | 108k      | -         |
| BRDCases  | downstream   | 149       | -         |


## Downstream performance

We report Mean AUC and Std AUC on the task of predicting the outcome of a case. 
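The AUC is the probability that a randomly chosen positive case is scored above a randomly chosen negative one; the mean and standard deviation are taken over repeated fine-tuning runs. A dependency-free sketch with hypothetical scores from three runs (the scores and labels below are made up for illustration, not results from the paper):

```python
import statistics

def roc_auc(labels, scores):
    # Probability that a random positive is ranked above a random negative
    # (ties count 0.5) -- equivalent to the normalized Mann-Whitney U statistic.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical verdict labels and scores from three fine-tuning runs
# on the same test split.
labels = [1, 0, 1, 1, 0, 0]
runs = [
    [0.9, 0.2, 0.8, 0.6, 0.4, 0.1],
    [0.7, 0.3, 0.4, 0.5, 0.2, 0.6],
    [0.8, 0.1, 0.7, 0.9, 0.3, 0.2],
]
aucs = [roc_auc(labels, r) for r in runs]
print(f"Mean AUC: {statistics.mean(aucs):.2f}, Std AUC: {statistics.stdev(aucs):.2f}")
```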

### Results on RoBanking using only the plea of the plaintiff

| Model              | Mean AUC | Std AUC  |
|--------------------|:--------:|:--------:|
| CNN                | 79.60    | -        |
| BI-LSTM            | 80.99    | 0.26     |
| RoBERT-small       | 70.54    | 0.28     |
| RoBERT-base        | 79.74    | 0.21     |
| RoBERT-base + hf   | 79.82    | 0.11     |
| RoBERT-large       | 76.53    | 5.43     |
| *jurBERT-base*     | **81.47**| **0.18** |
| *jurBERT-base + hf*| *81.40*  | *0.18*   |
| jurBERT-large      | 78.38    | 1.77     |

### Results on RoBanking using pleas from both the plaintiff and defendant

| Model               | Mean AUC | Std AUC  |
|---------------------|:--------:|:--------:|
| BI-LSTM             | 84.60    | 0.59     |
| RoBERT-base         | 84.40    | 0.26     |
| RoBERT-base + hf    | 84.43    | 0.15     |
| *jurBERT-base*      | *86.63*  | *0.18*   |
| *jurBERT-base + hf* | **86.73**| **0.22** |
| jurBERT-large       | 82.04    | 0.64     |

### Results on BRDCases

| Model               | Mean AUC | Std AUC  |
|---------------------|:--------:|:--------:|
| SVM with SK         | 57.72    | 2.15     |
| RoBERT-base         | 53.24    | 1.76     |
| RoBERT-base + hf    | 55.40    | 0.96     |
| *jurBERT-base*      | *59.65*  | *1.16*   |
| *jurBERT-base + hf* | **61.46**| **1.76** |

For complete results and discussion please refer to the [paper](https://aclanthology.org/2021.nllp-1.8/).

### BibTeX entry and citation info

```bibtex
@inproceedings{masala2021jurbert,
  title={jurBERT: A Romanian BERT Model for Legal Judgement Prediction},
  author={Masala, Mihai and Iacob, Radu Cristian Alexandru and Uban, Ana Sabina and Cidota, Marina and Velicu, Horia and Rebedea, Traian and Popescu, Marius},
  booktitle={Proceedings of the Natural Legal Language Processing Workshop 2021},
  pages={86--94},
  year={2021}
}
```