---
language: fr
license: cc-by-4.0
---

# Cour de Cassation automatic *titrage* prediction model

Model for the automatic prediction of *titrages* (keyword sequences) from *sommaires* (syntheses of legal cases). The models are described in [this paper](https://hal.inria.fr/hal-03663110/file/LREC_2022___CCass_Inria-camera-ready.pdf). If you use this model, please cite our research paper (see [below](#cite)).

## Model description

The model is a transformer-base model trained on parallel data (sommaires-titrages) provided by the Cour de Cassation. The model was initially trained using the Fairseq toolkit, converted to HuggingFace and then fine-tuned on the original training data to smooth out minor differences that arose during the conversion process. Tokenisation uses a SentencePiece model with the BPE strategy and a vocabulary size of 8000.

### Intended uses & limitations

This model is intended to produce *titrages* for *sommaires* that do not have them, or to complement existing (manually created) *titrages*.

### How to use

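```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# use_auth_token=True reads the token saved by `huggingface-cli login`
# (newer transformers releases name this argument `token`)
tokeniser = AutoTokenizer.from_pretrained("rbawden/CCASS-pred-titrages-base", use_auth_token=True)
model = AutoModelForSeq2SeqLM.from_pretrained("rbawden/CCASS-pred-titrages-base", use_auth_token=True)

# Input format: the matière, the separator " <t> ", then the sommaire on a single line
matiere = "matter"
sommaire = "full text from the sommaire on a single line"
inputs = tokeniser([matiere + " <t> " + sommaire], return_tensors='pt')
outputs = model.generate(inputs['input_ids'])

# Decode the generated ids into the predicted titrage(s)
titrages = tokeniser.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(titrages)
```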
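
The snippet above expects the *sommaire* on a single line, joined to the matière with the ` <t> ` separator. As a minimal sketch of how raw text could be put into that shape, the helpers below collapse line breaks and repeated whitespace before joining the two fields. The names `build_input` and `build_batch` are hypothetical and not part of the released model or of `transformers`. Whether the original training data was normalised in exactly this way is an assumption.

```python
from typing import Iterable, List, Tuple


def build_input(matiere: str, sommaire: str, sep: str = "<t>") -> str:
    """Join a matière and a sommaire into one single-line input string.

    Hypothetical helper: it only reproduces the `matiere + " <t> " + sommaire`
    pattern from the usage snippet, after whitespace normalisation.
    """
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines), so re-joining yields a single clean line.
    matiere = " ".join(matiere.split())
    sommaire = " ".join(sommaire.split())
    return f"{matiere} {sep} {sommaire}"


def build_batch(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Apply build_input to several (matière, sommaire) pairs."""
    return [build_input(matiere, sommaire) for matiere, sommaire in pairs]
```

The string returned by `build_input` can be passed to the tokeniser in place of the hand-built one. For the list returned by `build_batch`, inputs of different lengths would likely also need `padding=True` in the tokeniser call.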


### Limitations and bias


## Training data


## Training procedure

### Preprocessing


### Training

### Evaluation results

Coming soon

## BibTeX entry and citation info
<a name="cite"></a>

If you use this work, please cite the following article:

Thibault Charmet, Inès Cherichi, Matthieu Allain, Urszula Czerwinska, Amaury Fouret, Benoît Sagot and Rachel Bawden, 2022. **Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France’s Court of Cassation Rulings**. In Proceedings of the 13th Language Resources and Evaluation Conference, Marseille, France.

```
@inproceedings{charmet-et-al-2022-complex,
  title = {Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France’s Court of Cassation Rulings},
  author = {Charmet, Thibault and Cherichi, Inès and Allain, Matthieu and Czerwinska, Urszula and Fouret, Amaury and Sagot, Benoît and Bawden, Rachel},
  booktitle = {Proceedings of the 13th Language Resources and Evaluation Conference},
  year = {2022},
  address = {Marseille, France}
}
```