---
language: fr
license: cc-by-4.0
---

# Cour de Cassation automatic *titrage* prediction model

Model for the automatic prediction of *titrages* (keyword sequences) from *sommaires* (syntheses of legal cases). The models are described in [this paper](https://hal.inria.fr/hal-03663110/file/LREC_2022___CCass_Inria-camera-ready.pdf). If you use this model, please cite our research paper (see [below](#cite)).

## Model description

The model is a transformer-base model trained on parallel data (*sommaires*-*titrages*) provided by the Cour de Cassation. The model was initially trained using the Fairseq toolkit, converted to HuggingFace and then fine-tuned on the original training data to smooth out minor differences that arose during the conversion process. Tokenisation is performed using a SentencePiece model with the BPE strategy and a vocabulary size of 8000.

### Intended uses & limitations

This model is intended to produce *titrages* for *sommaires* that do not have them, or to complement existing manually created *titrages*.

### How to use

```
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokeniser and the model from the HuggingFace Hub
tokeniser = AutoTokenizer.from_pretrained("rbawden/CCASS-pred-titrages-base", use_auth_token=True)
model = AutoModelForSeq2SeqLM.from_pretrained("rbawden/CCASS-pred-titrages-base", use_auth_token=True)

# The input is the matière followed by the sommaire, separated by a space
matiere = "matter"
sommaire = "full text from the sommaire on a single line"
inputs = tokeniser([matiere + " " + sommaire], return_tensors='pt')

# Generate the titrage and decode it back to text
outputs = model.generate(inputs['input_ids'])
tokeniser.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=True)
```

### Limitations and bias

## Training data

## Training procedure

### Preprocessing

### Training

### Evaluation results

Coming soon

## BibTeX entry and citation info

If you use this work, please cite the following article:

Thibault Charmet, Inès Cherichi, Matthieu Allain, Urszula Czerwinska, Amaury Fouret, Benoît Sagot and Rachel Bawden, 2022.
**Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France’s Court of Cassation Rulings**. In Proceedings of the 13th Language Resources and Evaluation Conference, Marseille, France.

```
@inproceedings{charmet-et-al-2022-complex,
  title = {Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France’s Court of Cassation Rulings},
  author = {Charmet, Thibault and Cherichi, Inès and Allain, Matthieu and Czerwinska, Urszula and Fouret, Amaury and Sagot, Benoît and Bawden, Rachel},
  booktitle = {Proceedings of the 13th Language Resources and Evaluation Conference},
  year = {2022},
  address = {Marseille, France}
}
```
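
## Input formatting sketch

The usage snippet in "How to use" expects the *matière* followed by the *sommaire* on a single line. The helper below is a minimal sketch of one way to prepare such inputs and run the model over several cases. The function names `build_model_input` and `predict_titrages` are hypothetical (they are not part of the released model), the whitespace normalisation is an assumption that may not match the preprocessing used for training, and the authentication argument is omitted on the assumption that the model can be downloaded without it.

```python
import re
from typing import Iterable, List, Tuple


def build_model_input(matiere: str, sommaire: str) -> str:
    """Join a matière and a sommaire into one single-line input string.

    Hypothetical helper: collapsing whitespace is an assumption, not a
    documented preprocessing step of the model.
    """
    # Replace newlines, tabs and repeated spaces with single spaces.
    one_line = re.sub(r"\s+", " ", sommaire).strip()
    return matiere.strip() + " " + one_line


def predict_titrages(
    cases: Iterable[Tuple[str, str]],
    model_name: str = "rbawden/CCASS-pred-titrages-base",
) -> List[str]:
    """Predict one titrage per (matière, sommaire) pair.

    Hypothetical wrapper around the calls shown in "How to use".
    """
    # Imported here so that build_model_input works without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokeniser = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    predictions = []
    for matiere, sommaire in cases:
        text = build_model_input(matiere, sommaire)
        # One example at a time, as in the original snippet (no padding).
        inputs = tokeniser([text], return_tensors="pt")
        outputs = model.generate(inputs["input_ids"])
        decoded = tokeniser.batch_decode(outputs, skip_special_tokens=True)
        predictions.append(decoded[0])
    return predictions
```

Calling `predict_titrages([("matter", "text of the sommaire")])` would download the model and return a list with one predicted *titrage* per input pair. Processing one example at a time keeps the sketch close to the original snippet; batching with padding may be faster but depends on tokeniser settings not described here.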