---
license: mit
language:
- es
tags:
- flair
- token-classification
- sequence-tagger-model
datasets:
- ArJuzPCyF10
metrics:
- precision
- recall
- f1-score
widget:
- text: 1. DECLARAR EXTINGUIDA LA ACCIÓN PENAL en este caso por cumplimiento de la suspensión del proceso a prueba, y SOBRESEER a EZEQUIEL CAMILO MARCONNI, DNI 11.222.333, en orden a los delitos de lesiones leves agravadas, amenazas simples y agravadas por el uso de armas.
library_name: flair
pipeline_tag: token-classification
---

# Model Description

Following the FLAIR guidelines for training a NER model, we trained a model on top of [BETO embeddings](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased) (a Spanish version of BERT trained on a Spanish corpus) and a BiLSTM-CRF architecture.

This model was developed by [{ collective.ai }](https://collectiveai.io) as part of the [AymurAI](https://www.aymurai.info) project by [DataGenero](https://www.datagenero.org).

# About AymurAI, its uses and limitations

AymurAI is intended to be used as a tool to address the lack of available data in the judicial system on gender-based violence (GBV) rulings in Latin America. The goal is to increase reporting levels, build trust in the justice system, and improve access to justice for women and LGBTIQ+ people. AymurAI will generate and maintain anonymized datasets from legal rulings to understand GBV and support policy making, and also contribute to feminist collectives' campaigns.

AymurAI is still a prototype and is only being implemented in Criminal Court N°10 in the City of Buenos Aires, Argentina. Its capabilities are limited to semi-automated data anonymization, collection and analysis, and its results are constrained by the quality, consistency and availability of the data.
Additionally, the effectiveness of AymurAI in addressing the lack of transparency in the judicial system and improving access to justice may also depend on other factors, such as the level of cooperation from court officials and the broader cultural and political context.

This model was trained on a closed dataset from an Argentine criminal court. It is designed to address the need for anonymization of legal documents. The objective is to safeguard the confidentiality of individuals implicated in legal cases while still allowing legal trends and patterns to be examined and understood. Using a domain-specific dataset from an Argentine criminal court tailors the model to that legal and cultural context, which allows for more accurate results. However, it also means that the model may not be applicable or effective in other countries or regions with different legal systems or cultural norms.

# Usage

## How to use the model in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)**. Install it with `pip install flair`

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("aymurai/anonymizer-beto-cased-flair")

# make example sentence
sentence = Sentence("1. DECLARAR EXTINGUIDA LA ACCIÓN PENAL en este caso por cumplimiento de la suspensión del proceso a prueba, y SOBRESEER a EZEQUIEL CAMILO MARCONNI, DNI 11.222.333, en orden a los delitos de lesiones leves agravadas, amenazas simples y agravadas por el uso de armas.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output:

```
Span[22:25]: "EZEQUIEL CAMILO MARCONNI" → PER (0.9541)
Span[27:28]: "11.222.333" → DNI (1.0)
```

## Using the model in AymurAI platform

Please refer to [aymurai.info](https://www.aymurai.info) for more information about the full platform.

You can also check the development repository [here](https://github.com/aymurai/dev).

# Entities and metrics

## Description

Please refer to the entities' description table ([en](docs/en-entities-table.md)|[es](docs/es-entities-table.md)).

## Data

The model was trained on a dataset of 535 legal rulings from an Argentine criminal court. Due to the nature of the data (personal data, complaint characteristics and victim protection), the documents are kept private.
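As an illustration of how the spans predicted in the Flair usage example might be turned into anonymized text, here is a minimal sketch. The `redact` helper and the `<LABEL>` placeholder format are hypothetical and are not taken from the AymurAI platform. The character offsets are computed by hand here so the example runs without downloading the model.

```python
from typing import Iterable, Tuple


def redact(text: str, spans: Iterable[Tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) character span with a <LABEL> placeholder.

    Hypothetical helper for illustration; assumes spans do not overlap.
    """
    out = text
    # Work from right to left so that earlier offsets remain valid
    # after each replacement changes the string length.
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + f"<{label}>" + out[end:]
    return out


text = "SOBRESEER a EZEQUIEL CAMILO MARCONNI, DNI 11.222.333"
name = "EZEQUIEL CAMILO MARCONNI"
dni = "11.222.333"

# Offsets are built manually here. With Flair, they could presumably be read
# from each predicted span (assumption: start_position / end_position and the
# span's label value, as exposed by recent Flair versions).
spans = [
    (text.index(name), text.index(name) + len(name), "PER"),
    (text.index(dni), text.index(dni) + len(dni), "DNI"),
]

print(redact(text, spans))  # SOBRESEER a <PER>, DNI <DNI>
```

Replacing from the end of the string backwards avoids having to recompute offsets after each substitution.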

## Metrics

| label              | precision | recall | f1-score |
|--------------------|-----------|--------|----------|
| BANCO              | 1.00      | 0.90   | 0.95     |
| CBU                | 0.92      | 0.92   | 0.92     |
| CORREO_ELECTRONICO | 1.00      | 1.00   | 1.00     |
| CUIJ               | 1.00      | 1.00   | 1.00     |
| CUIT_CUIL          | 1.00      | 1.00   | 1.00     |
| DIRECCION          | 0.97      | 0.85   | 0.91     |
| DNI                | 0.96      | 1.00   | 0.98     |
| EDAD               | 1.00      | 0.95   | 0.97     |
| ESTUDIOS           | 1.00      | 1.00   | 1.00     |
| FECHA              | 1.00      | 0.99   | 1.00     |
| LINK               | 1.00      | 0.94   | 0.97     |
| LOC                | 0.99      | 0.72   | 0.83     |
| MARCA_AUTOMOVIL    | 0.95      | 1.00   | 0.97     |
| NACIONALIDAD       | 1.00      | 0.94   | 0.97     |
| NUM_ACTUACION      | 0.84      | 0.96   | 0.90     |
| NUM_CAJA_AHORRO    | 0.00      | 0.00   | 0.00     |
| NUM_EXPEDIENTE     | 0.98      | 0.92   | 0.95     |
| NUM_MATRICULA      | 0.33      | 0.50   | 0.40     |
| O                  | 0.99      | 1.00   | 1.00     |
| PATENTE_DOMINIO    | 1.00      | 1.00   | 1.00     |
| PER                | 0.98      | 0.97   | 0.98     |
| TELEFONO           | 0.97      | 1.00   | 0.99     |
| TEXTO_ANONIMIZAR   | 0.98      | 0.61   | 0.75     |
|                    |           |        |          |
| macro avg          | 0.91      | 0.88   | 0.89     |

# GitHub

You can see our open-source development [here](https://github.com/AymurAI/).

# Citation

Please cite [the following paper](https://drive.google.com/file/d/1P-hW0JKXWZ44Fn94fDVIxQRTExkK6m4Y/view) when using AymurAI:

```bibtex
@techreport{feldfeber2022,
  author      = "Feldfeber, Ivana and Quiroga, Yasmín Belén and Guevara, Clarissa and Ciolfi Felice, Marianela",
  title       = "Feminisms in Artificial Intelligence: Automation Tools towards a Feminist Judiciary Reform in Argentina and Mexico",
  institution = "DataGenero",
  year        = "2022",
  url         = "https://drive.google.com/file/d/1P-hW0JKXWZ44Fn94fDVIxQRTExkK6m4Y/view"
}
```