---
language:
- es
license: mit
widget:
- text: "La Constitución española de 1978 es la <mask> suprema del ordenamiento jurídico español."
tags:
- Long documents
- longformer
- robertalex
- spanish
- legal
---

# Legal ⚖️ longformer-base-4096-spanish

## [Longformer](https://arxiv.org/abs/2004.05150) is a Transformer model for long documents.

`legal-longformer-base-4096-spanish` is a BERT-like model initialized from a RoBERTa checkpoint (**[RoBERTalex](https://huggingface.co/PlanTL-GOB-ES/RoBERTalex)** in this case) and further pre-trained with a masked language modeling (*MLM*) objective on long documents from the [Spanish Legal Domain Corpora](https://zenodo.org/record/5495529). It supports sequences of up to **4,096** tokens.
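
Since the model is pre-trained for MLM, a quick way to try it is mask filling. The following is a minimal sketch, not an official usage recipe: it assumes the `transformers` library with a PyTorch backend is installed, takes the model id from the citation URL in this card, and reuses the widget sentence from the card metadata. The helper name `predict_mask` is hypothetical.

```python
# Assumption: the Hub id below is the one given in this card's citation URL.
MODEL_ID = "Narrativa/legal-longformer-base-4096-spanish"

# Widget sentence from the card metadata; it contains exactly one <mask> token.
EXAMPLE = (
    "La Constitución española de 1978 es la <mask> suprema "
    "del ordenamiento jurídico español."
)


def predict_mask(text: str = EXAMPLE, model_id: str = MODEL_ID, top_k: int = 5):
    """Return (token, score) pairs for the single <mask> token in `text`."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=top_k)]


# Example call (downloads the checkpoint on first use):
# for token, score in predict_mask():
#     print(f"{token!r}: {score:.3f}")
```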

**Longformer** combines sliding-window (*local*) attention with *global* attention. Global attention is configured by the user for each task, which lets the model learn task-specific representations.
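
The attention pattern described above can be illustrated in plain Python. This is a toy sketch of the idea rather than the library's implementation: the function names, the `one_sided_window` parameter, and the tiny window size used in the example are all illustrative assumptions, not values taken from this model's configuration.

```python
def attended_positions(i, seq_len, one_sided_window, global_positions=()):
    """Return the positions that token `i` attends to.

    Local attention covers `one_sided_window` tokens on each side of `i`.
    Global attention is symmetric: a global token attends to every position,
    and every token attends to all global tokens.
    """
    global_set = set(global_positions)
    if i in global_set:
        return list(range(seq_len))
    lo = max(0, i - one_sided_window)
    hi = min(seq_len, i + one_sided_window + 1)
    return sorted(set(range(lo, hi)) | global_set)


def global_attention_mask(seq_len, global_positions):
    """Build a 0/1 list where 1 marks a token that receives global attention."""
    global_set = set(global_positions)
    return [1 if j in global_set else 0 for j in range(seq_len)]


# Toy example: 10 tokens, window of 2 on each side, global attention on token 0.
print(attended_positions(5, 10, 2, global_positions=[0]))
print(global_attention_mask(10, [0]))
```

In the Hugging Face `transformers` Longformer models, a mask with this layout is passed as the `global_attention_mask` argument, as a tensor with the same shape as `input_ids`. Which tokens get global attention depends on the task; putting it only on the first token is a common choice for classification.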

This model follows the approach described by [Iz Beltagy, Matthew E. Peters, and Arman Cohan](https://arxiv.org/abs/2004.05150).

## Model (base checkpoint)

[RoBERTalex](https://huggingface.co/PlanTL-GOB-ES/RoBERTalex)

Few models have been trained for the Spanish language, and some of them were trained on low-resource, unclean corpora. The models derived from the Spanish National Plan for Language Technologies perform well on several tasks and were trained on large-scale, clean corpora. However, Spanish legal language could be considered a language in its own right. We therefore created a Spanish legal model from scratch, trained exclusively on legal corpora.

## Dataset

[Spanish Legal Domain Corpora](https://zenodo.org/record/5495529)

A collection of corpora from the Spanish legal domain.

More legal domain resources: https://github.com/PlanTL-GOB-ES/lm-legal-es

## Citation

To cite this model, you can use the following entry:

```bibtex
@misc{narrativa2022legal-longformer-base-4096-spanish,
  title={Legal Spanish LongFormer by Narrativa},
  author={Romero, Manuel},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/Narrativa/legal-longformer-base-4096-spanish}},
  year={2022}
}
```