---
language:
  - es
license: mit
widget:
  - text: >-
      La Constitución española de 1978 es la <mask> suprema del ordenamiento
      jurídico español.
tags:
  - Long documents
  - longformer
  - robertalex
  - spanish
  - legal
---

# Legal ⚖️ longformer-base-4096-spanish

Longformer is a Transformer model for long documents.

`legal-longformer-base-4096` is a BERT-like model started from a RoBERTa checkpoint (RoBERTalex in this case) and pre-trained with masked language modeling (MLM) on long documents from the Spanish Legal Domain Corpora. It supports sequences of up to 4,096 tokens!
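Since the model was pre-trained for MLM, it can be queried directly through the `fill-mask` pipeline from 🤗 `transformers`. A minimal sketch using the widget example above (the `top_k` value is illustrative):

```python
# Masked-token prediction with the fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="Narrativa/legal-longformer-base-4096-spanish",
)

# RoBERTa-style checkpoints use "<mask>" as the mask token.
text = (
    "La Constitución española de 1978 es la <mask> suprema "
    "del ordenamiento jurídico español."
)

# Each prediction is a dict with the filled token and its score.
for pred in fill_mask(text, top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```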

Longformer uses a combination of sliding-window (local) attention and global attention. Global attention is user-configured based on the task, allowing the model to learn task-specific representations.
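The combined pattern can be illustrated with a small boolean mask: every token attends to a local window of neighbours, while user-designated global tokens attend to, and are attended by, all positions. A minimal NumPy sketch (window size and global positions here are illustrative, not the model's actual configuration):

```python
import numpy as np

def longformer_attention_pattern(seq_len, window, global_positions):
    """Boolean mask: True where query token i may attend to key token j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)
    # Sliding-window (local) attention: each token sees +/- window//2 neighbours.
    half = window // 2
    mask |= np.abs(idx[:, None] - idx[None, :]) <= half
    # Global attention: chosen tokens attend everywhere and are visible to all.
    for g in global_positions:
        mask[g, :] = True
        mask[:, g] = True
    return mask

# Tiny example: 8 tokens, window of 4, global attention on token 0 (e.g. <s>).
mask = longformer_attention_pattern(seq_len=8, window=4, global_positions=[0])
print(mask.astype(int))
```

This sparsity is what lets attention cost grow linearly with sequence length instead of quadratically, making 4,096-token inputs tractable.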

This model was made following the research of Iz Beltagy, Matthew E. Peters, and Arman Cohan.

## Model (base checkpoint)

**RoBERTalex.** There are few models trained for the Spanish language, and some of them have been trained on low-resource, unclean corpora. The models derived from the Spanish National Plan for Language Technologies are proficient at several tasks and have been trained on large-scale clean corpora. However, the Spanish legal domain could be regarded as an independent language of its own. We therefore created a Spanish legal model from scratch, trained exclusively on legal corpora.

## Dataset

**Spanish Legal Domain Corpora**: a collection of corpora from the Spanish legal domain.

More legal domain resources: https://github.com/PlanTL-GOB-ES/lm-legal-es

## Citation

If you want to cite this model, you can use:

```bibtex
@misc{narrativa2022legal-longformer-base-4096-spanish,
  title={Legal Spanish LongFormer by Narrativa},
  author={Romero, Manuel},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/Narrativa/legal-longformer-base-4096-spanish}},
  year={2022}
}
```