BART Legal Spanish ⚖️

BART Legal Spanish (base) is a BART-like model trained on a collection of Spanish legal-domain corpora.

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function and (2) learning a model to reconstruct the original text.
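The corrupt-then-reconstruct objective can be illustrated with a small sketch of text infilling, one of the noising schemes described for BART, in which a span of tokens is replaced by a single mask token. This is only an illustration: the card does not state which noising functions were used for this model, and the names `infill_span` and `corrupt`, the whitespace tokenization, and the `max_span` parameter are assumptions made for the example.

```python
import random

MASK = "<mask>"  # mask token string, matching the usage examples in this card

def infill_span(tokens, start, length, mask_token=MASK):
    """Replace tokens[start:start+length] with a single mask token."""
    return tokens[:start] + [mask_token] + tokens[start + length:]

def corrupt(text, rng, max_span=3):
    """Return a (corrupted, original) pair for a non-empty text.

    Hypothetical helper: splits on whitespace and masks one random span.
    A real pipeline would work on subword tokens and may mask several spans.
    """
    tokens = text.split()
    length = rng.randint(1, min(max_span, len(tokens)))
    start = rng.randint(0, len(tokens) - length)
    return " ".join(infill_span(tokens, start, length)), text

# The model input is the corrupted text; the training target is the original.
source, target = corrupt("Los españoles son iguales ante la ley.", random.Random(0))
print(source)
print(target)
```

Because the whole span collapses into one mask token, the model has to decide how many tokens are missing as well as which ones.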

This model is particularly effective when fine-tuned for text generation tasks (e.g., summarization, translation) but also works well for comprehension tasks (e.g., text classification, question answering).

Training details

  • Dataset: Spanish-legal-corpora - 90% for training / 10% for validation.
  • Training script: see here
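A 90% / 10% split like the one listed above could be produced along these lines. This is a minimal sketch, not the linked training script: the function name `split_corpus`, the shuffling, the fixed seed, and splitting at the document level are all assumptions.

```python
import random

def split_corpus(documents, val_fraction=0.1, seed=42):
    """Shuffle documents and split them into (train, validation) lists."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)       # seeded, so the split is reproducible
    n_val = round(len(docs) * val_fraction)  # size of the validation set
    return docs[n_val:], docs[:n_val]

# Toy stand-in for the corpus: 100 placeholder documents.
train_docs, val_docs = split_corpus(f"doc-{i}" for i in range(100))
print(len(train_docs), len(val_docs))
```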

Evaluation metrics

Metric      Value
Accuracy    0.86
Loss        0.68

Benchmarks 🔨

WIP 🚧

How to use with transformers

from transformers import BartForConditionalGeneration, BartTokenizer

model_id = "mrm8488/bart-legal-base-es"

model = BartForConditionalGeneration.from_pretrained(model_id, forced_bos_token_id=0)
tokenizer = BartTokenizer.from_pretrained(model_id)

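def fill_mask_span(text):
  # Tokenize the masked sentence into PyTorch tensors.
  batch = tokenizer(text, return_tensors="pt")
  # Generate with the model's default generation settings, so long outputs may be cut short.
  generated_ids = model.generate(batch["input_ids"])
  # Decode the generated ids back to text, dropping special tokens.
  print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))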

text = "Los españoles son <mask> ante la ley."
fill_mask_span(text)
# Output: ['Los españoles son iguales ante la ley.1.ª y 2.ª ante la']

text = "Los proyectos de reforma Constitucional deberán <mask> por una mayoría de tres quintos de cada una de las Cámaras."
fill_mask_span(text)
# Output: ['Los proyectos de reforma Constitucional deberán ser aprobados por una mayoría de tres quintos de cada']
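As the outputs above show, the generated text can run past the end of the sentence or stop before reaching it. One way to recover only the text that replaced `<mask>` is to align the output with the template's prefix and suffix. The sketch below is a hypothetical post-processing helper (`extract_fill` is not part of transformers), and it assumes the decoded output reproduces the text before the mask exactly.

```python
MASK = "<mask>"

def extract_fill(template, generated, mask_token=MASK):
    """Return the text generated in place of the mask, or None if it cannot be aligned."""
    prefix, sep, suffix = template.partition(mask_token)
    if not sep or not generated.startswith(prefix):
        return None  # no mask in the template, or the output diverged before it
    rest = generated[len(prefix):]
    if not suffix:
        return rest  # mask was at the end of the template
    for i in range(len(rest)):
        tail = rest[i:]
        # Accept a full match of the suffix, or a suffix cut off by truncation.
        if tail.startswith(suffix) or suffix.startswith(tail):
            return rest[:i]
    return None

print(extract_fill(
    "Los españoles son <mask> ante la ley.",
    "Los españoles son iguales ante la ley.1.ª y 2.ª ante la",
))
```

Applied to the two outputs shown above, this returns `iguales` and `ser aprobados`.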

Acknowledgments

Citation

If you want to cite this model, you can use this:

@misc {manuel_romero_2023,
    author       = { {Manuel Romero} },
    title        = { bart-legal-base-es (Revision c33ed22) },
    year         = 2023,
    url          = { https://huggingface.co/mrm8488/bart-legal-base-es },
    doi          = { 10.57967/hf/0472 },
    publisher    = { Hugging Face }
}

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain
