
🦠 NarbioBART

NarbioBART (base) is a BART-like model trained on the Spanish Biomedical Crawled Corpus.

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function and (2) learning a model to reconstruct the original text.
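To make the corrupt-then-reconstruct objective concrete, here is a minimal sketch of one possible noising function: replacing a contiguous span of tokens with a single <mask>. It is illustrative only and is not taken from the NarbioBART training script; the helper name `mask_span`, the whitespace tokenization, and the Spanish example sentence are all assumptions made for this example.

```python
import random

MASK_TOKEN = "<mask>"  # mask token string used by BART tokenizers


def mask_span(tokens, span_len, rng):
    """Replace one random contiguous span of `span_len` tokens with a single mask.

    Hypothetical helper for illustration; real pre-training operates on
    subword token ids, not whitespace-split words.
    """
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    start = rng.randrange(len(tokens) - span_len + 1)
    corrupted = tokens[:start] + [MASK_TOKEN] + tokens[start + span_len:]
    # The encoder sees `corrupted`; the decoder is trained to emit `tokens`.
    return corrupted, list(tokens)


rng = random.Random(0)
original = "el paciente presenta fiebre y tos seca".split()
noised, target = mask_span(original, span_len=2, rng=rng)
print(" ".join(noised))
print(" ".join(target))
```

Because the whole span collapses into one mask token, the model also has to infer how many tokens are missing, which is what makes span infilling harder than single-token masking.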

This model is particularly effective when fine-tuned for text generation tasks (e.g., summarization, translation) but also works well for comprehension tasks (e.g., text classification, question answering).

Training details

  • Dataset: Spanish Biomedical Crawled Corpus - 90% for training / 10% for validation.
  • Training script: see here
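A 90/10 split like the one listed above could be produced as in the following sketch. This is an assumption about how such a split might be done (a seeded random shuffle at document level), not a reproduction of the linked training script; `split_train_validation` and the toy `docs` list are hypothetical.

```python
import random


def split_train_validation(documents, validation_fraction=0.1, seed=42):
    """Shuffle documents with a fixed seed and hold out a validation fraction."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    # First n_val shuffled documents go to validation, the rest to training.
    return shuffled[n_val:], shuffled[:n_val]


docs = [f"doc {i}" for i in range(100)]  # stand-in for the real corpus
train_docs, val_docs = split_train_validation(docs)
print(len(train_docs), len(val_docs))
```

Fixing the seed keeps the validation set stable across runs, so evaluation numbers stay comparable.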

Evaluation metrics

Metric     Value
Accuracy   0.802
Loss       1.04
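If the reported loss is a mean token-level cross-entropy measured in nats (an assumption; the card does not state how the loss is computed), it can be converted to a perplexity by exponentiation. A short sketch of that arithmetic:

```python
import math

reported_loss = 1.04  # value from the table above; assumed to be mean cross-entropy in nats

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(reported_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 2.83.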

Benchmarks 🔨

WIP 🚧

How to use with transformers

from transformers import BartForConditionalGeneration, BartTokenizer

model_id = "Narrativa/NarbioBART"

# forced_bos_token_id=0 forces the first generated token to be token id 0
# (the <s> token in standard BART vocabularies).
model = BartForConditionalGeneration.from_pretrained(model_id, forced_bos_token_id=0)
tokenizer = BartTokenizer.from_pretrained(model_id)

def fill_mask_span(text):
    """Fill the <mask> span in `text` and return the decoded generations."""
    batch = tokenizer(text, return_tensors="pt")
    generated_ids = model.generate(
        batch["input_ids"], attention_mask=batch["attention_mask"]
    )
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

text = "your text with a <mask> token."
print(fill_mask_span(text))

Citation

@misc {narrativa_2023,
    author       = { {Narrativa} },
    title        = { NarbioBART (Revision c9a4e07) },
    year         = 2023,
    url          = { https://huggingface.co/Narrativa/NarbioBART },
    doi          = { 10.57967/hf/0500 },
    publisher    = { Hugging Face }
}