BigBird base model

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. BigBird also comes with a theoretical analysis of which capabilities of a full (dense-attention) transformer the sparse model retains.

It is a model pretrained on English text with a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository.
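The card does not spell out the masking recipe. As an assumption, the sketch below follows the BERT-style procedure: about 15% of non-special tokens become prediction targets, and of those, 80% are replaced by a mask token, 10% by a random token, and 10% are left unchanged. `mask_tokens`, `mask_token_id`, `vocab_size` and the 15% rate are hypothetical illustration parameters, not values read from this model's configuration.

```python
import numpy as np

def mask_tokens(input_ids, mask_token_id, vocab_size, special_mask, mlm_prob=0.15, seed=0):
    """Assumed BERT-style MLM corruption; returns (corrupted_ids, labels)."""
    rng = np.random.default_rng(seed)
    input_ids = input_ids.copy()
    # -100 marks positions the loss should ignore (PyTorch's default ignore_index)
    labels = np.full_like(input_ids, -100)
    # Pick roughly mlm_prob of the non-special positions as prediction targets
    selected = (rng.random(input_ids.shape) < mlm_prob) & ~special_mask
    labels[selected] = input_ids[selected]
    # Split the targets: 80% -> mask token, 10% -> random token, 10% unchanged
    roll = rng.random(input_ids.shape)
    to_mask = selected & (roll < 0.8)
    to_random = selected & (roll >= 0.8) & (roll < 0.9)
    input_ids[to_mask] = mask_token_id
    input_ids[to_random] = rng.integers(0, vocab_size, size=int(to_random.sum()))
    return input_ids, labels
```

The model is then trained to predict `labels` at every position where they are not -100, which forces it to use context from both directions.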

Model description

BigBird relies on block sparse attention instead of full attention (as used in BERT) and can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. It has achieved state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
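To make "block sparse attention" concrete, here is a simplified NumPy sketch of the block-level pattern described for BigBird: each query block attends to a sliding window of neighbouring blocks, to a few random blocks, and to global blocks. The parameters (block size 64, 3 random blocks, first and last block as global) are assumptions chosen to mirror the Hugging Face `BigBirdConfig` defaults, and `bigbird_block_mask` is a hypothetical helper; the real implementation differs in details. The sketch only records which blocks interact; it does not compute attention.

```python
import numpy as np

def bigbird_block_mask(num_blocks: int, num_random: int = 3, seed: int = 0) -> np.ndarray:
    """Boolean [num_blocks, num_blocks] mask: True means query block i attends to key block j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((num_blocks, num_blocks), dtype=bool)
    # Global blocks: the first and last block attend to, and are attended by, every block
    for g in (0, num_blocks - 1):
        mask[g, :] = True
        mask[:, g] = True
    for i in range(1, num_blocks - 1):
        # Sliding window: each block sees itself and its two neighbours
        mask[i, i - 1:i + 2] = True
        # Random blocks: a few extra key blocks not already covered
        candidates = np.flatnonzero(~mask[i])
        if candidates.size:
            chosen = rng.choice(candidates, size=min(num_random, candidates.size), replace=False)
            mask[i, chosen] = True
    return mask

block_size = 64
seq_len = 4096
num_blocks = seq_len // block_size  # 64 blocks of 64 tokens
mask = bigbird_block_mask(num_blocks)
sparse_pairs = int(mask.sum()) * block_size * block_size
full_pairs = seq_len * seq_len
print(f"attended fraction: {sparse_pairs / full_pairs:.3f}")
```

Running it prints `attended fraction: 0.152`, i.e. about 15% of the 4096 × 4096 query-key pairs that full attention would compute. Because each non-global query block sees a fixed number of key blocks, the work grows linearly with sequence length, so the fraction keeps shrinking as sequences get longer.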

Original implementation

Follow this link to see the original implementation.

How to use

Download the model by cloning the repository via git clone https://huggingface.co/OWG/bigbird-roberta-base.

Then run the model with ONNX Runtime, replacing path/to/model.onnx with the path to the ONNX file in your clone:

```python
from onnxruntime import InferenceSession, SessionOptions, GraphOptimizationLevel
from transformers import BigBirdTokenizer

# BigBird uses a SentencePiece tokenizer, not BERT's WordPiece tokenizer
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")

options = SessionOptions()
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

session = InferenceSession(
    "path/to/model.onnx", sess_options=options, providers=["CPUExecutionProvider"]
)
session.disable_fallback()

text = "Replace me by any text you want to encode."

# Tokenize straight to NumPy arrays, which is what ONNX Runtime consumes
encoded = tokenizer(text, return_tensors="np", return_attention_mask=True)

# Feed only the inputs the exported graph declares (it may not accept token_type_ids)
input_names = {i.name for i in session.get_inputs()}
inputs = {k: v for k, v in encoded.items() if k in input_names}

output_name = session.get_outputs()[0].name
outputs = session.run(output_names=[output_name], input_feed=inputs)
```
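If you want a single vector per text, one common option is mean pooling over non-padding tokens. This is an assumption on two counts: the card does not say which output the ONNX graph exposes first (here it is assumed to be the last hidden state with shape [batch, seq_len, hidden]), and mean pooling is just one of several pooling strategies. `mean_pool` is a hypothetical helper.

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions: [batch, seq, hidden] -> [batch, hidden]."""
    # Broadcast the [batch, seq] mask over the hidden dimension
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts
```

With the session above, this would be called as `mean_pool(outputs[0], inputs["attention_mask"])`, assuming the graph takes an `attention_mask` input.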
