The RetriBERT model was proposed in the blog post Explain Anything Like I’m Five: A Model for Open Domain Long Form Question Answering. RetriBERT is a small model that uses either a single BERT encoder or a pair of BERT encoders, followed by a lower-dimensional projection, for dense semantic indexing of text.
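To make "dense semantic indexing" concrete, here is a minimal sketch that assumes the encoders and projection have already turned a query and some documents into fixed-size vectors. The toy 4-dimensional vectors and the `rank_documents` helper are hypothetical stand-ins and not part of the library (the documented default projection size is 128). Ranking by inner product is an assumption chosen for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_documents(query: Vector, docs: List[Vector]) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by descending inner product."""
    scores = [(i, dot(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for projected encoder outputs (hypothetical values).
doc_index = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.1, 0.0, 0.0, 0.9],
]
query_vec = [1.0, 0.0, 0.0, 0.2]

ranking = rank_documents(query_vec, doc_index)
best_doc, best_score = ranking[0]
print(best_doc, round(best_score, 2))
```

In practice the document vectors would be computed once and stored in an index, so that only the query needs to be encoded at search time.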
RetriBertConfig( vocab_size = 30522, hidden_size = 768, num_hidden_layers = 8, num_attention_heads = 12, intermediate_size = 3072, hidden_act = 'gelu', hidden_dropout_prob = 0.1, attention_probs_dropout_prob = 0.1, max_position_embeddings = 512, type_vocab_size = 2, initializer_range = 0.02, layer_norm_eps = 1e-12, share_encoders = True, projection_dim = 128, pad_token_id = 0, **kwargs )
- vocab_size (int, optional, defaults to 30522) — Vocabulary size of the RetriBERT model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling RetriBertModel.
- hidden_size (int, optional, defaults to 768) — Dimensionality of the encoder layers and the pooler layer.
- num_hidden_layers (int, optional, defaults to 8) — Number of hidden layers in the Transformer encoder.
- num_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
- intermediate_size (int, optional, defaults to 3072) — Dimensionality of the “intermediate” (often named feed-forward) layer in the Transformer encoder.
- hidden_act (function or string, optional, defaults to "gelu") — The non-linear activation function in the encoder and pooler. If a string is given, it must name a supported activation, such as the default "gelu".
- hidden_dropout_prob (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- attention_probs_dropout_prob (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
- max_position_embeddings (int, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- type_vocab_size (int, optional, defaults to 2) — The vocabulary size of the token_type_ids passed into BertModel.
- initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- layer_norm_eps (float, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
- share_encoders (bool, optional, defaults to True) — Whether or not to use the same BERT-type encoder for the queries and documents.
- projection_dim (int, optional, defaults to 128) — Final dimension of the query and document representations after projection.
This is the configuration class to store the configuration of a RetriBertModel. It is used to instantiate a RetriBertModel according to the specified arguments, defining the model architecture.
RetriBertTokenizer( vocab_file, do_lower_case = True, do_basic_tokenize = True, never_split = None, unk_token = '[UNK]', sep_token = '[SEP]', pad_token = '[PAD]', cls_token = '[CLS]', mask_token = '[MASK]', tokenize_chinese_chars = True, strip_accents = None, **kwargs )
Constructs a RetriBERT tokenizer.
RetriBertTokenizer is identical to BertTokenizer and runs end-to-end tokenization: punctuation splitting and wordpiece.
Refer to superclass BertTokenizer for usage examples and documentation concerning parameters.
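The two stages named above, punctuation splitting followed by wordpiece, can be sketched in plain Python. This is a simplified illustration and not the library's implementation: the helper names and the toy vocabulary are hypothetical, and details such as accent stripping, Chinese character handling, and special tokens are left out.

```python
import string
from typing import Dict, List


def split_on_punctuation(text: str) -> List[str]:
    """Lowercase, split on whitespace, and make each punctuation mark its own token."""
    tokens: List[str] = []
    for word in text.lower().split():
        current = ""
        for ch in word:
            if ch in string.punctuation:
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


def wordpiece(word: str, vocab: Dict[str, int], unk_token: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Dict[str, int]) -> List[str]:
    """Run both stages: punctuation splitting, then wordpiece on each token."""
    return [piece for word in split_on_punctuation(text) for piece in wordpiece(word, vocab)]


# Hypothetical toy vocabulary; a real one is loaded from vocab_file.
toy_vocab = {t: i for i, t in enumerate(
    ["[UNK]", "un", "##afford", "##able", "play", "##ing", "!", ","]
)}

tokens = tokenize("Unaffordable, playing!", toy_vocab)
print(tokens)
```

The lowercasing step corresponds to the documented default do_lower_case = True.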
RetriBertTokenizerFast( vocab_file = None, tokenizer_file = None, do_lower_case = True, unk_token = '[UNK]', sep_token = '[SEP]', pad_token = '[PAD]', cls_token = '[CLS]', mask_token = '[MASK]', tokenize_chinese_chars = True, strip_accents = None, **kwargs )
Construct a “fast” RetriBERT tokenizer (backed by HuggingFace’s tokenizers library).
Refer to superclass BertTokenizerFast for usage examples and documentation concerning parameters.
RetriBertModel( config )
BERT-based model to embed queries or documents for document retrieval.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward( input_ids_query, attention_mask_query, input_ids_doc, attention_mask_doc, checkpoint_batch_size = -1 ) → torch.FloatTensor
- input_ids_query (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary for the queries in a batch.
- attention_mask_query (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- input_ids_doc (torch.LongTensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary for the documents in a batch.
- attention_mask_doc (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on the documents' padding token indices.
- checkpoint_batch_size (int, optional, defaults to -1) — If greater than 0, uses gradient checkpointing to only compute sequence representations on checkpoint_batch_size examples at a time on the GPU. All query representations are still compared to all document representations in the batch.
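The relationship between the token indices and the attention masks described above can be illustrated with a small sketch. It uses plain Python lists instead of tensors, and the `pad_batch` helper and the token ids are hypothetical. The padding id of 0 matches the documented pad_token_id default.

```python
from typing import List, Tuple

PAD_TOKEN_ID = 0  # matches the documented pad_token_id default


def pad_batch(
    sequences: List[List[int]], pad_id: int = PAD_TOKEN_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to a common length and build the matching mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token (not masked), 0 marks padding (masked).
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Hypothetical token ids for two queries of different lengths.
query_ids, query_mask = pad_batch([[101, 2054, 2003, 102], [101, 102]])
print(query_ids)
print(query_mask)
```

The same preparation would apply separately to the document inputs, since queries and documents are padded to their own sequence lengths.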
Returns the bidirectional cross-entropy loss obtained while trying to match each query to its corresponding document and each document to its corresponding query in the batch.
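As a rough illustration of what a bidirectional cross-entropy loss over a batch looks like, the sketch below scores every query against every document by inner product and treats the matching index as the correct class in both directions. It is written in plain Python for readability. The helper names are hypothetical, and averaging the two directions is an assumption made for this sketch; the description above does not spell out how they are combined.

```python
import math
from typing import List

Matrix = List[List[float]]


def cross_entropy_diagonal(scores: Matrix) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(scores):
        max_score = max(row)  # subtract the max for numerical stability
        log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in row))
        total += log_norm - row[i]
    return total / len(scores)


def bidirectional_loss(query_reps: Matrix, doc_reps: Matrix) -> float:
    """Average of query-to-document and document-to-query cross-entropy."""
    # scores[i][j] is the inner product of query i with document j.
    scores = [
        [sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_reps]
        for q_vec in query_reps
    ]
    transposed = [list(col) for col in zip(*scores)]
    return (cross_entropy_diagonal(scores) + cross_entropy_diagonal(transposed)) / 2


# Toy representations (hypothetical values): query i is most similar to document i.
queries = [[2.0, 0.0], [0.0, 2.0]]
documents = [[1.0, 0.0], [0.0, 1.0]]

matched = bidirectional_loss(queries, documents)
swapped = bidirectional_loss(queries, documents[::-1])
print(matched < swapped)
```

The loss is low when each query scores highest against its own document and rises when the pairing is broken, which is what the comparison between `matched` and `swapped` shows.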