dpr-ctx_encoder-bert-base-multilingual
Description
Multilingual DPR model based on bert-base-multilingual-cased. See also: DPR model, DPR repo.
Data
question pairs for train: 644,217
question pairs for dev: 73,710
*DRCD and MLQA were converted with the haystack script squad_to_dpr.py.
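To illustrate what such a conversion involves, here is a minimal sketch that maps SQuAD-style JSON to DPR-style training records. It is not the haystack squad_to_dpr.py script: the helper name squad_to_dpr_records is hypothetical, only the positive context is filled in, and negative and hard-negative contexts (which a real conversion would normally retrieve) are left empty.

```python
from typing import Any, Dict, List


def squad_to_dpr_records(squad: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: flatten SQuAD-style data into DPR-style records.

    Each answerable question becomes one record whose only positive context
    is the paragraph it was asked about. Negatives are left empty here.
    """
    records: List[Dict[str, Any]] = []
    for article in squad["data"]:
        title = article.get("title", "")
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [a["text"] for a in qa.get("answers", [])]
                if not answers:
                    # Skip questions without an answer span.
                    continue
                records.append(
                    {
                        "question": qa["question"],
                        "answers": answers,
                        "positive_ctxs": [{"title": title, "text": context}],
                        "negative_ctxs": [],
                        "hard_negative_ctxs": [],
                    }
                )
    return records


# Toy input (made-up data, for illustration only).
toy_squad = {
    "data": [
        {
            "title": "Taipei",
            "paragraphs": [
                {
                    "context": "Taipei is the capital of Taiwan.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of Taiwan?",
                            "answers": [{"text": "Taipei", "answer_start": 0}],
                        },
                        {"id": "q2", "question": "Unanswerable?", "answers": []},
                    ],
                }
            ],
        }
    ]
}

records = squad_to_dpr_records(toy_squad)
```

Each record pairs a question with the passage that answers it, which is the kind of question pair counted in the train and dev figures above.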
Training Script
I used the training script from haystack.
Usage
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('voidful/dpr-question_encoder-bert-base-multilingual')
model = DPRQuestionEncoder.from_pretrained('voidful/dpr-question_encoder-bert-base-multilingual')
input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"]
embeddings = model(input_ids).pooler_output
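The pooler_output above is one dense vector per input. DPR ranks passages by the inner product between the question vector and each passage vector. The sketch below shows that scoring step with made-up 3-dimensional vectors standing in for real model outputs; dot and rank_passages are hypothetical helpers, not part of transformers or haystack.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def rank_passages(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, highest inner product first."""
    scores = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed values, not real embeddings).
question = [1.0, 0.0, 2.0]
passages = [
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.5],
]

ranking = rank_passages(question, passages)
best_index = ranking[0][0]
```

In practice the passage vectors would come from the context encoder and the question vector from the question encoder, and a vector index would replace the linear scan.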
Follow the tutorial from haystack: Better Retrievers via "Dense Passage Retrieval"
from haystack.retriever.dense import DensePassageRetriever

# document_store must be an existing haystack document store created beforehand.
retriever = DensePassageRetriever(document_store=document_store,
                                  query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
                                  passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
                                  max_seq_len_query=64,
                                  max_seq_len_passage=256,
                                  batch_size=16,
                                  use_gpu=True,
                                  embed_title=True,
                                  use_fast_tokenizers=True)