SentenceTransformers is a set of models and frameworks for training sentence embedding models and generating sentence embeddings from given data. The generated embeddings can be used for clustering, semantic search, and other tasks. We took two separate pretrained mpnet-base models and trained them with a contrastive learning objective. Question and answer pairs from StackExchange and other datasets served as training data, to make the models robust to question/answer embedding similarity.

We developed this model during the Community week using JAX/Flax for NLP & CV, organized by Hugging Face, as part of the project Train the Best Sentence Embedding Model Ever with 1B Training Pairs. We benefited from efficient hardware infrastructure to run the project (7 TPUs v3-8), as well as assistance from Google’s Flax, JAX, and Cloud team members on efficient deep learning frameworks.
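The text above names a contrastive learning objective but does not spell out the exact loss. As a rough illustration only, the sketch below assumes one common variant: softmax cross-entropy over scaled cosine similarities, where every other answer in the batch serves as a negative for a given question. The function names and the `scale` value are hypothetical, and this plain-Python version is not the JAX/Flax training code used in the project.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def in_batch_contrastive_loss(question_vecs, answer_vecs, scale=20.0):
    """Mean cross-entropy where answer i is the positive for question i.

    `scale` is an assumed temperature-like factor, not a value from the source.
    """
    n = len(question_vecs)
    total = 0.0
    for i in range(n):
        # Similarity of question i to every answer in the batch.
        logits = [scale * cosine(question_vecs[i], answer_vecs[j]) for j in range(n)]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching answer.
        total += log_sum_exp - logits[i]
    return total / n

# Toy vectors standing in for embeddings from the two encoders.
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned_answers = [[1.0, 0.0], [0.0, 1.0]]
shuffled_answers = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = in_batch_contrastive_loss(questions, aligned_answers)
loss_shuffled = in_batch_contrastive_loss(questions, shuffled_answers)
```

Under this assumed loss, a batch whose question and answer vectors line up pairwise scores a much lower loss than one where the pairs are mismatched, which is the behavior such an objective rewards during training.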
This model set is intended to be used as a sentence encoder for a search engine. Given an input sentence, it outputs a vector that captures the sentence's semantic information. The sentence vector may be used for semantic search, clustering, or sentence similarity tasks. For semantic search, the two models should be used in conjunction:
- multi-QA_v1-mpnet-asymmetric-Q - Model to encode Questions
- multi-QA_v1-mpnet-asymmetric-A - Model to encode Answers

Usage with the SentenceTransformers library:
    from sentence_transformers import SentenceTransformer, util

    # Separate encoders for questions and answers
    model_Q = SentenceTransformer('flax-sentence-embeddings/multi-QA_v1-mpnet-asymmetric-Q')
    model_A = SentenceTransformer('flax-sentence-embeddings/multi-QA_v1-mpnet-asymmetric-A')

    question = "Replace me by any question you'd like."
    question_embedding = model_Q.encode(question)

    answer = "Replace me by any answer you'd like."
    answer_embedding = model_A.encode(answer)

    # Cosine similarity between the question and answer embeddings
    answer_likeliness = util.cos_sim(question_embedding, answer_embedding)
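For the search-engine use case described earlier, one question is usually compared against many candidate answers rather than a single one. The following is a minimal sketch of that ranking step, assuming the embeddings have already been computed (for instance with `model_Q.encode` and `model_A.encode`). The helpers `cosine` and `rank_answers` are hypothetical names introduced for illustration and are not part of the SentenceTransformers library; toy vectors stand in for real embeddings so the block runs without downloading the models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_answers(question_embedding, answer_embeddings):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine(question_embedding, emb))
        for i, emb in enumerate(answer_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors standing in for real question/answer embeddings.
question_vec = [1.0, 0.0, 0.0]
answer_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the question
    [0.9, 0.1, 0.0],  # nearly parallel to the question
    [0.5, 0.5, 0.0],  # in between
]

ranking = rank_answers(question_vec, answer_vecs)
best_index, best_score = ranking[0]
```

With real data, `question_vec` would come from the question encoder and each entry of `answer_vecs` from the answer encoder. For large answer collections, a vectorized or approximate nearest-neighbor search would likely be preferable to this simple loop.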
The pretrained model is mpnet-base. Please refer to its model card for more detailed information about the pre-training procedure.
| Dataset | Paper | Number of training tuples |
|---|---|---|
| Stack Exchange QA - Title & Answer | - | 4,750,619 |
| Stack Exchange | - | 364,001 |
| TriviaQA | - | 73,346 |
| SQuAD2.0 | paper | 87,599 |
| Quora Question Pairs | - | 103,663 |
| Eli5 | paper | 325,475 |
| PAQ | paper | 64,371,441 |
| WikiAnswers | paper | 77,427,422 |
| MS MARCO | paper | 9,144,553 |
| GOOAQ: Open Question Answering with Diverse Answer Types | paper | 3,012,496 |
| Yahoo Answers Question/Answer | paper | 681,164 |
| SearchQA | - | 582,261 |
| Natural Questions (NQ) | paper | 100,231 |