metadata

datasets:
  - quora
language: en
license: mit
pipeline_tag: text-classification
tags:
  - roberta
  - text-classification

Cross-Encoder for Quora Duplicate Questions Detection

This model was trained using SentenceTransformers Cross-Encoder class.

This model uses roberta-large.

Training Data

This model was trained on the Quora Duplicate Questions dataset.

The model will predict a score between 0 and 1: How likely the two given questions are duplicates.

Note: The model is not suitable to estimate the similarity of questions, e.g. the two questions "How to learn Java" and "How to learn Python" will result in a rahter low score, as these are not duplicates.

Usage and Performance

The trained model can be used like this:

from sentence_transformers import CrossEncoder

model = CrossEncoder('model_name')
scores = model.predict([('Question 1', 'Question 2'), ('Question 3', 'Question 4')])

print(scores)