This model is a fine-tuned version of distilbert-base-uncased, originally released in "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter", and fine-tuned on the Quora Question Pairs (QQP) dataset, part of the General Language Understanding Evaluation (GLUE) benchmark. It was fine-tuned by the team at AssemblyAI and is released with the corresponding blog post.
To download and use this model for duplicate question detection, run the following:
```python
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("assemblyai/distilbert-base-uncased-qqp")
model = AutoModelForSequenceClassification.from_pretrained("assemblyai/distilbert-base-uncased-qqp")

# Tokenize the question pair; the two lists are paired as (text, text_pair).
tokenized_segments = tokenizer(
    ["How many hours does it take to fly from California to New York?"],
    ["What is the flight time from New York to Seattle?"],
    return_tensors="pt",
    padding=True,
    truncation=True,
)
tokenized_segments_input_ids = tokenized_segments.input_ids
tokenized_segments_attention_mask = tokenized_segments.attention_mask

# Softmax over the two logits yields [non-duplicate, duplicate] probabilities.
model_predictions = F.softmax(
    model(
        input_ids=tokenized_segments_input_ids,
        attention_mask=tokenized_segments_attention_mask,
    )["logits"],
    dim=1,
)

print("Duplicate probability: " + str(model_predictions[0][1].item() * 100) + "%")
print("Non-duplicate probability: " + str(model_predictions[0][0].item() * 100) + "%")
```
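To make the output indexing above concrete, here is a minimal, self-contained sketch of the final softmax step, using made-up logit values rather than actual model output. It assumes the standard GLUE QQP label order (index 0 = not duplicate, index 1 = duplicate); check `model.config.id2label` on the loaded model to confirm.

```python
import math

# Hypothetical logits for one question pair (illustrative values only,
# not output of the actual model).
logits = [1.2, -0.8]

# Softmax converts the two logits into probabilities that sum to 1.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Assumed label order: index 0 = not duplicate, index 1 = duplicate.
print(f"Duplicate probability: {probs[1] * 100:.1f}%")
print(f"Non-duplicate probability: {probs[0] * 100:.1f}%")
```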
For questions about how to use this model, feel free to contact the team at AssemblyAI!