Introduction
Dataset
The model was trained on the PAWS dataset: https://huggingface.co/datasets/paws
Dataset Summary
PAWS: Paraphrase Adversaries from Word Scrambling
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase identification. The dataset has two subsets, one based on Wikipedia and the other one based on the Quora Question Pairs (QQP) dataset.
Below are two examples from the dataset:
| Sentence 1 | Sentence 2 | Label |
| --- | --- | --- |
| Although interchangeable, the body pieces on the 2 cars are not similar. | Although similar, the body parts are not interchangeable on the 2 cars. | 0 |
| Katz was born in Sweden in 1947 and moved to New York City at the age of 1. | Katz was born in 1947 in Sweden and moved to New York at the age of one. | 1 |
| Column Name | Data |
| --- | --- |
| id | A unique id for each pair |
| sentence1 | The first sentence |
| sentence2 | The second sentence |
| (noisy_)label | (Noisy) label for each pair |

Each label has two possible values: 0 indicates that the two sentences have different meanings, while 1 indicates that they are paraphrases.
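As a minimal sketch of how the columns and labels above fit together, the snippet below represents the two example pairs as plain dictionaries and maps each label to its meaning. The names `LABEL_NAMES`, `EXAMPLES`, `label_name`, and `load_paws` are illustrative and not part of the dataset. The `id` column is omitted because the examples above do not list ids. The `load_paws` helper assumes the Hugging Face `datasets` library is installed and that the Wikipedia-based human-labeled subset is exposed under the `labeled_final` configuration; depending on the library version, the dataset name may need its full namespace.

```python
# Meaning of the two label values, as described above.
LABEL_NAMES = {0: "different meaning", 1: "paraphrase"}

# The two example pairs from this card, using the dataset's column names.
EXAMPLES = [
    {
        "sentence1": "Although interchangeable, the body pieces on the 2 cars are not similar.",
        "sentence2": "Although similar, the body parts are not interchangeable on the 2 cars.",
        "label": 0,
    },
    {
        "sentence1": "Katz was born in Sweden in 1947 and moved to New York City at the age of 1.",
        "sentence2": "Katz was born in 1947 in Sweden and moved to New York at the age of one.",
        "label": 1,
    },
]


def label_name(row):
    """Return the human-readable meaning of a pair's label."""
    return LABEL_NAMES[row["label"]]


def load_paws(config="labeled_final"):
    """Load PAWS from the Hugging Face Hub (needs network access).

    Assumption: the `datasets` library is installed and the dataset is
    reachable under the name "paws" with the given configuration.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset("paws", config)


for row in EXAMPLES:
    print(label_name(row), "|", row["sentence1"], "|", row["sentence2"])
```

Rows returned by `load_paws()["train"]` carry the same `sentence1`, `sentence2`, and `label` fields, so `label_name` can be applied to them unchanged.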
Output
Model
Training