Varun-Chowdary/hallucination-detector

Introduction

Dataset

The model was trained on the PAWS dataset: https://huggingface.co/datasets/paws

Dataset Summary

PAWS: Paraphrase Adversaries from Word Scrambling

This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that highlight the importance of modeling structure, context, and word order information for the problem of paraphrase identification. The dataset has two subsets, one based on Wikipedia and the other based on the Quora Question Pairs (QQP) dataset.

Below are two examples from the dataset:

Sentence 1: Although interchangeable, the body pieces on the 2 cars are not similar.
Sentence 2: Although similar, the body parts are not interchangeable on the 2 cars.
Label: 0

Sentence 1: Katz was born in Sweden in 1947 and moved to New York City at the age of 1.
Sentence 2: Katz was born in 1947 in Sweden and moved to New York at the age of one.
Label: 1
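The two example pairs above show why word order matters for this task. As a rough illustration (not part of the model or its training code), the sketch below computes a simple bag-of-words Jaccard overlap for each pair. The helper names `tokens` and `jaccard` are hypothetical and introduced only for this example.

```python
import re

# The two example pairs quoted above, as (sentence1, sentence2, label).
PAIRS = [
    ("Although interchangeable, the body pieces on the 2 cars are not similar.",
     "Although similar, the body parts are not interchangeable on the 2 cars.",
     0),
    ("Katz was born in Sweden in 1947 and moved to New York City at the age of 1.",
     "Katz was born in 1947 in Sweden and moved to New York at the age of one.",
     1),
]


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of distinct words common to both sentences (order is ignored)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


# One (overlap, label) tuple per example pair.
scores = [(jaccard(s1, s2), label) for s1, s2, label in PAIRS]

for score, label in scores:
    print(f"label={label} word overlap={score:.2f}")
```

Both pairs come out with the same overlap (about 0.83) even though one is labeled 0 and the other 1, so a measure that ignores word order cannot tell them apart.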

Column Name    Data
id             A unique id for each pair
sentence1      The first sentence
sentence2      The second sentence
(noisy_)label  (Noisy) label for each pair

Each label has two possible values: 0 indicates the pair has a different meaning, while 1 indicates the pair is a paraphrase.
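For readers who want to inspect these columns directly, here is a minimal sketch using the Hugging Face `datasets` library. It assumes the library is installed and that the `labeled_final` configuration is the one of interest; the source does not say which configuration was used for training. The helpers `load_paws` and `describe` are hypothetical names for this example.

```python
def load_paws(config="labeled_final", split="train"):
    """Download one PAWS split from the Hugging Face Hub.

    Assumption: "labeled_final" is used here only as an example
    configuration; the model card does not name the one it trained on.
    """
    # Imported lazily so the rest of the module works without the library.
    from datasets import load_dataset  # pip install datasets

    return load_dataset("paws", config, split=split)


def describe(example):
    """Render one record using the label meaning given above."""
    kind = "paraphrase" if example["label"] == 1 else "different meaning"
    return f'{example["id"]}: {kind}'


if __name__ == "__main__":
    train = load_paws()          # requires network access
    print(train.column_names)    # inspect the available columns
    print(describe(train[0]))
```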
