Full-text search
416 results
factckbr
README.md
dataset
4 matches
tags:
task_categories:text-classification, task_ids:fact-checking, annotations_creators:expert-generated, language_creators:found, multilinguality:monolingual, size_categories:1K<n<10K, source_datasets:original, language:pt, license:mit, croissant, region:us
…respective fact check and classification.
The data is collected from ClaimReview, a structured data schema used by fact-checking agencies to share their results with search engines, enabling data collection in real time.
The FACTCK.BR dataset contains 1,309 claims with their corresponding labels.
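A minimal loading sketch, assuming the dataset id shown in this listing resolves on the Hub; the split name is also an assumption:

```python
from datasets import load_dataset

# Minimal sketch: the id "factckbr" is taken from this listing and the
# "train" split name is an assumption.
ds = load_dataset("factckbr", split="train")
print(ds.num_rows)  # the README states 1,309 claims
```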
### Supported Tasks and Leaderboards
cl-nagoya / auto-wiki-qa
README.md
dataset
2 matches
sem_eval_2020_task_11
README.md
dataset
2 matches
tags:
task_categories:text-classification, task_categories:token-classification, annotations_creators:expert-generated, language_creators:found, multilinguality:monolingual, size_categories:n<1K, source_datasets:original, language:en, license:unknown, propaganda-span-identification, propaganda-technique-classification, arxiv:2009.02696, region:us
…Bias/Fact Check,³
and we retrieved articles from these sources. We deduplicated the articles on the basis of word n-gram matching (Barrón-Cedeño and Rosso, 2009) and we discarded faulty entries (e.g., empty entries from blocking websites).
fake-news-UFG / FactChecksbr
README.md
dataset
10 matches
Cofacts / line-msg-fact-check-tw
README.md
dataset
7 matches
tags:
task_categories:text-classification, task_categories:question-answering, size_categories:100K<n<1M, language:zh, license:cc-by-sa-4.0, fact-checking, crowd-sourcing, croissant, region:us
…Crowdsourced Fact-Check Replies
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1qdE-OMJTi6ZO68J6KdzGdxNdheW4ct6T?usp=sharing)
The Cofacts dataset encompasses instant messages that have been reported by users of the [Cofacts chatbot](https://line.me/R/ti/p/@cofacts) and the replies provided by the [Cofacts crowd-sourced fact-checking community](https://www.facebook.com/groups/cofacts/).
akozlova / RuFacts
README.md
dataset
4 matches
tags:
task_categories:text-classification, size_categories:1K<n<10K, language:ru, license:cc-by-4.0, fact-checking, croissant, region:us
…internal fact-checking for the Russian language. The dataset contains tagged examples labeled consistent and inconsistent.
For inconsistent examples, ranges containing violations of facts in the source text and the generated text are also collected and presented on the [Kaggle competition page](https://www.kaggle.com/competitions/internal-fact-checking-for-the-russian-language).
Various data sources and approaches to data generation were used to create the training and test sets for the fact-checking task. The data consists of single sentences and short texts: the average text length is 198 characters, the minimum 10, and the maximum 3,402.
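A minimal sketch of checking the quoted length statistics; the Hub id comes from this listing, while the split name and the "text" column are assumptions:

```python
from datasets import load_dataset

# Minimal sketch: the "train" split and the "text" column name are
# assumptions, not documented in this snippet.
ds = load_dataset("akozlova/RuFacts", split="train")

lengths = [len(t) for t in ds["text"]]
print(min(lengths), sum(lengths) / len(lengths), max(lengths))
# the README quotes min 10, mean 198, max 3,402 characters
```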
ctu-aic / csfever_v2
README.md
dataset
5 matches
tags:
task_categories:text-classification, task_categories:text-retrieval, task_ids:natural-language-inference, task_ids:document-retrieval, multilinguality:monolingual, size_categories:100K<n<1M, source_datasets:fever, language:cs, license:cc-by-sa-3.0, Fact-checking, arxiv:2201.11115, region:us
…Czech fact-checking developed as part of a bachelor thesis at the Artificial Intelligence Center of the Faculty of Electrical Engineering of the Czech Technical University in Prague. The dataset consists of an **original** subset, which is an iteration of CsFEVER with new data and better processing, and **f1**, **precision**, and **07** subsets filtered using an NLI model and optimized threshold values. The **wiki_pages** subset is a processed Wikipedia dump from August 2022 with correct revids; it should be used to map evidence from the datasets to Wikipedia texts. Additionally, preprocessed subsets **original_nli**, **f1_nli**, **precision_nli**, and **07_nli** for training NLI models are included.
Gameselo / monolingual-wideNLI
README.md
dataset
3 matches
tags:
task_categories:text-classification, size_categories:100M<n<1B, language:en, natural-language-inference, fact-checking, croissant, region:us
…Fact-Checking oriented.
The dev split is designed to teach the model to handle pure NLI (ANLI is well designed for this task) and to test its general knowledge (fact-checking skills) with VitaminC, which is known for its robustness on this task.
It contains:
- 14.5k examples for the dev split, of which:
kundank / usb
README.md
dataset
1 matches
tags:
task_categories:summarization, size_categories:1K<n<10K, language:en, license:apache-2.0, factchecking, summarization, nli, region:us
# USB: A Unified Summarization Benchmark Across Tasks and Domains
This benchmark contains labeled datasets for 8 text-summarization-based tasks, given below.
The labeled datasets are created by collecting manual annotations on top of Wikipedia articles from 6 different domains.
copenlu / spanex
README.md
dataset
2 matches
tags:
task_categories:text-classification, size_categories:1K<n<10K, language:en, license:mit, rationale-extraction, reasoning, nli, fact-checking, explainability, croissant, region:us
…such as fact-checking (FC), machine reading comprehension (MRC), or natural language inference (NLI). However, existing highlight-based explanations primarily focus on identifying individual important features or interactions only between adjacent tokens or tuples of tokens. Most notably, there is a lack of annotations capturing the human decision-making process with respect to the necessary interactions for informed decision-making in such tasks. To bridge this gap, we introduce SpanEx, a multi-annotator dataset of human span-interaction explanations for two NLU tasks: NLI and FC. We then investigate the decision-making processes of multiple fine-tuned large language models in terms of the employed connections between spans in separate parts of the input and compare them to the human reasoning processes. Finally, we present a novel community-detection-based unsupervised method to extract such interaction explanations. We make the code and the dataset available on [GitHub](https://github.com/copenlu/spanex). The dataset is also available on [Hugging Face datasets](https://huggingface.co/datasets/copenlu/spanex).",
}
```
SEACrowd / x_fact
README.md
dataset
3 matches
tags:
language:ara, language:aze, language:ben, language:deu, language:spa, language:fas, language:fra, language:guj, language:hin, language:ind, language:ita, language:kat, language:mar, language:nor, language:nld, language:pan, language:pol, language:por, language:ron, language:rus, language:sin, language:srp, language:sqi, language:tam, language:tur, license:mit, fact-checking, region:us
…Multilingual Fact Checking}},
author={Gupta, Ashim and Srikumar, Vivek},
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2021",
ctu-aic / csfever
README.md
dataset
3 matches
tags:
license:cc-by-sa-3.0, croissant, arxiv:1803.05355, arxiv:2201.11115, region:us
…experimental Fact-Checking dataset
Czech dataset for fact verification localized from the data points of [FEVER](https://arxiv.org/abs/1803.05355) using the localization scheme described in the [CTKFacts: Czech Datasets for Fact Verification](https://arxiv.org/abs/2201.11115) paper, which is currently being revised for publication in the LREV journal.
The version you are looking at was reformatted into *Claim*-*Evidence* string pairs for the specific task of NLI; a more general, document-retrieval-ready interpretation of our data points, usable for training and evaluating DR models over the June 2016 Wikipedia snapshot, can be found in the [data_dr]() folder in the JSON Lines format.
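A minimal sketch of reading the Claim-Evidence pairs described above; the Hub id comes from this listing, while the split and the "claim"/"evidence"/"label" column names are assumptions:

```python
from datasets import load_dataset

# Minimal sketch: the "train" split and the "claim"/"evidence"/"label"
# column names are assumptions, not shown in this snippet.
ds = load_dataset("ctu-aic/csfever", split="train")
claim, evidence, label = ds[0]["claim"], ds[0]["evidence"], ds[0]["label"]
print(label, claim, evidence)
```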
amanrangapur / Fin-Fact
README.md
dataset
16 matches
clu-ling / clupubhealth
README.md
dataset
1 matches
tags:
task_categories:summarization, size_categories:1K<n<10K, size_categories:10K<n<100K, language:en, license:apache-2.0, medical, region:us
…the [PUBHEALTH fact-checking dataset](https://github.com/neemakot/Health-Fact-Checking).
The PUBHEALTH dataset contains claims, explanations, and main texts. The explanations function as vetted summaries of the main texts. The CLUPubhealth dataset repurposes these fields into summaries and texts for use in training summarization models such as Facebook's BART.
There are currently four dataset configs, each with three splits (see Usage); a minimal loading sketch follows:
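The config name below is a hypothetical stand-in, since the snippet cuts off before listing the four configs:

```python
from datasets import load_dataset

# Minimal sketch: "base" is a hypothetical config name; substitute one
# of the four configs listed in the README's Usage section.
ds = load_dataset("clu-ling/clupubhealth", "base")
print(ds)  # each config has three splits per the README
```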
justinqbui / covid_fact_checked_google_api
README.md
dataset
6 matches
tags:
croissant, region:us
…[Google Fact Checker API](https://toolbox.google.com/factcheck/explorer), using an automatic web scraper. 10,000 fact checks were pulled, but for the sake of simplicity only those whose rating was the single word "false" or "true" were kept, which filtered the set down to ~3,000 fact checks, with about 90% of the facts being false.
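A hedged sketch of the filtering step just described; `records` and the "textual_rating" key are hypothetical stand-ins for whatever the scraper produced:

```python
# Keep only fact checks whose rating is exactly the single word
# "false" or "true"; the "textual_rating" key is a hypothetical stand-in.
def keep_binary_ratings(records):
    kept = []
    for record in records:
        rating = record.get("textual_rating", "").strip().lower()
        if rating in ("false", "true"):
            kept.append(record)
    return kept
```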
eduagarcia / FactNews
README.md
dataset
2 matches
tags:
task_categories:text-classification, annotations_creators:expert-generated, language_creators:found, multilinguality:monolingual, size_categories:1K<n<10K, language:pt, language:por, license:unknown, subjectivity, mediabias, media-bias, croissant, region:us
…the FactCheck dataset on Hugging Face; the original data is made available by Vargas et al., 2023 and can be downloaded from the link: https://github.com/franciellevargas/FactNews*
*Modifications:*
- *The "original" subset contains the unmodified original CSV*
- *The subsets for the "bias_prediction" and "factuality_prediction" tasks were split into train (70%) and test (30%) by randomly selecting
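A hedged sketch of the 70/30 split described above; the "original" subset name comes from the snippet, while the source split and the seed are assumptions:

```python
from datasets import load_dataset

# Hedged sketch: loads the unmodified "original" subset and splits it
# 70/30; the seed is arbitrary and the "train" split name is an
# assumption.
ds = load_dataset("eduagarcia/FactNews", "original", split="train")
splits = ds.train_test_split(test_size=0.3, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```

Note the dataset card already ships the split subsets; the sketch only mirrors the described procedure.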
lytang / LLM-AggreFact
README.md
dataset
5 matches
tags:
size_categories:10K<n<100K, language:en, license:cc-by-nd-4.0, croissant, arxiv:2404.10774, arxiv:2402.13249, arxiv:2402.00559, arxiv:2311.09000, arxiv:2309.07852, arxiv:2310.12150, region:us
…LLM-AggreFact is a fact verification benchmark from the work ([GitHub Repo](https://github.com/Liyan06/MiniCheck)):
📃 **MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents** ([link](https://arxiv.org/pdf/2404.10774.pdf))
It aggregates 10 of the most up-to-date publicly available datasets on factual consistency evaluation across
ctu-aic / ctkfacts_nli
README.md
dataset
2 matches
tags:
croissant, arxiv:2201.11115, region:us
…of fact-checking experiments concluded and described in the CsFEVER and [CTKFacts: Czech Datasets for Fact Verification](https://arxiv.org/abs/2201.11115) paper, currently being revised for publication in the LREV journal.
## Document retrieval version
Can be found at https://huggingface.co/datasets/ctu-aic/ctkfacts
ctu-aic / ctkfacts
README.md
dataset
2 matches
tags:
license:cc-by-sa-3.0, croissant, arxiv:2201.11115, region:us
…of fact-checking experiments concluded and described in the [CsFEVER and CTKFacts: Acquiring Czech Data for Fact Verification](https://arxiv.org/abs/2201.11115) paper, currently being revised for publication in the LREV journal.
## NLI version
Can be found at https://huggingface.co/datasets/ctu-aic/ctkfacts_nli