Polish Question Answering

piotr-rybak's Collections

updated Oct 17

Collection of models and datasets for Polish Question Answering.

ipipan/silver-retriever-base-v1.1

Sentence Similarity • Updated Oct 26 • 7.47k • 10

Note SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/silver-retriever-base-v1

Sentence Similarity • Updated May 24 • 227 • 10

Note SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
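As a rough illustration of what a neural passage retriever does at query time, the sketch below ranks passages by the similarity between a question embedding and passage embeddings. The three-dimensional vectors are made up for the example, and cosine similarity is an assumption; the actual SilverRetriever embeddings and scoring function are described on the model cards and may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(question_vec, passage_vecs):
    """Return passage ids sorted from most to least similar to the question."""
    scores = {pid: cosine(question_vec, vec) for pid, vec in passage_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings (assumption: a real encoder would produce
# vectors with hundreds of dimensions).
question = [1.0, 0.0, 1.0]
passages = {
    "p1": [0.9, 0.1, 0.8],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.5, 0.5, 0.5],
}

ranking = rank_passages(question, passages)
print(ranking)
```

In practice the passage embeddings are computed once and indexed, so only the question needs to be encoded per query.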
ipipan/polqa

Updated May 24 • 412 • 8

Note PolQA is the first Polish dataset for open-domain question answering. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7 million candidate passages. The dataset can be used to train both a passage retriever and an abstractive reader.
ipipan/maupqa

Updated May 24 • 215 • 4

Note MAUPQA is a collection of 14 datasets for Polish document retrieval. Most of the datasets are either machine-generated or machine-translated from English. Across all datasets, it consists of over 1M questions, 1M positive question-passage pairs, and 7M hard-negative question-passage pairs.
clarin-pl/poquad

Viewer • Updated Jul 4, 2023 • 52k • 326 • 4

Note PoQuAD is a Polish equivalent of SQuAD. It consists of more than 70,000 question-passage pairs, as well as extractive and abstractive answers.
allegro/polish-question-passage-pairs

Viewer • Updated Sep 23, 2021 • 10.4k • 87 • 4

Note Over 10,000 manually annotated question-passage pairs. While the questions are taken from the PolQA dataset, the passages are often unique. In particular, the dataset consists mostly of hard negatives (8k pairs).
allegro/klej-dyk

Viewer • Updated Oct 26, 2022 • 5.18k • 372 • 1

Note The "Czy wiesz?" (eng. "Did you know?") dataset consists of almost 5k question-passage pairs obtained from the "Czy wiesz..." section of Polish Wikipedia. Each question is written by a Wikipedia collaborator and is answered with a link to a relevant Wikipedia article.
piotr-rybak/allegro-faq

Viewer • Updated Aug 19, 2023 • 1.88k • 48

Note Allegro FAQ is one of the PolEval 2022 test sets. It consists of 900 frequently asked questions and 921 help articles about Allegro.com, a large Polish e-commerce platform. Each question-passage pair is manually checked and edited where necessary.
piotr-rybak/legal-questions

Updated Dec 14, 2023 • 66

Note Legal Questions is one of the PolEval 2022 test sets. It consists of 718 questions and approximately 26,000 passages extracted from over 1,000 acts of law.
Polish Information Retrieval Benchmark (PIRB)

Note A benchmark for Polish information retrieval, consisting of 41 datasets.
sdadas/mmlw-retrieval-roberta-base

Sentence Similarity • Updated Oct 29 • 349 • 1

Note Neural text encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=mmlw
sdadas/gpt-exams

Viewer • Updated Sep 9, 2023 • 8.13k • 44 • 3

Note The dataset contains 8131 multi-domain question-answer pairs. It was created semi-automatically using the gpt-3.5-turbo-0613 model available in the OpenAI API.
apohllo/plt5-base-poquad

Text2Text Generation • Updated Nov 28, 2023 • 7 • 1

Note This is a plT5-base model trained on the PoQuAD dataset. It was trained in a single experiment run, so do not expect state-of-the-art results.
sdadas/polish-reranker-large-ranknet

Text Classification • Updated Apr 23 • 413 • 2

Note Cross-encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=reranker
amu-cai/PES-2018-2022

Viewer • Updated Jul 3 • 35.6k • 44 • 3

Note This dataset consists of 297 Polish Board Certification Examinations from 2018-2022 in the form of multiple-choice questions.
OrlikB/KartonBERT-USE-base-v1

Sentence Similarity • Updated Oct 1 • 2.01k • 7

Note This universal sentence encoder model aims to be proficient in tasks involving sentence / document similarity.
sdadas/polish-reranker-roberta-v2

Text Classification • Updated Oct 29 • 634 • 2

Note This is an improved version of the reranker, based on sdadas/polish-roberta-large-v2 and trained with RankNet loss on a large dataset of text pairs.
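For readers unfamiliar with RankNet loss, here is a minimal sketch of its standard pairwise form: given the model's scores for a relevant and a non-relevant passage for the same query, the loss is log(1 + exp(-(s_pos - s_neg))). The function name and the plain-Python formulation are illustrative assumptions; the reranker's actual training code is not shown in this collection.

```python
import math


def ranknet_loss(score_pos, score_neg):
    """Pairwise RankNet loss for one (relevant, non-relevant) score pair.

    Computes log(1 + exp(-(score_pos - score_neg))) in a numerically
    stable way, so large score gaps do not overflow.
    """
    diff = score_pos - score_neg
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    # For negative diff, rewrite as -diff + log(1 + exp(diff)).
    return -diff + math.log1p(math.exp(diff))


# The loss shrinks as the relevant passage is scored further above
# the non-relevant one, and grows when the order is wrong.
well_ordered = ranknet_loss(2.0, 0.0)
tied = ranknet_loss(0.0, 0.0)
mis_ordered = ranknet_loss(0.0, 2.0)
print(well_ordered, tied, mis_ordered)
```

A tie gives a loss of log(2), which is why the loss keeps pushing the scores of relevant and non-relevant passages apart even when they are already in the right order.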
sdadas/stella-pl-retrieval

Sentence Similarity • Updated Oct 2 • 331 • 8

Note This is a text encoder based on stella_en_1.5B_v5 and further fine-tuned for Polish information retrieval tasks.
