arxiv:2602.17377

Corpus Prevalence of Multiple-Choice Question Options

Published on Jun 22

Authors:

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct):

Large language models exhibit corpus frequency biases in multiple-choice question creation, where correct answers consistently appear more frequently than incorrect options, potentially compromising assessment validity.

Abstract

In recent years, corpus-driven AI methods, such as Large Language Models (LLMs), have seen widespread use in education. While on the surface their abilities look promising for tasks ranging from generating assessment materials to simulating student performance, we should be aware of the subtle effects their frequentist nature may have on their behaviour. In this work, we focus on corpus frequency in the context of creating high-quality Multiple-Choice Questions (MCQs), specifically asking: what if corpus prevalence were enough to identify the correct answer to an MCQ? We propose a computational method for assessing the corpus prevalence of MCQ options in large text corpora using textual embeddings, and apply it to both expert- and machine-generated MCQ sets. The key finding, across three large question sets, is that correct answers, independently of the question stem, are significantly more prevalent in the corpus than incorrect options. Specifically, using Wikipedia as the retrieval corpus, we find that always selecting the most prevalent option leads to scores up to 9.0% above the random-guess baseline. We also find that MCQ distractors generated by LLMs often show prevalence patterns similar to those of expert-created options, despite the LLMs' frequentist nature and their training on large collections of textual data. Moreover, we find that corpus prevalence does not necessarily correlate with how recognisable terms are to humans. This highlights the need to better understand how corpora are used in AI-driven methods for education, whether applied directly or indirectly via LLMs.
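The abstract does not define the prevalence measure, so the sketch below is only an illustration under stated assumptions, not the authors' method. It assumes one plausible embedding-based proxy: an option's prevalence is the number of corpus passages whose embedding has cosine similarity at or above a threshold with the option's embedding. It then scores the stem-agnostic strategy of always choosing the most prevalent option and compares it with the random-guess baseline. The function names (`prevalence`, `most_prevalent_accuracy`, `random_baseline`), the 0.8 threshold, and the toy 2-D vectors are all hypothetical; real use would need embeddings of Wikipedia passages and MCQ options from an actual embedding model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical question format: (option embeddings, index of the correct option).
# The question stem is deliberately ignored, matching the stem-agnostic strategy.
Question = Tuple[List[Vector], int]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prevalence(option: Vector, corpus: Sequence[Vector], threshold: float = 0.8) -> int:
    """Hypothetical prevalence proxy: how many corpus passage embeddings
    are at least `threshold`-similar (cosine) to the option embedding."""
    return sum(1 for passage in corpus if cosine(option, passage) >= threshold)


def most_prevalent_accuracy(questions: Sequence[Question],
                            corpus: Sequence[Vector],
                            threshold: float = 0.8) -> float:
    """Expected accuracy of always picking the most prevalent option.
    Ties are treated as a uniform random choice, so a tie that includes
    the correct option earns 1/len(tied) credit."""
    credit = 0.0
    for options, correct in questions:
        scores = [prevalence(o, corpus, threshold) for o in options]
        best = max(scores)
        tied = [i for i, s in enumerate(scores) if s == best]
        if correct in tied:
            credit += 1.0 / len(tied)
    return credit / len(questions)


def random_baseline(questions: Sequence[Question]) -> float:
    """Expected accuracy of guessing uniformly at random: the mean of 1/k over questions."""
    return sum(1.0 / len(options) for options, _ in questions) / len(questions)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real passage and option vectors.
    corpus = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.0, 1.0)]
    questions = [
        ([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)], 0),  # correct option is the prevalent one
        ([(0.0, 1.0), (1.0, 0.0)], 0),               # correct option is the rare one
    ]
    acc = most_prevalent_accuracy(questions, corpus)
    base = random_baseline(questions)
    print(f"most-prevalent: {acc:.3f}, random: {base:.3f}")  # most-prevalent: 0.500, random: 0.417
```

Giving fractional credit on ties equals the expected score of breaking ties at random; a first-index tie-break would instead reward whichever option position a question set happens to favour, which could inflate or deflate the comparison with the baseline.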
