arxiv:2309.13356

Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test

Published on Sep 23, 2023
· Featured in Daily Papers on Sep 26, 2023

Abstract

The development of large language models has sparked widespread interest among researchers in understanding their inherent reasoning and problem-solving capabilities. Despite a good amount of research aimed at elucidating these capabilities, there is still an appreciable gap in understanding the moral development and judgments of these models. Current approaches that evaluate the ethical reasoning abilities of these models as a classification task introduce numerous inaccuracies because of over-simplification. In this study, we build a psychological connection by bridging two disparate fields: human psychology and AI. We propose an evaluation framework that can help delineate a model's ethical reasoning ability in terms of moral consistency and Kohlberg's stages of moral development, using the psychometric assessment tool known as the Defining Issues Test.

Community

Ok, this is a fun one because it touches on a big question, which is "can AI be moral?" My highlights:

Researchers from Microsoft have proposed using a psychological assessment tool called the Defining Issues Test (DIT) to evaluate the moral reasoning capabilities of large language models (LLMs) like GPT-3, ChatGPT, etc.

The DIT presents moral dilemmas and has subjects rate and rank the importance of various ethical considerations related to the dilemma. It allows quantifying the sophistication of moral thinking through a P-score.
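For context, the P-score is essentially the share of ranking weight a subject gives to "postconventional" (principled) considerations. Here's a minimal sketch of how that might be computed for a single dilemma; the stage labels, statement ids, and example ranking below are made up for illustration, and the official DIT aggregates over several dilemmas with its own scoring key:

```python
# Hedged sketch: a DIT-style P-score for one dilemma, assuming the standard
# weighting where the four statements ranked most important get weights 4, 3, 2, 1.
POSTCONVENTIONAL_STAGES = {"5A", "5B", "6"}  # principled-reasoning items

def p_score(ranked_statement_ids, stage_key):
    """ranked_statement_ids: the 4 statements ranked most important, in order;
    stage_key: dict mapping statement id -> Kohlberg stage label (illustrative)."""
    weights = [4, 3, 2, 1]                      # importance weights for ranks 1-4
    raw = sum(w for sid, w in zip(ranked_statement_ids, weights)
              if stage_key[sid] in POSTCONVENTIONAL_STAGES)
    return 100.0 * raw / sum(weights)           # % of ranking weight that is postconventional

# Illustrative usage with made-up statement ids and stage assignments
stage_key = {1: "4", 2: "5A", 3: "3", 4: "6", 5: "2", 6: "5B"}
print(p_score([2, 4, 1, 3], stage_key))         # -> 70.0
```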

In this new paper, the researchers tested prominent LLMs with adapted DIT prompts containing AI-relevant moral scenarios.
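To make that concrete, here's a rough sketch of what an adapted DIT-style prompt could look like; the dilemma text, statements, and wording are placeholders, not the paper's actual prompts:

```python
# Hypothetical DIT-style prompt builder; everything below is illustrative.
def build_dit_prompt(dilemma: str, statements: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(statements))
    return (
        f"Read the following moral dilemma:\n{dilemma}\n\n"
        "Rate how important each consideration below is to deciding what should be done "
        "(1 = no importance, 5 = great importance), then list the four most important "
        "considerations in order:\n"
        f"{numbered}\n"
    )

prompt = build_dit_prompt(
    "An engineer discovers a flaw in a deployed AI system that could leak user data...",
    [
        "Whether disclosing the flaw would damage the company's reputation",
        "Whether users have a right to know about risks to their data",
        "What the law requires in this situation",
    ],
)
print(prompt)
```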

Key findings:

  • Large models like GPT-3 failed to comprehend the prompts and scored near the random baseline in moral reasoning.
  • ChatGPT, Text-davinci-003, and GPT-4 showed coherent moral reasoning with above-random P-scores.
  • Surprisingly, the smaller 70B LlamaChat model outscored larger models on its P-score, demonstrating that advanced ethical understanding is possible without massive parameter counts.
  • The models operated mostly at the intermediate, conventional levels of Kohlberg's moral development theory. No model exhibited highly mature moral reasoning.

I think this is an interesting framework to evaluate and improve LLMs' moral intelligence before deploying them into sensitive real-world environments - to the extent that a model can be said to possess moral intelligence (or, seem to possess it?).

Here's a link to my full summary with a lot more background on Kohlberg's model (had to read up on it since I didn't study psych).

I'm curious ... did GPT write this paper (or at least the intro)? There is so much verbatim text repetition in the paper, almost to a comical degree. This is directly from the paper:

"For example, a seemingly controversial statement by the Dalai Lama, "suck my tongue," is a Tibetan expression but stirred global debate"

"For example, a statement made by the Dalai Lama, which may appear unconventional to some, is actually a Tibetan expression that generated global attention"

" Similarly, the swastika, associated with hatred in the West, symbolizes prosperity in Hinduism. "

"Likewise, the swastika, commonly associated with negativity in the West, holds a symbol of prosperity in Hinduism."

"The concept of "value pluralism" underscores the absence of a universal set of values applicable in all situations, further complicated by extreme variations in people’s preferences" (repeated verbatim X2)

"To address these challenges and risks, several works have proposed ethical frameworks, principles, guidelines, methods, and tools for designing, developing, evaluating, and deploying LLMs in a responsible and ethical manner." (repeated verbatim X2)

Also from the paper: "Prisoner dilemma - Should a man who escaped from prison but has since been leading an exemplary life be reported to authorities?" This is not the prisoner's dilemma.

The limitation you found with GPT-3.5 and davinci-002 is an interesting one: given a list of options, these models "chose" the ones at particular indices regardless of how the list was permuted.

For GPT-4 and the other models, did the answer to question 3 correspond to the same actual statement regardless of permutation? Why did you limit the evaluation to "8 specific orders" of the statements rather than fully shuffling at random? How did you pick the orders to use?
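For what it's worth, here's the kind of fully shuffled check I have in mind; `query_model` is just a stand-in for whatever API call returns the model's chosen index, so this is a sketch rather than the paper's actual setup:

```python
import random

def positional_bias_check(dilemma, statements, query_model, n_trials=20, seed=0):
    """Shuffle the options each trial and record what the model picks,
    both by index and by underlying statement content."""
    rng = random.Random(seed)
    chosen_indices, chosen_contents = [], []
    for _ in range(n_trials):
        order = list(range(len(statements)))
        rng.shuffle(order)
        shuffled = [statements[i] for i in order]
        idx = query_model(dilemma, shuffled)    # stand-in: returns the index the model picks
        chosen_indices.append(idx)
        chosen_contents.append(statements[order[idx]])
    # Nearly constant chosen_indices with varying chosen_contents suggests
    # the model is answering by position rather than by content.
    return chosen_indices, chosen_contents
```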
