arxiv:2310.08678

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Published on Oct 12, 2023
· Featured in Daily Papers on Oct 17, 2023

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims at assessing the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance at passing the CFA exams. Finally, we outline insights into potential strategies and improvements to enhance the applicability of LLMs in finance. In this perspective, we hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.

Community

My summary for the day:

Researchers evaluated ChatGPT and GPT-4 on mock CFA exam questions to see if they could pass the real tests. The CFA exams rigorously test practical finance knowledge and are known for being quite difficult.

They tested the models in zero-shot, few-shot, and chain-of-thought prompting settings on mock Level I and Level II exams.
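To make the three set-ups concrete, here is a minimal sketch using the OpenAI chat API. The exam question, the few-shot example, and the model name below are illustrative placeholders, not the paper's actual prompts or data.

```python
# Minimal sketch of zero-shot, chain-of-thought, and few-shot prompting with the
# OpenAI Python client. Question text, example, and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "A portfolio has an expected return of 8% and a standard deviation of 12%. "
    "If the risk-free rate is 3%, the Sharpe ratio is closest to:\n"
    "A. 0.42   B. 0.67   C. 0.92"
)

def ask(prompt, model="gpt-4"):
    # Single-turn call; temperature 0 keeps grading (mostly) deterministic.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

# Zero-shot (ZS): just the question.
zs_answer = ask(question + "\nAnswer with A, B, or C only.")

# Chain-of-thought (CoT): ask for step-by-step reasoning before the final letter.
cot_answer = ask(question + "\nThink step by step, then state the letter of your answer.")

# Few-shot (FS): prepend one or more worked examples (a single one shown here).
shot = (
    "Q: A stock returns 10% with a risk-free rate of 2% and volatility of 16%. "
    "The Sharpe ratio is closest to: A. 0.5  B. 0.8  C. 1.2\n"
    "A: (10% - 2%) / 16% = 0.5, so the answer is A.\n\n"
)
fs_answer = ask(shot + question + "\nAnswer with A, B, or C only.")

print(zs_answer, cot_answer, fs_answer, sep="\n---\n")
```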

The key findings:

  • GPT-4 consistently beat ChatGPT, but both models struggled way more on the more advanced Level II questions.
  • Few-shot prompting helped ChatGPT slightly.
  • Chain-of-thought prompting exposed knowledge gaps rather than helping much.
  • Based on estimated passing scores, only GPT-4 with few-shot prompting could potentially pass the exams.

The models definitely aren't ready to become charterholders yet. Their difficulties with tricky questions and core finance concepts highlight the need for more specialized training and knowledge.

But GPT-4 did better overall, and few-shot prompting shows the models can improve when given examples. With targeted training on finance formulas and reasoning, we could see steady gains.

TLDR: Tested on mock CFA exams, ChatGPT and GPT-4 struggle with the complex finance concepts and fail. With few-shot prompting, GPT-4 performance reaches the boundary between passing and failing but doesn't clearly pass.

Full summary here.

Would using RAG help?
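One rough way to picture what RAG would mean here: embed passages from study material, retrieve the ones closest to each question, and prepend them as context. The corpus, embedding model, and prompt wording below are assumptions for illustration only; the paper itself did not evaluate retrieval.

```python
# Hypothetical retrieval-augmented prompting sketch (not evaluated in the paper):
# embed curriculum snippets, retrieve the most similar ones to the question, and
# prepend them as context before asking the model.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder "curriculum" passages; a real setup would index study notes or readings.
curriculum = [
    "The Sharpe ratio is a portfolio's excess return over the risk-free rate, "
    "divided by the standard deviation of its returns.",
    "Macaulay duration is the weighted-average time to receipt of a bond's cash flows.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(curriculum)

def answer_with_context(question, k=1):
    q_vec = embed([question])[0]
    # Cosine similarity between the question and each curriculum passage.
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    context = "\n".join(curriculum[i] for i in np.argsort(sims)[-k:])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer with the letter only."
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content
```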

