arxiv:2306.07075

Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence

Published on Jun 12, 2023
· Featured in Daily Papers on Jun 13, 2023
Authors:
Abstract

Better understanding of Large Language Models' (LLMs) legal analysis abilities can contribute to improving the efficiency of legal services, governing artificial intelligence, and leveraging LLMs to identify inconsistencies in law. This paper explores LLM capabilities in applying tax law. We choose this area of law because it has a structure that allows us to set up automated validation pipelines across thousands of examples, requires logical reasoning and maths skills, and enables us to test LLM capabilities in a manner relevant to real-world economic lives of citizens and companies. Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release. We experiment with retrieving and utilising the relevant legal authority to assess the impact of providing additional legal context to LLMs. Few-shot prompting, presenting examples of question-answer pairs, is also found to significantly enhance the performance of the most advanced model, GPT-4. The findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels. As LLMs continue to advance, their ability to reason about law autonomously could have significant implications for the legal profession and AI governance.
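The abstract describes two prompting enhancements: retrieving the relevant legal authority and presenting few-shot question-answer examples. A minimal sketch of how such a prompt might be assembled is below; the function name, statute snippet, and example pairs are illustrative assumptions, not taken from the paper.

```python
# Sketch of a retrieval-plus-few-shot prompt builder (hypothetical names and
# data, not the paper's actual pipeline): the retrieved statutory text comes
# first, then worked question-answer pairs, then the new question.

def build_prompt(statute_text, examples, question):
    """Assemble a few-shot prompt: legal authority, worked examples,
    then the question to be answered."""
    parts = ["Relevant legal authority:\n" + statute_text, ""]
    for q, a in examples:
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)

# Illustrative example (the statute text is paraphrased for brevity).
examples = [
    ("Is a $500 birthday gift from a parent taxable income?",
     "No. Gifts are excluded from gross income under IRC 102(a)."),
]
prompt = build_prompt(
    "IRC 102(a): Gross income does not include the value of property "
    "acquired by gift, bequest, devise, or inheritance.",
    examples,
    "Is a $1,000 wedding gift from a friend taxable income?",
)
print(prompt)
```

The resulting string would then be sent to the model under evaluation; per the abstract, this combination of correct legal texts and few-shot examples significantly improved GPT-4's accuracy.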

Community

I haven't been following the research for long, but I'm confused by researchers pointing to OpenAI models when talking about "emergent capabilities." If those models are fine-tuned on task-specific instruction/response data they haven't published information about, including user-submitted data and "massive amounts of" synthetic data, how can researchers make inferences about emergent capabilities in larger models when counting those models among the examples?

For example, if GPT-4 immediately shows step-by-step thinking when asked to do a math problem, I would say that's clearly not a behavior that emerged due to parameter count, but rather carefully curated instruction/response examples, with the specific aim of improving performance on math problems by "thinking step by step".

I'm assuming you're evaluating the statement "emergent capabilities" from the point of view of whether these systems are moving toward AGI and from that point of view, I agree with you. They don't have emergent capabilities. They pick up high-dimensional "edges" and relationships of what they're trained on and then they remix those capabilities to accomplish novel tasks.

So, was GPT-4 specifically trained to perform Tax Law tasks? I seriously doubt it. Success on these tasks is, therefore, a certain type of "emergent capability". It's not like the system is demonstrating new fundamental capabilities that were completely absent from all training and fine-tuning.

talk about labor force

If you doubt GPT-4 was trained on tax law tasks, you have been really misled. They fine-tune on everything they can generate, and further fine-tune on every combination and permutation thereof.

You can see tax-related tasks in the evals, but that's just scratching the surface

@brandonglockaby Yeah, I will admit I overstated my beliefs here. I don't think that tax law tasks were excluded. I think I had some "focused" definition of "specifically trained to perform" that I'm not even sure is particularly valuable.
