---
title: Evaluation of Summaries
author: Cillian Berragan
format: pdf
fontfamily: libertinus
monofont: 'JetBrains Mono'
monofontoptions:
- Scale=0.75
---
This document compares the summaries written by Cambridge with the summaries generated automatically by our model.
# Overview
For all representations, the original summary was compared with the generated summary provided by the LLM. A separate LLM call was used to determine which of the two summaries was preferred, based on a set of criteria:
> A good summary should:
> 1. **Be accurate** – It should not include information that is not present in the source document.
> 2. **Be comprehensive** – It should reflect all key points in the source document without omitting important details.
> 3. **Be well-grounded** – It should be based entirely on the source document without adding interpretations, opinions, or external information.
This model could return one of four scores: 0, meaning neither summary is suitable; 1, meaning the original summary is preferred; 2, meaning the LLM-generated summary is preferred; or 3, meaning both summaries are suitable.
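For illustration, a minimal sketch of what such a judging call might look like is given below. The client, model name, prompt wording, and function name are assumptions for the purposes of the example, not the exact implementation used in this pipeline.

```python
# Illustrative sketch only: the actual judging prompt, model, and client are assumptions.
from openai import OpenAI

client = OpenAI()

CRITERIA = (
    "A good summary should:\n"
    "1. Be accurate - it should not include information absent from the source document.\n"
    "2. Be comprehensive - it should reflect all key points without omitting important details.\n"
    "3. Be well-grounded - it should be based entirely on the source document.\n"
)


def judge(source: str, original: str, generated: str) -> int:
    """Ask a separate LLM call which summary is preferred, returning a score of 0-3."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[
            {"role": "system", "content": CRITERIA},
            {
                "role": "user",
                "content": (
                    f"Source document:\n{source}\n\n"
                    f"Summary A (original):\n{original}\n\n"
                    f"Summary B (LLM-generated):\n{generated}\n\n"
                    "Reply with a single digit: 0 if neither summary is suitable, "
                    "1 if Summary A is preferred, 2 if Summary B is preferred, "
                    "or 3 if both are suitable."
                ),
            },
        ],
    )
    return int(response.choices[0].message.content.strip())
```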
@tbl-eval gives the results of this evaluation. The majority of preferred summaries are those generated by the LLM (score 2). There are, however, 8 cases where the original summary is considered better, and 17 where both summaries are considered suitable.
```{python}
#| label: tbl-eval
#| tbl-cap: Comparison between original summary and LLM-generated summary
#| echo: false
#| output: asis
import polars as pl
summaries = pl.read_parquet("./data/out/eval.parquet")
# Count how often each score (1 = Original, 2 = LLM-generated, 3 = Both) occurs,
# then reshape the counts into a single row and print as a markdown table
print(
    summaries["score"]
    .value_counts()
    .sort("score")
    .transpose(include_header=True)
    .rename({"column_0": "Original", "column_1": "LLM-generated", "column_2": "Both"})
    .drop("column")
    .tail(1)
    .to_pandas()
    .to_markdown(index=False)
)
```