RESEARCH CONTEXT

This thesis started almost concurrently with the onset of the global COVID-19 pandemic, which made it hard to foster collaborations in the early stages. At
the start of the PhD, DU methodology was fairly well established, with OCR and
Transformer-based pipelines such as BERT [94] and LayoutLM [502]. This is
why we first prioritized the more fundamental challenge of decision-making
under uncertainty (Part I), and only afterwards took a step back toward applied
DU research (Part II).
The research community’s understanding of ‘reliability’ has also evolved over
time. When starting the work of Chapter 3, the notion of reliability was mostly
associated with uncertainty quantification and calibration. However, calibration
is not a panacea, and only fairly recently did Jaeger et al. [193] propose a more
general framework encapsulating reliability and robustness. They promote the
more concrete and useful notion of failure prediction, which still involves
confidence/uncertainty estimation yet with an explicit definition of the failure
source which one wants to detect or guard against, e.g., in-domain test errors,
changing input feature distributions, novel class shifts, etc. Since I share a
similar view of the problem, I have focused subsequent work on the more general
notion of failure prediction, which is also more in line with the business context
of IA.
Whereas we originally intended to work on multi-task learning of DU subtasks,
the rise of general-purpose LLMs (e.g., ChatGPT [52, 344]), which offer a natural
language interface to documents rather than discriminative modeling,
prompted us to evaluate this promising technology in the context of
DU. More importantly, we observed the lack of sufficiently complex datasets
and benchmarks in DU that would allow us to tackle larger, more fundamental
questions such as ‘Do text-only LLMs suffice for most low-level DU subtasks?’
(subsequently tackled in Chapter 5), which is why we shifted our focus to the
more applied research questions of benchmarking and evaluation (Part II).
Finally, the business context has also evolved over time. Originally, IDP was
practiced by legacy OCR companies; specialized vendors, offering a range of
solutions for specific document types (e.g., invoices, contracts, tax forms, etc.);
or cloud service providers, offering IDP as part of a larger suite of services
(e.g., AWS Textract, Azure Form Recognizer, etc.). However, the rise of both
open-source LLM development and powerful, though closed-source, models has
lowered the barrier to entry for new entrants and incumbents alike. This has led
to a commoditization of IDP, with the quality of the LLMs and the ease of
integration with existing business processes becoming key differentiators.