CDIP [252]), or are restricted to a single domain or a small set of document types. We posit that larger, fundamental questions in DU remain unanswered due to a lack of sufficiently complex datasets and benchmarks with a rich methodology covering evaluation beyond the independent and identically distributed (i.i.d.) test set setting. While there exist performant models for DU subtasks such as OCR, DC, KIE, etc., it is unclear how to move from these specific analysis and recognition tasks to models that can reason and understand documents. A truly end-to-end DU solution must handle the complexity and variety of real-world documents and subtasks, which could be expressed as natural language questions. Moreover, it should be able to generalize to any question on any document and reason over multiple pages and modalities.

The following research questions are addressed in Chapters 4 and 5:

RQ 6. How can we iteratively close the gap between research and practice in DU?

RQ 7. How can we design a resource that comprehensively challenges the state-of-the-art?

RQ 8. Which DU aspects are most challenging for current state-of-the-art LLMs? How can these be incorporated in a benchmark to allow proper measurements of future improvements?

However, moving the goalpost beyond a single-page context inevitably requires us to reconsider the research challenge of efficiency in DU. The rise of LLMs has enabled a new generation of DU pipelines, which are more flexible and easier to maintain than separate and specialized subtask modules, but also more computationally demanding. Importantly, most LLMs are not designed to handle the multimodality and long context windows of multipage documents, and are often unaware of the visual and layout semantics of documents.

The research questions for Chapter 6 address the efficiency challenge in DU:

RQ 9. How can we efficiently infuse LLMs with semantic layout awareness for more focused information extraction?

RQ 10. To what degree can model compression resolve the problem of efficiency in processing documents?