INTRODUCTION

Chapter 4 reflects on the current state of DU research and proposes guidelines to foster document dataset construction efforts. It introduces two novel document classification datasets, RVL-CDIP_MP and RVL-CDIP-N_MP, which extend the RVL-CDIP dataset [165] with multipage documents. The datasets are accompanied by a comprehensive experimental analysis, which shows the promise of advancing multipage document representations and inference.

Chapter 5 introduces the multi-faceted DUDE benchmark for assessing generic DU, which was also hosted as a competition to challenge the DU community. It describes the complete methodology and design of the dataset, targeting model innovations that can handle the complexity and variety of real-world documents and subtasks, and that generalize to any document and any question. Alongside a discussion of the competition results, it presents our own comprehensive benchmarking study of SOTA LLMs, varying the context length and the modalities represented.

Chapter 6 investigates how to efficiently obtain more semantic document layout awareness. We explore what affects the teacher-student knowledge gap in KD-based model compression methods, and design a downstream task setup to evaluate the robustness of distilled DLA models on zero-shot layout-aware DocVQA.

Finally, Chapter 7 concludes the thesis with a summary of the main contributions (Section 7.1) and a discussion of future research directions. As a logical follow-up to Chapter 5, we propose in Section 7.2.2.1 how the DUDE dataset could be extended to become the ‘ultimate’ DU benchmark. The thesis ends with a hypothetical, informed design of how the research presented would form part of an end-to-end, fully-fledged IA-DU solution (Section 7.2.2.2).