Is the OCR result part of the input during fine-tuning?
#4 by wanbiguizhao - opened
I used the model (layoutlmv3-base-finetuned-publaynet) to evaluate part of the PubLayNet validation set, and the results are very good. Thanks very much for your work!
[04/26 11:00:15] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl |
|---|---|---|---|---|---|
| 98.772 | 100.000 | 100.000 | 88.232 | 94.161 | 100.000 |
[04/26 11:00:15] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
| category | AP | category | AP | category | AP |
|:---|:---|:---|:---|:---|:---|
| text | 98.920 | title | 95.794 | list | 99.147 |
| table | 100.000 | figure | 100.000 | | |
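The numbers above are COCO-style metrics: AP averages over IoU thresholds from 0.50 to 0.95, while AP50 and AP75 fix the threshold at 0.5 and 0.75. A minimal sketch (not the detectron2 implementation, just the IoU check behind these thresholds):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A predicted box counts as a true positive at AP50 if IoU >= 0.5,
# but only at the stricter AP75 if IoU >= 0.75.
pred = [10, 10, 110, 110]
gt = [20, 20, 120, 120]
score = iou(pred, gt)  # ~0.681: a hit at AP50, a miss at AP75
```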
My question is: if I want to fine-tune LayoutLMv3 on a private dataset, do I need to OCR my images and use the text as part of the input?
I read the paper again and found this sentence: "We model this task as an object detection problem without text embedding, which is effective in existing works." So it seems no OCR text is needed for this fine-tuning.
wanbiguizhao changed discussion status to closed