Is the OCR result part of the input when fine-tuning?

#4
by wanbiguizhao - opened

I used the model (layoutlmv3-base-finetuned-publaynet) to evaluate part of the PubLayNet validation set, and the results are very good.
Thank you very much for your work!
[04/26 11:00:15] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:

|   AP   |  AP50   |  AP75   |  APs   |  APm   |   APl   |
|:------:|:-------:|:-------:|:------:|:------:|:-------:|
| 98.772 | 100.000 | 100.000 | 88.232 | 94.161 | 100.000 |

[04/26 11:00:15] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:

| category   | AP      | category   | AP      | category   | AP     |
|:-----------|:--------|:-----------|:--------|:-----------|:-------|
| text       | 98.920  | title      | 95.794  | list       | 99.147 |
| table      | 100.000 | figure     | 100.000 |            |        |
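For reference, the columns in this log follow the COCO detection metrics: AP50 and AP75 count a predicted box as correct only when its intersection-over-union (IoU) with a ground-truth box is at least 0.50 or 0.75, AP averages over IoU thresholds 0.50 to 0.95 in steps of 0.05, and APs/APm/APl restrict the evaluation to small (area < 32²), medium, and large (area > 96²) regions. The sketch below shows only the per-pair IoU criterion, assuming COCO-style `[x, y, w, h]` boxes; the helper names `iou` and `is_match` are hypothetical, and the real COCO evaluator additionally matches detections to ground truth greedily by confidence score before computing precision and recall.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two COCO-style boxes [x, y, w, h]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width/height of the overlap rectangle (0 if the boxes do not intersect)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """Simplified criterion: a prediction counts at `threshold` if IoU >= threshold."""
    return iou(pred, gt) >= threshold


gt = [0, 0, 100, 100]
slightly_off = [10, 0, 100, 100]  # shifted 10 px: IoU = 9000 / 11000
far_off = [30, 0, 100, 100]       # shifted 30 px: IoU = 7000 / 13000

for name, pred in [("slightly_off", slightly_off), ("far_off", far_off)]:
    print(name, round(iou(pred, gt), 3),
          "AP50 match:", is_match(pred, gt, 0.50),
          "AP75 match:", is_match(pred, gt, 0.75))
```

The second box illustrates why AP75 can be lower than AP50: a region that is found but loosely localized still counts at the 0.50 threshold and fails at 0.75.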

My question: if I want to fine-tune LayoutLMv3 on a private dataset, do I need to run OCR on my images so that the OCR output can be used as part of the input?

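I read the paper again and found this sentence: "We model this task as an object detection problem without text embedding, which is effective in existing works". So for the layout analysis task, the model is fine-tuned on the images alone, without OCR text.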
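Since layout analysis is treated as image-only object detection, fine-tuning data for it needs page images plus labeled region boxes rather than OCR words. PubLayNet distributes its annotations in COCO JSON format, so one possible way to prepare a private dataset is to convert it to the same structure. This is a minimal sketch under that assumption: the helper `make_coco_dataset`, the file name `page_0001.png`, the box coordinates, and the category ids are all hypothetical, and the five category names simply mirror the classes in the per-category AP log.

```python
import json

# Hypothetical label set mirroring the five PubLayNet classes;
# replace it with the labels of your own private dataset.
CATEGORIES = [
    {"id": 1, "name": "text"},
    {"id": 2, "name": "title"},
    {"id": 3, "name": "list"},
    {"id": 4, "name": "table"},
    {"id": 5, "name": "figure"},
]


def make_coco_dataset(pages):
    """Build a COCO-style detection dataset from page images and labeled boxes.

    `pages` is a list of dicts with keys file_name, width, height and
    boxes, where boxes is a list of (category_name, x, y, w, h).
    No OCR text or word boxes are stored: only the image and the regions.
    """
    name_to_id = {c["name"]: c["id"] for c in CATEGORIES}
    images, annotations = [], []
    ann_id = 1
    for img_id, page in enumerate(pages, start=1):
        images.append({
            "id": img_id,
            "file_name": page["file_name"],
            "width": page["width"],
            "height": page["height"],
        })
        for name, x, y, w, h in page["boxes"]:
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": name_to_id[name],
                "bbox": [x, y, w, h],  # COCO convention: top-left x, y, width, height
                # Rectangle polygon, for detectors that also expect masks
                "segmentation": [[x, y, x + w, y, x + w, y + h, x, y + h]],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return {"images": images, "annotations": annotations, "categories": CATEGORIES}


# One hypothetical page with a title and a table region.
dataset = make_coco_dataset([
    {
        "file_name": "page_0001.png",
        "width": 1000,
        "height": 1400,
        "boxes": [("title", 80, 60, 840, 50), ("table", 80, 200, 840, 600)],
    }
])
print(json.dumps(dataset["annotations"][1]))
```

The resulting dict can be saved with `json.dump` and, if you train with detectron2 as in the log, registered via `detectron2.data.datasets.register_coco_instances(name, {}, json_file, image_root)`. Whether the `segmentation` field is required depends on the detector config you fine-tune with (mask-based heads such as Mask R-CNN use it).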

wanbiguizhao changed discussion status to closed
