Is the OCR result part of the input during fine-tuning?
#4 by wanbiguizhao - opened
I used the model (layoutlmv3-base-finetuned-publaynet) to evaluate part of the PubLayNet validation set, and the results are very good. Thanks very much for your work!
[04/26 11:00:15] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl |
|---|---|---|---|---|---|
| 98.772 | 100.000 | 100.000 | 88.232 | 94.161 | 100.000 |
[04/26 11:00:15] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
| category | AP | category | AP | category | AP |
|:---|:---|:---|:---|:---|:---|
| text | 98.920 | title | 95.794 | list | 99.147 |
| table | 100.000 | figure | 100.000 | | |
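The numbers above are COCO-style metrics: AP averages over IoU thresholds from 0.50 to 0.95, while AP50 and AP75 fix the threshold at 0.5 and 0.75. A minimal sketch (not the detectron2 implementation, just the IoU check behind these thresholds):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A predicted box counts as a true positive at AP50 if IoU >= 0.5,
# but only at the stricter AP75 if IoU >= 0.75.
pred = [10, 10, 110, 110]
gt = [20, 20, 120, 120]
score = iou(pred, gt)  # ~0.681: a hit at AP50, a miss at AP75
```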
My question is: if I want to fine-tune LayoutLMv3 on a private dataset, do I need to OCR my images and use the text as part of the input?
I read the paper again and found this sentence: "We model this task as an object detection problem without text embedding, which is effective in existing works." So it seems no OCR text is needed for this fine-tuning.
wanbiguizhao changed discussion status to closed