Hallucination after the title

#6
by james146 - opened

Hi,
While using the model, I've noticed that the title is followed by non-existent text (a hallucination). The model usually adds something like a glossary or a table of contents.
I have turned off quantization, and the generation args are the same as in the sample usage:

generation_args = { "max_new_tokens": 1024, "temperature": 0.1, "do_sample": False}
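With `"do_sample": False`, decoding is greedy. In Hugging Face transformers, `temperature` only takes effect when sampling is enabled, so the `0.1` here should not change the output. The toy sketch below uses plain Python and made-up logits, not the model's real decoding code. It shows why: dividing every logit by a positive temperature keeps their order, so the greedy (argmax) pick stays the same.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature, then normalize (max-subtracted for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits, temperature=1.0):
    # Greedy decoding (do_sample=False): always take the highest-probability token.
    probs = softmax(logits, temperature)
    return max(range(len(probs)), key=probs.__getitem__)

logits = [2.0, 3.5, 1.0, 3.4]  # hypothetical next-token scores
for t in (0.1, 1.0, 2.0):
    print(t, greedy_pick(logits, t))
```

Temperature only reshapes the distribution that sampling draws from. It cannot change which token is the maximum.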

@james146 May I ask what your input looks like?
Ideally, this model should be used together with TFT-ID (https://huggingface.co/yifeihu/TFT-ID-1.0). The TFT-ID model performs layout analysis on full document pages and identifies the text sections (single-column, in top-to-bottom order). TB-OCR can then perform OCR on each of those text sections.
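Below is a minimal sketch of that two-stage flow. It illustrates the pipeline's shape only and is not the official integration. `detect_layout`, `crop` and `run_ocr` are hypothetical callables that you would back with TFT-ID, an image-cropping routine (e.g. PIL's `Image.crop`) and TB-OCR. The region labels and the sort by top edge are assumptions made to get single-column, top-to-bottom order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels

@dataclass
class Region:
    label: str  # hypothetical label set, e.g. "text", "table", "figure"
    box: Box

def reading_order(regions: List[Region]) -> List[Region]:
    # Assumption: single-column page, so sort text regions by top edge, then left edge.
    text = [r for r in regions if r.label == "text"]
    return sorted(text, key=lambda r: (r.box[1], r.box[0]))

def ocr_page(page, detect_layout: Callable, crop: Callable, run_ocr: Callable) -> str:
    # Stage 1: layout analysis on the full page (the role TFT-ID plays).
    regions = detect_layout(page)
    # Stage 2: OCR each text section on its own (the role TB-OCR plays), in reading order.
    parts = [run_ocr(crop(page, r.box)) for r in reading_order(regions)]
    return "\n\n".join(parts)

# Usage with stub callables standing in for the real models.
fake_regions = [
    Region("text", (50, 400, 500, 600)),
    Region("figure", (50, 100, 500, 380)),
    Region("text", (50, 20, 500, 90)),
]
result = ocr_page(
    "PAGE",
    detect_layout=lambda page: fake_regions,
    crop=lambda page, box: box,
    run_ocr=lambda cropped: f"text@y={cropped[1]}",
)
print(result)
```

Passing only cropped text sections to the OCR model, instead of a whole page, matches the usage described above.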
