PDF

#2
by nickmuchi - opened

Can one use a PDF instead of an image?

Impira org

Hi @nickmuchi, absolutely! To clarify, this repo corresponds to the model (layoutlm-document-qa), which takes text + bounding boxes as input. We also created a library called DocQuery, which includes tools for parsing images, PDFs, webpages, etc. It takes care of converting them into text + bounding boxes and feeding them into this model.

I'd recommend starting from DocQuery. Feel free to post any questions or issues you run into as you explore, either here or on DocQuery's GitHub page.
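
For anyone wanting to try this in a notebook, a minimal sketch of asking questions about a PDF with DocQuery might look like the following. It assumes the usage shown in DocQuery's README (`pipeline`, `document.load_document`, `doc.context`), which may differ between versions; the helper name `ask_pdf`, the file path, and the question are placeholders for illustration only.

```python
def ask_pdf(pdf_path, questions):
    """Hypothetical helper: ask each question about one PDF via DocQuery."""
    # Imported lazily so that defining the helper does not require DocQuery.
    # Assumption: these names match the usage in DocQuery's README.
    from docquery import document, pipeline

    qa = pipeline("document-question-answering")
    # load_document parses the file into text + bounding boxes.
    doc = document.load_document(pdf_path)
    # doc.context carries the parsed inputs the pipeline expects.
    return {q: qa(question=q, **doc.context) for q in questions}


# Example call (placeholder path and question):
# answers = ask_pdf("invoice.pdf", ["What is the invoice number?"])
```

The first call downloads the model weights, so it needs network access and a working Hugging Face cache.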

Impira org

Oh, and this space also lets you upload PDF files :). It uses DocQuery behind the scenes.

I have tried the space and it works very well, but I was hoping to replicate that in my notebook using a PDF, as the other options only take images. I will start with DocQuery as you suggested, thank you!

nickmuchi changed discussion status to closed

I tried pip installing DocQuery and running it in JupyterLab, but I am getting a strange symlink error telling me to switch on Developer Mode on Windows. I switched it on and restarted, but I am still getting the same error.

image.png

Not sure if you have seen this before.

nickmuchi changed discussion status to open
Impira org

I have not. That code is not part of DocQuery or this model, so I'm unfortunately less familiar with it. Judging by the stack trace you sent, it seems like it would occur under any circumstance where it fails to create a symlink. I would edit the source file (file_download.py) and add a bit more detail to that error message, e.g.

except OSError as e:
    if os.name == "nt":
        raise OSError(
            ...
            f"{str(e)}"
        )

to see what the actual error is. If possible, we should also move this discussion into DocQuery (https://github.com/impira/docquery/issues) so that if others encounter this, they can reference our discussion and the solution.
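
To check whether symlink creation itself is failing, independently of any library, a small standalone probe along these lines may help. This is only a sketch using the Python standard library; `probe_symlink` is a hypothetical helper name and is not part of DocQuery or huggingface_hub.

```python
import os
import tempfile


def probe_symlink(directory=None):
    """Try to create a symlink in a temp directory; return (ok, detail)."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as fh:
            fh.write("probe")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError) as e:
            # On Windows, OSError carries a winerror code; elsewhere it is absent.
            code = getattr(e, "winerror", None)
            return False, f"{type(e).__name__} (winerror={code}): {e}"
        return True, "symlink created"


ok, detail = probe_symlink()
print(ok, detail)
```

Passing the cache directory as `directory` runs the same check on the drive where the downloads are stored, which can matter if that filesystem does not support symlinks.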

Sure, that sounds fine. Yes, I realised I am getting the same error when I try to download any Hugging Face tokenizer with from_pretrained. I have reported it to the forum too and saw that a few Windows users are experiencing the same thing.

nickmuchi changed discussion status to closed
