working mechanism

#11
by BoccheseGiacomo - opened

I have a question: does this only work with text documents, or also with images? If I have a PDF formatted as an image, does this work? And if I have a PDF with tables, does it convert everything to raw UTF-8 text, or is it able to process structures (images, tables, HTML text) as they are?

Thanks

As far as I can tell, it only handles text from the images, and that text needs to be in a "segmentId" format.

However, check out katanaml here and also on GitHub: https://github.com/katanaml/sparrow

Thanks for the GitHub repo, that's really cool.

BoccheseGiacomo changed discussion status to closed