Kleister NDA - Training code / Example Data

#1
by jordanparker6 - opened

Wondering if you could share the training code for this?

I have found the Kleister-NDA dataset, but I'm not sure what pre-processing is required to turn it into NER labels.

Hi Jordan, you can find the code for my experiments here: https://github.com/AleRosae/thesis-layoutlm
It includes both the pre-processing of Kleister-NDA and the training code for LayoutLMv1/v2/v3. Please note that my pre-processing code for Kleister-NDA assumes the .pdf files were processed with an internal tool that extracts text and bounding boxes for the whole document, not page by page as most OCR tools (e.g. Tesseract) do.

Cheers,
AR
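For anyone with the same question, here is a minimal sketch of turning Kleister-NDA annotations into word-level BIO tags. It is not the code from the repository above. It assumes each document's gold labels come as one `key=value` line with underscores standing in for spaces (check this against your copy of the dataset), and that you already have the document's words in reading order. `parse_expected` and `bio_tags` are hypothetical helper names. Exact matching is a simplification: some values, such as dates, may be stored in a normalized form and will not appear verbatim in the text.

```python
import string
from typing import Dict, List


def _norm(word: str) -> str:
    # Hypothetical normalization: drop surrounding punctuation, compare case-insensitively.
    return word.strip(string.punctuation).lower()


def parse_expected(line: str) -> Dict[str, List[str]]:
    """Parse an assumed 'key=value key=value' line; underscores in values stand for spaces.
    A key may repeat (e.g. several parties), so values are collected in lists."""
    entities: Dict[str, List[str]] = {}
    for pair in line.split():
        key, _, value = pair.partition("=")
        entities.setdefault(key, []).append(value.replace("_", " "))
    return entities


def bio_tags(words: List[str], entities: Dict[str, List[str]]) -> List[str]:
    """Tag every exact (normalized) occurrence of each value with B-/I- labels.
    Words already tagged by an earlier match are not overwritten."""
    tags = ["O"] * len(words)
    normed = [_norm(w) for w in words]
    for label, values in entities.items():
        for value in values:
            target = [_norm(t) for t in value.split()]
            n = len(target)
            if n == 0:
                continue
            for start in range(len(words) - n + 1):
                span = range(start, start + n)
                if normed[start:start + n] == target and all(tags[i] == "O" for i in span):
                    tags[start] = f"B-{label.upper()}"
                    for i in range(start + 1, start + n):
                        tags[i] = f"I-{label.upper()}"
    return tags


if __name__ == "__main__":
    words = ["This", "Agreement", "is", "governed", "by", "the", "laws", "of",
             "Delaware.", "Acme", "Inc", "agrees"]
    ents = parse_expected("jurisdiction=Delaware party=Acme_Inc")
    print(list(zip(words, bio_tags(words, ents))))
```

Labeling every occurrence (rather than only the first) is a design choice; it gives the model more positive examples but can also tag mentions the annotators did not intend.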

@Sennodipoi thanks for this.

I will try to convert it using Tesseract and then upload the annotations and images to HF as a dataset.
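One possible shape for that conversion, as a sketch only: per-page Tesseract output is concatenated into a single document-level word sequence, which is the whole-document input the pre-processing described above expects, and each box is scaled to the 0-1000 range used by LayoutLM-family models. The input dicts are assumed to mirror what `pytesseract.image_to_data(image, output_type=Output.DICT)` returns; `merge_pages` and `normalize_box` are hypothetical names. A page index is kept per word because boxes are page-relative, so words on different pages can otherwise end up with identical coordinates.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def normalize_box(left: int, top: int, width: int, height: int,
                  page_w: int, page_h: int) -> Box:
    """Scale pixel coordinates to 0-1000, clamped, as LayoutLM-style models expect."""
    def scale(v: float, size: int) -> int:
        return max(0, min(1000, int(1000 * v / size)))
    return (scale(left, page_w), scale(top, page_h),
            scale(left + width, page_w), scale(top + height, page_h))


def merge_pages(pages: List[Tuple[Dict[str, list], int, int]]):
    """pages: list of (ocr_dict, page_width, page_height), one entry per page image.
    Returns document-level words, normalized boxes, and the page index of each word."""
    words: List[str] = []
    boxes: List[Box] = []
    page_ids: List[int] = []
    for page_idx, (data, page_w, page_h) in enumerate(pages):
        for text, left, top, width, height in zip(
            data["text"], data["left"], data["top"], data["width"], data["height"]
        ):
            if not str(text).strip():
                continue  # block/paragraph/line-level rows carry empty text
            words.append(text)
            boxes.append(normalize_box(left, top, width, height, page_w, page_h))
            page_ids.append(page_idx)
    return words, boxes, page_ids
```

The resulting `words` list can then be fed to a labeling step, and `page_ids` lets you split the sequence back into pages when pairing it with page images for LayoutLMv2/v3.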

jordanparker6 changed discussion status to closed
