---
license: apache-2.0
---

# Extract Legal Entities from Insurance Documents using BERT transformers

This model is a fine-tuned BERT transformer for named entity recognition (NER) of legal entities in life insurance demand letters.

The dataset is publicly available at https://github.com/aws-samples/aws-legal-entity-extraction.git

The model extracts the following entities:

* Law Firm
* Law Office Address
* Insurance Company
* Insurance Company Address
* Policy Holder Name
* Beneficiary Name
* Policy Number
* Payout
* Required Action
* Sender

## HF Space

https://huggingface.co/spaces/aimlnerd/legal-entity-ner-transformers

This Space exposes the model as a Gradio app and contains the training dataset and the training code.

The dataset consists of legal requisition/demand letters for life insurance. However, this approach can be used in any industry and for any document type that may benefit from spatial data in NER training.

## Data preprocessing

The OCRed data is stored as JSON in ```data/raw_data/annotations```. I wrote ```source/services/ner/awscomprehend_2_ner_format.py``` to convert the JSON data into a format suitable for HF TokenClassification.

## Finetuning BERT Transformers model

The script ```source/services/ner/train/train.py``` fine-tunes the BERT model and uploads it to Hugging Face.
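
To illustrate the kind of conversion the preprocessing step performs, here is a minimal sketch that turns character-offset entity spans into word-level BIO tags and builds the label maps a token classification model needs. It is not the code from the repository: the input shape (text plus `(start, end, label)` tuples), the label spellings in `ENTITY_TYPES`, and the helper names `build_label_maps` and `spans_to_bio` are all assumptions made for this example, and the real annotation JSON may be structured differently.

```python
import re

# Hypothetical label names derived from the entity list in this card;
# the tags used in the actual training data may be spelled differently.
ENTITY_TYPES = [
    "LAW_FIRM", "LAW_OFFICE_ADDRESS", "INSURANCE_COMPANY",
    "INSURANCE_COMPANY_ADDRESS", "POLICY_HOLDER_NAME", "BENEFICIARY_NAME",
    "POLICY_NUMBER", "PAYOUT", "REQUIRED_ACTION", "SENDER",
]


def build_label_maps(entity_types):
    """Return (label2id, id2label) for a BIO tagging scheme."""
    labels = ["O"]
    for name in entity_types:
        labels += [f"B-{name}", f"I-{name}"]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def spans_to_bio(text, spans):
    """Split text on whitespace and give each token a BIO tag.

    spans: list of (start, end, label) character offsets, end exclusive.
    A token is tagged with an entity when it overlaps that entity's span.
    """
    tokens, tags = [], []
    prev = None  # index of the span that covered the previous token
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        current = None
        for i, (start, end, _label) in enumerate(spans):
            if tok_start < end and tok_end > start:  # token overlaps span
                current = i
                break
        if current is None:
            tag = "O"
        else:
            # First token of a span gets B-, later tokens of it get I-.
            prefix = "I" if current == prev else "B"
            tag = f"{prefix}-{spans[current][2]}"
        tokens.append(match.group())
        tags.append(tag)
        prev = current
    return tokens, tags


# Made-up sentence and entities, for illustration only.
text = "Smith & Lee LLP writes regarding policy LI-12345."
firm, policy = "Smith & Lee LLP", "LI-12345"
spans = [
    (text.index(firm), text.index(firm) + len(firm), "LAW_FIRM"),
    (text.index(policy), text.index(policy) + len(policy), "POLICY_NUMBER"),
]

label2id, id2label = build_label_maps(ENTITY_TYPES)
tokens, tags = spans_to_bio(text, spans)
tag_ids = [label2id[tag] for tag in tags]
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The resulting `tokens` and `tag_ids` lists are word-level. Because BERT uses subword tokenization, a training script would still need to align these labels to subword tokens before fine-tuning.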